A study by the Uptime Institute found that about 70% of the problems plaguing data centers today are the result of human error. So it’s not just the systems and hardware that need to be monitored, but also the people who operate them.
Anything from an unintentional switch of a temperature setting between Fahrenheit and Celsius, to an accidentally pulled power cord, to a server plugged into the wrong circuit and overloading it, can shut down a data center. The result is a significant loss of money and damage to the reputation of the business. While experts may blame ‘lack of training’ as the underlying reason for such mistakes, even well-trained people make mistakes when they are in a rush, are tired, or have not thought the situation through.
Here are some simple steps that help prevent such human errors from causing a major data center outage:
- Proper training and planning: Enforce ongoing training for everyone with access to the data center, including security, IT, emergency, and facility personnel. All of them should have basic knowledge of the equipment, systems, and operations, and should follow a documented method of procedure (MOP).
- Shield Emergency OFF buttons: Fit Emergency OFF buttons near data center doorways with protective covers so that power cannot be shut down unintentionally.
- Label components correctly: To ensure that the power system is operated in the correct sequence, label all switching devices properly and keep the facility one-line diagram labeled to match.
- Secure access policies: Maintain a data center sign-in policy for security purposes, so that visitors are always tracked and are escorted when needed.
- Enforce food and drink policies: Keep food, drink, and other contaminants out of the data center. Liquids and food debris carry a high risk of shorting out crucial computer components.
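
Some of these checks can also be built into the tools that staff use as part of a documented procedure. The following is only a minimal sketch of that idea, not a description of any real data center management product: it guards against two of the mistakes mentioned earlier, a temperature setpoint entered in the wrong unit and a server plugged into a circuit that cannot carry the extra load. The names `Circuit`, `check_setpoint`, and `can_add_server` are hypothetical, and the 80% load margin and the 18 to 27 °C setpoint range are assumed values chosen for illustration; a real facility would use its own limits.

```python
from dataclasses import dataclass

# Assumed values for illustration only; use your facility's own limits.
MAX_LOAD_FRACTION = 0.8          # keep planned load at or below 80% of breaker rating
SETPOINT_RANGE_C = (18.0, 27.0)  # allowed cooling setpoint range, in Celsius


@dataclass
class Circuit:
    """A labeled power circuit (hypothetical structure for this sketch)."""
    label: str
    breaker_amps: float
    current_load_amps: float


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature to Celsius; the unit must be stated explicitly."""
    unit = unit.strip().upper()
    if unit == "C":
        return value
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unknown temperature unit: {unit!r}")


def check_setpoint(value: float, unit: str) -> bool:
    """Return True if the setpoint falls inside the allowed range."""
    low, high = SETPOINT_RANGE_C
    return low <= to_celsius(value, unit) <= high


def can_add_server(circuit: Circuit, server_amps: float) -> bool:
    """Return True if the circuit stays within its load margin after the addition."""
    limit = circuit.breaker_amps * MAX_LOAD_FRACTION
    return circuit.current_load_amps + server_amps <= limit
```

With these assumed limits, a setpoint of 72 entered as Fahrenheit passes the check, while the same number entered as Celsius is rejected before it reaches the cooling system. Likewise, a 30 A circuit already carrying 20 A accepts a 3 A server but refuses a 5 A one.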
While 100% uptime is nearly impossible, following these best practices will go a long way toward reducing data center downtime caused by human error.