An Ounce of Prevention

Have You Had a Data Center Health Checkup Recently?

AFCOM Communique

By Keith Meierhofer

To stay healthy, it makes sense to visit your doctor on a regular basis. The same wisdom applies to data center management. You need to regularly test the multiple factors that affect overall data center performance, then diagnose and remedy any problem areas before they become major issues.

To allay health concerns, data center managers need to look across multiple disciplines affecting their data center performance:

For example, one organization had a health checkup performed on its data center to better understand how to plan for growth. The checkup showed that the facility had space to add IT equipment, but had already exceeded the redundancy capacity of its power system by 20% and had limited circuit capacity as well as airflow issues. So before adding IT equipment, the organization expanded its electrical system, removed perforated tiles, and planned for additional cooling. This enabled it to maintain a balance among key data center elements. An ounce of prevention saved money as well as sizable headaches down the road.

Another company's data center space initially appeared to be 79% full, but a health checkup revealed it was actually at only 61% capacity. The company now performs a checkup every four months to ensure data center reliability and to plan building system upgrades before a critical need arises.

What are the vital signs to test during a data center checkup? When data centers begin to run out of space, power, or cooling, they ask "why, when, and how" to determine if there is justification to expand or build a new data center. These same questions should be asked, and key statistics should be benchmarked, at regular intervals. Statistical trending gives data center managers and business owners the advance warning needed in today's business environment. Health checkups should also include comprehensive questions such as:

Pressures on data center managers are increasing. Servers are using more data center space; business demands on IT are adding applications for more uses; and more clustering increases hardware reliability but also adds to space, power, and cooling usage. Managers therefore need more point-in-time health information to determine how to optimize their resources. It makes sense to graph key statistics over time and compare them to see which areas need the most immediate attention and funding.
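As a rough illustration of trending a key statistic over time, the sketch below fits a straight line to periodic readings and estimates how long it will be before a capacity limit is reached. The function name, the sample UPS load readings, and the 80 kW limit are all hypothetical, and straight-line growth is an assumption that a real facility may not follow.

```python
def months_until_capacity(readings, capacity):
    """Estimate months remaining before a metric reaches its capacity.

    readings: list of (month, value) pairs, oldest first (hypothetical data).
    capacity: the limit for the metric, in the same units as value.
    Returns None when the trend is flat or falling.
    Assumes simple linear growth, fitted by ordinary least squares.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings to fit a trend")

    months = [m for m, _ in readings]
    values = [v for _, v in readings]
    mean_m = sum(months) / len(months)
    mean_v = sum(values) / len(values)

    # Least-squares slope and intercept of value versus month.
    sxx = sum((m - mean_m) ** 2 for m in months)
    if sxx == 0:
        raise ValueError("readings must span more than one month")
    sxy = sum((m - mean_m) * (v - mean_v) for m, v in zip(months, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_m

    if slope <= 0:
        return None  # no growth, so no projected crossing

    crossing_month = (capacity - intercept) / slope
    return max(0.0, crossing_month - months[-1])


# Hypothetical UPS load (kW) recorded at a checkup every four months.
ups_load = [(0, 60.0), (4, 64.0), (8, 68.0), (12, 72.0)]
remaining = months_until_capacity(ups_load, capacity=80.0)
print(f"Projected months until UPS capacity is reached: {remaining:.1f}")
```

The same function could be pointed at any statistic that is collected at each checkup, such as rack units in use or cooling load, so that the areas closest to their limits stand out.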

Just as an MD would evaluate a first-time patient, it is important to take measures to determine clients' data center needs. For example:

  1. How many cabinets can your data center support? How many can be added? What is the actual rack unit (RU) used by IT equipment?
  2. What type of redundancies do you have in your power and cooling systems?
  3. What are the capacities of these systems: UPS, power distribution, and cooling units? Is this being trended over time?
  4. What type of alarm and fire protection systems does your data center have?
  5. How many levels of security have been implemented within and around the data center?
  6. What is your future data center capacity, and where would you place additional IT equipment and building systems if needed?
  7. What are your circuit temperatures?
  8. How much air pressure do you have at perforated tiles?
  9. What are the temperatures around the room at the intake of IT equipment?
  10. Growth often erodes cooling capacity until the load exceeds what the recommended N+1 redundancy can cover. What is the ratio of your redundant capacity to your total building systems capacity? If one system failed, would another take up a full 100% of the load? Where are these systems positioned? Can they truly provide backup cooling?
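The redundancy question in item 10 can be sketched as simple arithmetic: remove the largest unit and see whether the rest still carry the load. The function name and the unit sizes and loads below are hypothetical, and the check looks only at total capacity. It says nothing about where units are positioned or whether a given unit can actually reach the equipment it is meant to back up.

```python
def survives_single_failure(unit_capacities_kw, load_kw):
    """Return True if the load is still covered after the largest unit fails.

    unit_capacities_kw: rated capacity of each cooling (or power) unit.
    load_kw: the current load those units must carry.
    This is a capacity-only check; placement and airflow are not modeled.
    """
    if len(unit_capacities_kw) < 2:
        return False  # a single unit has no backup at all
    remaining_kw = sum(unit_capacities_kw) - max(unit_capacities_kw)
    return remaining_kw >= load_kw


# Hypothetical room with four 30 kW cooling units.
units = [30.0, 30.0, 30.0, 30.0]
print(survives_single_failure(units, load_kw=85.0))  # True: 90 kW remains
print(survives_single_failure(units, load_kw=95.0))  # False: growth used up the spare unit
```

The second case shows how growth can quietly consume the spare unit: all four units together still cover 95 kW, so nothing appears wrong until one of them fails.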

One company with a sizable UPS capacity recently decided to enhance its power distribution. The company was about to cut a purchase order for $1.2 million in new IT hardware when its data center manager wisely asked whether the facility could handle such an increase. The answer was a definitive "no." The power distribution within the data center was almost at capacity, cooling was already at capacity, and additional space was needed. New circuit breakers, at the least, were required to handle the new load. Anticipating such needs before buying expensive hardware was a step that could easily have been overlooked. A dose of preventive medicine made all the difference.

A growing industry concern to take into consideration when assessing the health of a data center is temperature. Fevers are a concern as heat loads rise and equipment becomes more concentrated. Point loads are increasing from 30 to 80, 100, or 125 or more watts per square foot, often without N+1 redundancy or the necessary cooling reliability. This is just one scenario in which having your vital signs checked regularly will alert you when you are nearing capacity and, therefore, may help prevent a coronary outage.
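To make the watts-per-square-foot figures concrete, the sketch below computes the heat density of each area of a room and flags those above a design limit. The function name, the zone names and measurements, and the limit of 80 watts per square foot are hypothetical values chosen for illustration, not recommendations.

```python
def zones_over_limit(zones, design_limit_w_per_sqft):
    """Return the zones whose heat density exceeds the design limit.

    zones: mapping of zone name to (load in watts, floor area in square feet).
    design_limit_w_per_sqft: the density the cooling design is assumed to support.
    The result maps each offending zone to its density in watts per square foot.
    """
    over = {}
    for name, (watts, sqft) in zones.items():
        density = watts / sqft
        if density > design_limit_w_per_sqft:
            over[name] = round(density, 1)
    return over


# Hypothetical measurements taken during a checkup.
zones = {
    "row A": (12_000, 200),  # 60 W per square foot
    "row B": (25_000, 200),  # 125 W per square foot
    "row C": (9_000, 300),   # 30 W per square foot
}
hot = zones_over_limit(zones, design_limit_w_per_sqft=80)
print(hot)  # {'row B': 125.0}
```

Averaged over the whole room, these three rows come to about 66 watts per square foot, which would look acceptable against the assumed limit. Checking zone by zone is what exposes the point load in row B.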

Physical security goes hand in hand with physical health. A data center checkup needs to include an examination of the security of your site, building, data center, and support spaces. When building issues arise, data center managers are responsible for identifying the root cause of the problem and documenting the progress being made to resolve it.

Operational procedures are another key factor. Do all the people going in and out of your data center know what to do if an alarm sounds? Reliability documentation and clear-cut, step-by-step instructions for employees must be in place before a crisis occurs. After a fire alarm went off recently at a regional hospital, a data center employee pushed what he thought was a reset button. Instead, he hit the emergency power-off switch, which resulted in a data center failure. It took six hours to diagnose the problem and another eight hours to recover. Proper training would have prevented this costly accident.

Data center reliability depends on the health of not only its facilities and equipment but also its people and processes. All components work together to meet efficiency requirements and business demands. Acquiring and maintaining a vibrant state of data center health is definitely easier with an ounce or two of preventive medicine.

Keith Meierhofer is a founding partner of N'compass, a Minneapolis-based regional consulting firm specializing in telecommunications and technology systems design, maintenance solution services, and optimization. Meierhofer can be reached at kmeierhofer@ncompass-inc.com.