Data center resilience is about evaluating risks and adapting practices
A data center’s job is to store, manage and provide access to the mission-critical data that empowers decision-making, drives operations and supports strategic initiatives.
This data must be protected, whether that protection is mandated or not. Otherwise, consequences like operational failures, significant financial loss or worse could result.
When it comes to protecting data, information security often comes to mind: encryption, firewalls, access controls. But data protection also relies on uptime. Downtime can lead to data loss or corruption, which disrupts operations and decision-making.
To address the vulnerabilities that cause downtime, you must first understand the risks inherent to your data center—and then take steps to mitigate them.
Determining your data center risk profile
Every data center’s risk profile is different, made up of a complex web of internal and external factors like:
- Likelihood of natural disasters (floods, hurricanes, wildfires, earthquakes, tornadoes, ice storms, etc.)
- Possibility of equipment failure (cooling equipment breakdowns, server outages, accidental disconnections, etc.)
- Possibility of power interruptions (grid failure, voltage fluctuations, etc.)
- Location and access (likelihood of unauthorized entry, placement of infrastructure within a facility, etc.)
Once you have identified the risks, you can assess how likely each one is to occur and evaluate its potential impact on operations. For a hospital, how would downtime affect patient care and safety in an emergency? For a manufacturer, what is the cost of delaying order fulfillment or customer transactions?
Here are a few examples of what you should consider when assessing your risk profile to improve data center resilience.
To determine the likelihood of equipment failure, you could consider factors like:
- Staff training: Are workers trained to minimize human error during routine tasks?
- System configurations: Does the space support proper airflow to avoid overheating?
- Equipment age: Is equipment reaching end-of-life?
- Maintenance: Is regular maintenance being performed to reduce vulnerabilities?
To determine the likelihood of unauthorized entry, you could consider factors like:
- Third-party providers: How many vendors and service providers have access to the site? Are they trustworthy?
- Physical location: Is the data center in a shared space? Is it easy for people to find? (In a hospital, could wandering patients find it? In a school, could a student discover it?)
- Access control: What types of safeguards are in place to control access? Are visitor management systems in use?
- Staff training: Are workers watching for and able to recognize suspicious behavior? Do they follow access protocols?
Based on the answers you uncover, you can prioritize improvements by urgency and potential impact to strengthen data center resilience.
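One way to turn those answers into a ranked list is a simple risk register that scores each risk by likelihood and impact. The sketch below is illustrative only: the 1-to-5 scales, the likelihood-times-impact formula, the `Risk` class, the `prioritize` helper and every rating shown are assumptions, not values drawn from any standard or from a real facility.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Likelihood x impact is one common convention, not the only one
        return self.likelihood * self.impact


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Return risks sorted from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical ratings, for illustration only
risks = [
    Risk("Cooling equipment at end-of-life", likelihood=4, impact=4),
    Risk("Unescorted vendor access", likelihood=3, impact=3),
    Risk("Grid power interruption", likelihood=2, impact=5),
    Risk("Regional flooding", likelihood=1, impact=5),
]

for risk in prioritize(risks):
    print(f"{risk.score:>2}  {risk.name}")
```

A plain product treats likelihood and impact as equally important. A facility where any outage is unacceptable, such as a hospital, might instead weight impact more heavily or review every high-impact risk regardless of its score.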
Understanding requirements for operation during disasters and other events
Because certain types of facilities are essential to public safety, the International Building Code (IBC) requires them, including many data centers, to continue operating during natural disasters and other events. This is something to consider as you evaluate data center resilience.
To recognize levels of risk, the IBC assigns facilities to one of four categories:
- Category I includes buildings where failure poses minimal risk to safety: agricultural facilities and storage buildings
- Category II includes buildings not covered in other categories: most commercial and residential structures
- Category III includes buildings where failure poses substantial risk to safety: lecture halls, theaters, power-generation stations, prisons, water treatment plants, etc.
- Category IV includes essential facilities that must remain operational in a disaster: aviation control towers, chemical plants, data centers, fire/police stations, hospitals, etc.
To comply, Category IV facilities must meet strict design and construction standards that ensure resilience. This extends to their data center infrastructure, such as racks and cabinets.
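As a rough illustration, the four categories can be expressed as a lookup table. The sketch below uses only the example facility types listed in this article. The names `RISK_CATEGORIES`, `risk_category` and `must_remain_operational` are hypothetical, and actual classification depends on the code edition adopted locally and on the authority having jurisdiction.

```python
# Example facility types exactly as listed in this article; illustrative only.
RISK_CATEGORIES = {
    "I": ["agricultural facility", "storage building"],
    "III": [
        "lecture hall", "theater", "power-generation station",
        "prison", "water treatment plant",
    ],
    "IV": [
        "aviation control tower", "chemical plant", "data center",
        "fire station", "police station", "hospital",
    ],
}


def risk_category(facility_type: str) -> str:
    """Return the category for a facility type, defaulting to II."""
    wanted = facility_type.strip().lower()
    for category, examples in RISK_CATEGORIES.items():
        if wanted in examples:
            return category
    # Category II covers buildings not listed in the other categories
    return "II"


def must_remain_operational(facility_type: str) -> bool:
    """True when the facility falls in Category IV (essential facilities)."""
    return risk_category(facility_type) == "IV"


print(risk_category("Hospital"))
print(must_remain_operational("theater"))
```

The default to Category II mirrors the way the list defines that category as the catch-all for buildings not covered elsewhere.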
A cabinet or rack’s seismic rating indicates how well it can protect active equipment and reduce damage (thus reducing data loss) during heavy vibration. The higher the seismic rating, the better the rack or cabinet’s ability to tolerate an event. For example, cabinets with a Zone 4 seismic rating can protect active equipment from damage during major earthquakes or other seismic activity.
Being aware of your environment—and what it takes to protect critical infrastructure—is crucial to make sure operations aren’t disrupted. In some jurisdictions, this also means complying with local codes and requirements.
For example, because hospitals are considered Category IV facilities under the IBC, even those outside seismic zones may require seismic cabinets. Some states may also have additional mandates for healthcare facilities, which could include requirements for seismic-rated cabinets regardless of where the facility is located.
It's also important to note that requirements change over time. Consider the shift in best practices for ladder tray installations as an example.
In the past, ladder trays in hospitals were installed against the wall with L-brackets. During seismic activity, however, ladder trays sometimes placed extreme force on the walls, causing damage. Once this was discovered, practice shifted to installing ladder trays a few inches from the walls to allow flexibility.
Today, however, installation standards once again call for ladder trays to be installed directly against the wall but anchored to structural supports like studs instead of drywall. This change ensures a safer, more stable installation while also preventing wall and cable damage.
Staying up to date on these best practices is important not only because it’s a requirement but also because it’s essential to protect your data.
What works well for one data center doesn’t work for all
From physical layout to climate, every data center operates within its own framework of variables. What works well in one facility might not be effective or recommended in another due to these differences.
For example, Los Angeles and parts of San Francisco have distinctive requirements for cable pathway installation in hospitals, intended to prevent the swaying and damage that cause downtime. There, ceiling hangers must use threaded rods instead of traditional wire for greater stability. L-brackets must be securely anchored, with at least one inch of anchoring depth.
But this practice may not suit a state where tornadoes are likely. Flying debris is a critical danger: if threaded rods became dislodged, they could become projectiles in high winds.
Making decisions based on your environment, possible risks and the likelihood of those risks occurring is a good way to ensure data center resilience and protect your data.