March/April 2002 - Volume 81, Number 2
by Gary Morris

In Houston, cleanup took months after Tropical Storm Allison, which hit the area last June and caused severe flood damage to many homes and businesses. The extensive damage put some businesses on hold for two months or more while rebuilding and repairs took place. Some businesses are still trying to recover.
A piece of equipment used in the office to power computer systems may not seem to have much in common with a natural disaster like Allison. But the two have more in common than one might think.
Both can have disastrous effects on businesses. When companies design business resumption plans, they typically create plans for floods, fires, earthquakes and bombs. They often neglect to consider equipment failure as a potential disaster, even though businesses have a much greater chance of losing a server or experiencing back-up failure, which can result in significant downtime.
Downtime can result in both lost revenue and lost opportunities, which can be irreplaceable and damaging to a company’s bottom line. Even a hardware failure or widespread virus could be classified as a disaster. If one of these disasters hit your business, would you be able to maintain operations and continue to serve your customers?
Companies need to ask if it is worth risking potential revenue loss in exchange for the few hundred dollars it would cost to put in a fan, power supply, uninterruptible power supply, or other piece of inexpensive equipment that could provide the redundancy required to keep things up and running. You can add a tremendous amount of redundancy to your system for very little money.
When it comes to planning for disasters, there are two main scenarios: (1) natural disasters, such as tornados, floods and earthquakes; and (2) smaller disasters localized to one office, such as virus influx or server damage. Whether planning for large or small disasters, offices should consider some key best practices.
When planning, pose some basic questions to determine the equipment and staff an office needs to function.
| • | What equipment and/or supplies does each department need to keep running? |
| • | What is the minimum number of people each department needs to function? |
| • | How long can each department function with minimal staff? |
In addition, all businesses have critical infrastructure that is vital to operations, whether a full-blown data center or just one critical server. Properly built and maintained systems have a much better chance of staying up and running during times of crisis.
Keeping systems running is two-thirds of the battle. It all starts with core infrastructure.
When building the infrastructure, there are some vital items that can be included to increase the chances of uninterrupted service.
| • | Uninterruptible power supplies (UPSs) are back-up power supplies that go into action if the building loses power. If a feed fails, a UPS keeps systems up for a certain period of time to allow for a transfer of power or a graceful shutdown of systems. They also compensate for spikes and dips in service. |
At Landata Systems, if a power feed fails in our data center, our UPSs ensure that systems stay up while an Automatic Transfer Override switches us to a new feed. In the unlikely instance of complete power failure, the UPSs ensure that the center maintains power for a limited period of time, which gives the support team time to perform an orderly shutdown of the servers.
| • | Back-up systems must be current and verifiable. Tapes should be stored in a safe, off-site location and the system should be tested regularly to be sure it is backing up properly. Recovery plans should also be tested regularly. We verify the success of back-ups on a daily basis and run back-up recovery tests on critical systems at least quarterly to be sure our servers and back-ups are working properly. We also want to be sure our people know exactly what to do so they’re prepared if a disaster does strike and we have to recover an entire server from tape. |
| • | Store original media for critical applications in a safe, off-site location so servers can be rebuilt, if necessary. |
| • | Virus protection plans must be updated regularly and strictly enforced to ensure one severe virus doesn’t wipe out an entire system. |
| • | Keep a complete and reliable inventory of servers and equipment that can be referenced if a system must be rebuilt. Be sure to document the configuration of machines, include a list of vendor contact information and record whether service agreements exist with each company. |
| • | Create detailed disaster recovery plans specific to certain pieces of critical equipment. |
| • | Purchase and implement quality equipment and software for your infrastructure. |
| • | A large part of business resumption planning is keeping everything up in the first place, which means building it right. Be sure all equipment is supportable before you implement it. The same goes for software: don’t buy software if there is a chance you won’t be able to get support for it. |
| • | Build as much redundancy in servers as possible. For example, have redundant hard drives, power supplies and other critical equipment available for immediate back-up. |
| • | In our data center, we have implemented N+1 redundancy for critical equipment. For every piece of critical equipment, we have at least one extra. For example, if we need six air conditioners to maintain the center, we have seven, in case something goes wrong with one. This redundancy means that if something fails, we always have a back-up to keep the system running while we fix the problem. |
| • | Be sure your facility is appropriate for your systems. For example, you must have clean, conditioned power and cool air to ensure equipment longevity. Maintaining a clean operating environment is equally important. Dust and other contaminants can greatly shorten the life expectancy of sensitive computer equipment. |
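The N+1 rule described above comes down to simple arithmetic, and it can be checked against an equipment inventory. The following is a minimal sketch, not a description of any actual Landata tooling: the `EquipmentPool` class, its field names and the `audit` helper are hypothetical, and the only figures taken from the article are the six required and seven installed air conditioners.

```python
from dataclasses import dataclass


@dataclass
class EquipmentPool:
    """A group of interchangeable units, such as air conditioners (hypothetical model)."""
    name: str
    required: int   # units needed to carry the full load (the "N")
    installed: int  # units actually in place

    def spares(self) -> int:
        """Number of units beyond what the load requires."""
        return self.installed - self.required

    def survives_failures(self, failed: int) -> bool:
        """True if the load is still covered after `failed` units go down."""
        return self.installed - failed >= self.required

    def meets_n_plus_1(self) -> bool:
        """N+1 means the pool can lose any one unit and keep running."""
        return self.survives_failures(1)


def audit(pools):
    """Return the names of pools that would not survive a single failure."""
    return [pool.name for pool in pools if not pool.meets_n_plus_1()]


if __name__ == "__main__":
    pools = [
        # Figures from the article: six needed, seven installed.
        EquipmentPool("air conditioners", required=6, installed=7),
        # Illustrative entry only, not from the article.
        EquipmentPool("tape drives", required=1, installed=1),
    ]
    print("Pools lacking N+1 redundancy:", audit(pools))
```

Run against a complete equipment inventory, a check like this flags any critical equipment that has no spare before a failure exposes the gap.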
Whether you are a large organization or a small one, business resumption plans are vital to your business. And in light of recent events, business resumption plans are no longer a luxury; they are a necessity.
Gary Morris is vice president of the technical services division for Landata Systems, Inc., a subsidiary of Stewart Information Services Corp. He can be reached at gmorris@landata.com or 713-871-9222.