Are You Ready to Face an Information Systems Disaster?
Predictive Systems’ Roundtable on Disaster Recovery

By Thomas Boudrot
Predictive Systems

----------
Every year, companies lose billions of dollars when disasters shut down the core activities supported by their information systems. Some companies never survive the shock. In fact, of the 350 businesses in the World Trade Center before the bombing in 1993, 150 were out of business a year later. One can only guess at the statistics on the first anniversary of the September 11th disaster.

Predictive Systems recently conducted a disaster recovery roundtable to address this critical business strategy. We asked experts in distinct areas – network design and engineering, security, and storage – their thoughts and recommendations about business continuity and disaster recovery planning. In this article, our experts tackle some tough questions facing American businesses.

----------
Roundtable Experts
Peter Browne – Vice President, Predictive Systems' Global Integrity business unit. Peter has over 30 years of experience as an information security practitioner, consultant, speaker, and author. His extensive experience includes disaster recovery responsibility for major firms such as First Union, GE, and Motorola.

Ernie Hassell – Managing Consultant, Information Security. Ernie has over 15 years of IT service experience, most recently leading the disaster recovery practice at Lucent Technology Worldwide Services.

Tim Rooker – Managing Consultant, Network Design and Engineering. Tim has over 14 years of experience in network communications with such companies as BellSouth, MCI, and ING North America. Tim has previously performed disaster recovery planning and design for BellSouth.

Scott Wilson – Vice President, Network and Systems Management Strategy. Scott has been in the network management field for six years, delivering large-scale network management solutions to a variety of customers including large Internet service providers. Scott has extensive experience in storage solutions.

----------
Moderator: With regard to business continuity and disaster recovery, what have been the biggest lessons learned by American businesses because of the September 11th disaster?

Hassell: One thing is that there was a real gap with respect to knowledge transfer at different levels of many organizations impacted by the 9/11 disaster. For example, within the executive management level, let’s say CFO types, it was critically important to have arrangements for funds disbursement and other processes that keep the wheels of a business turning. In many instances, things broke down because there wasn’t a plan in place for when the CFO wasn’t around. There was no transfer of critical knowledge.

Another thing realized was that the end points of many companies’ infrastructures weren’t really known. In other words, disaster recovery planning wasn’t fully done. So, companies didn’t know how many nodes or desktops they had, or what applications they were running, which resulted in quite a bit of chaos.

Browne: Generally, people will be able to plan for things they know. Nobody had any experience with the likes of the situation on September 11th. The 30-year rule comes into play. What do I mean by that? If you look at most people in their careers, they don’t last more than 30 years in business. They didn’t have any knowledge base for this type of disaster. If you look at what Gartner says, of things that go wrong, some 40% of them are operator error, 12% hardware, and 40% application failure. Only 5% of things that go wrong are true disasters. Things like major snowstorms, hurricanes, tornadoes, and tsunamis are considered disasters, but they’re experienced by quite a few people. This disaster wasn’t in anybody’s frame of reference.

Wilson: Storage is a hot topic in New York City right now. I’ve heard so many times of companies with disaster recovery processes documented, only to find that nobody knew where the document was. Another big problem, with regard to data, was that many key employees had data stored on their laptops or desktops, but that data was never replicated anywhere. A significant amount of data was lost. Many disaster recovery plans focused on power outages in a central data center, but never considered the loss of an entire facility or dispersed data that was never collected into a central location in the first place.

Rooker: As far as the lessons learned, I believe that many people now consider disaster recovery as an insurance policy. That’s significant. Before 9/11, a disaster was considered to be a hardware outage or something similar. Nobody could ever predict a scenario of this magnitude. Everything is literally being re-written as we sit here.

Moderator: It appears that business continuity/disaster recovery is a “business” state of mind. I’ve uncovered three prevalent scenarios in my research. The first one is, “Don’t bother me with this since it’s so remote it will never happen.” Another is, “Sure, we’ll take care of that,” though nobody is actually paying any attention to the details. The third scenario is, “We’ve got the latest and greatest systems in place and we’ve spent a fortune on them.” What can you share about these scenarios from your experiences?

Browne: We worked with a large financial services company on a business continuity plan. The first thing we did was to look at their products and services and we uncovered some interesting attributes. First, we found certain services didn’t make any money. When we identified critical services that had financial meaning, we went off-site and did planning for three days. This doesn’t fit any of your scenarios since this seems to be one of the correct ways to tackle the problem.

One interesting point here. On the evening of the first day, we were talking about how to deal with emergencies and how to move employees to alternate sites if a given physical site was out of commission. At 6:00 PM that night, a call came in saying that the headquarters was burning. Actually, there was a fire two floors above their office and the place was drenched and filled with smoke. The client actually had to exercise their disaster recovery plan as it was being created.

Moderator: How did they do?

Browne: They actually did very well, because it was fresh in their minds. We got people to move to alternate locations. Things went very smoothly. This made believers of the executive team in the need and the power behind the process. The company is very proactive to this day in the whole world of business continuity and disaster recovery.

I know of hundreds of cases where too many more important things were on the front burner and they looked at business continuity planning as a discretionary audit. Basically, they’re paying lip service to it, and this seems to be the state of mind for many businesses today. So, your second scenario is more prevalent than we’d want to admit.

Moderator: Is this because you can’t immediately justify the whole disaster recovery process to the bottom line?

Browne: That’s right.

Rooker: It’s almost “out of sight, out of mind.” A specific painful experience, as Peter described, has a significant impact on a business culture. But, for many companies, it’s an abstract exercise to go through. If they don’t have any direct experience, it’s just an exercise. Unfortunately, that’s a dangerous mindset.

Hassell: A positive point coming out of this horrible incident is the fact that the insurance firms are clamping down on companies, making sure that they have disaster recovery plans in place to take care of business interruption. They simply won’t write policies without one anymore, since the losses were so tremendous. Insurance companies want to mitigate their risks, and one way to do that is through a required disaster recovery plan.

Moderator: Insurance companies may be the police that demand that a disaster recovery plan is in place. But, who blows the whistle if the plan is inadequate?

Hassell: I think insurance companies are viewing themselves as stakeholders that should be involved in the process through onsite involvement. If done correctly, the process will actually blow the whistle.

Moderator: Does the level of planning differ depending upon the type of company? For example, would a plan for a financial services company be significantly different from a mail order company?

Rooker: Plans do differ based on the type of business. In the financial services industry, for example, what is the level of outage that the business can endure? For financial services, the answer is, “none at all.” The disaster recovery plan has to be rock solid. This requires a lot of effort, a lot of money, and a lot of due diligence.

Moderator: So it’s really about the pain and suffering of financial loss?

Rooker: That’s the beginning. But, with a financial services company, you have to also look at perception. It may be a little less tangible, but no less catastrophic to the company if people perceive the company as unstable. Perception plays a major role in companies such as financial services, high-tech security, etc. A complete outage will do irreparable damage. Degraded service, on the other hand, where the company limps along until things are fully restored, might do in the short term. If the public and regulatory agencies will accept that, then the pain and suffering threshold is better known.

Browne: In any type of business, there are pain points based on the kind of business, the transaction velocity, and things like that. If a company is doing real-time or near real-time transactions, the pain threshold is lower. It doesn’t make any difference what kind of business you’re in. It depends on whether you are relegated to choke points or funnel points in your business.

Moderator: This may be a naïve question, but can disaster recovery planning and implementation be measured in terms of return on investment (ROI)?

Browne: Absolutely. That’s what sells it in most cases. Doing the risk analysis that says, “If this happens, then you lose that,” is critical. There is direct loss, and then there is indirect loss -- the cost of recovering, the cost of rebuilding, etc. And, there’s consequential loss, such as lost revenue, or a loss of reputation, or a loss of business. Forty percent of companies go out of business in less than five years after a disaster. Those statistics are immutable. Our core competence is the ability to measure and quantify risk.

Moderator: I’ve always thought ROI was a cause/effect situation. You can say we spend X dollars on getting this achieved. But, you can’t say we saved X dollars until the disaster occurs. If it’s never called into play, isn’t it considered a loss?

Hassell: That shortsightedness is exactly the sort of situation companies catch themselves in. However, if executed properly, a disaster recovery plan should identify things on the back end that were not visible on the front end. Just the details, such as knowing where the equipment is, having an up-to-date asset list, or as Peter pointed out, what pieces of the business aren’t making money. These are things many companies haven’t accurately captured. A good plan will catch that, and save countless dollars in loss when a disaster occurs. It’s on the back end – a streamlined process and a better handle on what it is you have to do. That’s the name of the game.

Browne: This is a form of business process engineering. Businesses that go through the process find themselves more efficient. By the way, what is disaster recovery planning? Asset management and communication. Guess what? Those are sweet spots. You’ve got to get better in both cases to be really good at what you do.

Moderator: When a disaster recovery planning team comes forward and says we need X amount of dollars for certain redundancies, remote locations, etc., to safeguard the business, that becomes very expensive. I guess what I’m saying is, businesses don’t mind doing the planning, but they faint over the cost of actually implementing it.

Wilson: Exactly. You wind up looking at a variety of scenarios that a company just can’t afford to enact. That brings us back to the notion of how much pain and suffering the company can endure and still remain in business. That’s where you draw the line.

Rooker: There are solutions and then there are less expensive solutions. If you know what the potential loss is, then you know how much money you can afford to spend in preventing it. That’s called making an informed business decision. So, in this instance, you can determine what form of redundancy is important to keep the business viable.
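
The reasoning above can be sketched as simple arithmetic. The following is only an illustration: the function names, the outage scenario, and every dollar figure are hypothetical assumptions, not numbers from the roundtable. It compares the expected yearly loss with and without each safeguard against what the safeguard costs.

```python
# Illustrative sketch only: all names and figures are made-up assumptions.

def annual_loss_expectancy(loss_per_event, years_between_events):
    """Expected yearly loss: the loss from one event spread over how often it is expected."""
    return loss_per_event / years_between_events

def net_benefit(ale_before, ale_after, annual_cost):
    """Yearly loss avoided by a safeguard, minus the yearly cost of the safeguard."""
    return (ale_before - ale_after) - annual_cost

YEARS_BETWEEN_EVENTS = 10  # hypothetical: one site outage expected every 10 years

# Hypothetical unprotected loss of $2,000,000 per outage.
baseline = annual_loss_expectancy(2_000_000, YEARS_BETWEEN_EVENTS)

# Hypothetical options: (name, loss per outage with the option in place, yearly cost)
options = [
    ("hot site", 100_000, 150_000),
    ("warm site", 500_000, 60_000),
    ("tape backup only", 1_200_000, 10_000),
]

results = {}
for name, residual_loss, cost in options:
    after = annual_loss_expectancy(residual_loss, YEARS_BETWEEN_EVENTS)
    results[name] = net_benefit(baseline, after, cost)
    print(f"{name}: net benefit {results[name]:,.0f} per year")

best = max(results, key=results.get)
print("Best value under these assumptions:", best)
```

Under these invented figures the most expensive option is not the best value, which is the point of knowing the potential loss before choosing a form of redundancy. A real analysis would also have to weigh the indirect and consequential losses discussed earlier, which are harder to price.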

Moderator: Consider this: A small company that has the need for a disaster recovery plan. The person responsible for that is just that: one person. Is disaster recovery planning too big for one person to handle?

Browne: I consulted for a small company with exactly that dilemma. And that probably answers the question. They saw the need for an outside eye to catch the flaws in the logic to ensure that every aspect had been considered. I guess putting all your eggs in one basket is something a company, even a small one, should not do.

Wilson: Sometimes it takes a third party to catch the obvious. Yes, a consultant will see it as his or her obligation to alert a company to various problems because it’s the consultant’s reputation on the line. That’s a healthy thing. However, even large companies can benefit from outside assistance.

Moderator: The business continuity process appears to follow four major steps: first, a company has to outline the scenarios that would interrupt business; second, plan reasonable continuity to withstand the scenarios; third, test the plans; and last, put the plans into place once they’re found to be adequate. Do I have it right?

Hassell: The DRI (Disaster Recovery Institute) is the grandparent of disaster recovery planning. It’s the official body that oversees the certification of this kind of planning. They’ve laid out a solution portfolio and it speaks to this assessment process. First, they look at risk assessment and business impact analysis. Strategic planning is then the result of the analysis. Third, they must get buy-in for plan development and implementation. The last step is plan maintenance.

Moderator: Is there any particular part of the process you’d like to comment on?

Hassell: As a precursor to entering this process, it is critically important, and of great benefit to a consultant, to have a policy in place with respect to disaster recovery. Things get far more difficult if that policy doesn’t exist. But, if it doesn’t, a consultant can go in to help ratchet up the need. Then, he or she can strive to be the champion. Even a simple three-line policy gets the ball rolling.

Controlled risk needs to be separated from uncontrolled risk. Controls need to be in place in the physical sense. For example, you can look at a guard outside the door and security cameras as physical representations that mitigate risk. Now, when we look at the anthrax threat, we’re looking at an uncontrolled risk. Security and risk analysis go hand in hand.

Rooker: He’s absolutely correct. With regard to the network, you apply analysis starting with components internal to the network. Looking at the core and its physical nature, you may have hardware failures; you may have application failures, etc. Then, you work your way up to the next level, which may be the infrastructure, building security, things like that. Peter may have a statistic on this, but a very high portion of outage involves “the last mile” of a network. One example might be a local exchange carrier carrying the signal to the building. You could have an internal network that’s rock solid but the local exchange carrier facilities, servicing your location, may only be available as a single feed to the building. Another area of concern may be the redundancy/diversity for the central office serving your location. If you’ve got construction up the street and a construction worker is using a jackhammer, guess what? You’re history. You have to determine how far out you need to take the analysis.

Browne: My experience has been that a majority of outages have been last mile outages.

Rooker: One extreme example of such an outage is the story of a farmer in Alabama burying a cow with a backhoe that hit a fiber link and took out a majority of the East Coast. We have seen the enemy and it is a backhoe!

Moderator: In terms of storage, companies can opt to outsource it to a storage service provider to ensure that data are in a different physical location, or use them as a redundant facility. But there are caveats to that. Many of these providers are going belly up and giving three days’ notice that you have to get your data out or it’s toast.

Wilson: A leading hosting provider is in exactly that situation since it filed Chapter 11 last month. You have to ask the fundamental question, “How critical is that data to my business and am I going to trust it to somebody else?” Most companies aren’t that comfortable with this particular market. If it’s less mission critical, or you can’t afford to build one of these solutions yourself, you still have to do due diligence to ensure that the storage service provider has technical and financial viability. Then, you have to ask yourself what your contingency plan is if they do go out of business. These days, companies are requiring legal clauses that ensure conversion facilities to another provider in the event that the provider goes out of business. It’s sad, but we have to anticipate that these companies might go belly up. This requires cash reserves on the provider’s part to ensure that the work gets done. We’re back to the concept of an insurance policy.

Moderator: Is there anything else we can highlight here with regard to storage or other services?

Hassell: Internet hosting centers are a similar example. An e-business that uses an outsourced hosting facility would be crazy not to have a contingency clause built into its contract with the hosting center.

Browne: This applies any time you outsource a key component of your business. There needs to be an audit mentality applied to these situations. Contracts need to acknowledge back-up plans and the like.

Moderator: Are there any final comments any of you would like to make?

Browne: Recovery planning is increasingly part of the business life cycle. When you think about recoverability at the very beginning of a life cycle -- the requirements, architecture, design, construction, testing, implementation, and the post-mortem -- then a company is doing the right level of risk management. This is what we help companies do. Recoverability is a piece of the overall business concept.

One key point: as companies move into the so-called e-commerce space, speed is a critical variable. It used to be that you could do disaster recovery in 72 hours and satisfy the auditors. In recent years, it has shifted to 24 hours, then down to four hours. Today it may be that you have to recover in minutes. If companies understand this point and apply the right risk management to it, then they can identify the right technology to recover successfully.
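
To see why the shrinking recovery window matters, the recovery targets mentioned above can be turned into dollars. This is a rough sketch under stated assumptions: the hourly loss figure is invented, "minutes" is taken as 15 minutes, and loss is assumed to grow linearly with downtime, which ignores reputation damage and other consequential losses.

```python
# Illustrative sketch only: the hourly loss figure is an assumption.

def outage_exposure(hourly_loss, recovery_hours):
    """Revenue at risk if the business stays down for the whole recovery window."""
    return hourly_loss * recovery_hours

HOURLY_LOSS = 50_000  # hypothetical revenue lost per hour of downtime

# Recovery targets from the discussion; "minutes" is taken here as 15 minutes.
targets_hours = {
    "72 hours": 72,
    "24 hours": 24,
    "4 hours": 4,
    "15 minutes": 0.25,
}

exposure = {
    label: outage_exposure(HOURLY_LOSS, hours)
    for label, hours in targets_hours.items()
}

for label, dollars in exposure.items():
    print(f"{label:>10}: ${dollars:,.0f} at risk")
```

For a business with a high transaction velocity, the gap between the 72-hour and the four-hour figure is the budget argument for faster recovery technology.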

Rooker: Buy-in from a company’s management team is extremely important. It shouldn’t be underestimated in the process of disaster recovery planning and implementation.

Wilson: Companies forget to look at the entire business process in their disaster recovery planning. You have to look at all your partners, suppliers, distribution points or channels, whatever. You can have your own systems covered, but if a critical piece of your business lies in someone else’s shop, you’ve got to ensure they’re covered as well. The plan needs to be very comprehensive even outside of your space. A case in point: I don’t think anybody in upper Manhattan expected to lose phone service from an event downtown. But they did. This will probably spawn other technologies, including voice over IP as redundant communications lines. Phone service was one service that was taken for granted.

Hassell: Recovery time, which relates to the pain threshold concept discussed earlier, brings financial impact into all of this. Each company is going to have to look at resumption, recovery and restoration and place them on a scale with dollars and time.


© Copyright 2002 Predictive Systems, Inc. All Rights Reserved.
