BUSINESS RESUMPTION PLANNING:
[Cost table: values not recovered. Line items: Annual Vendor Fees; Site Preparation (1); Initial Installation (2); Annual Cost of Lines; Duplicate Operations Staff; Testing at Other Site. Notes: (1) 7-year amortization; (2) 5-year amortization.]
Step 1: Forecasted Annual Losses
The following analysis presents the losses the organization incurs while normal information technology processing is being recovered, as well as the losses incurred in winning back lost customers and reestablishing its service-level reputation.
Step 2: Estimated Order Retention Rates
Figure 1, Projected Order Retention Rates, was derived from interviews with key marketing and management personnel. As a wholesaler, the organization believes its customers would switch to alternative suppliers for new orders within four days. This would be caused by the lack of inventory information and by the delays in picking and shipping caused by reversion to a slow manual operation and a shortage of trained personnel. They also estimate that reorders of proprietary items, representing 65% of reorders, would continue, but that reorders of generally available items would stop over a three-week period. Figure 1 is derived in the spreadsheet included in Appendix A of Waldman [10].
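The reorder estimates above can be turned into a rough retention curve. The sketch below is illustrative only: it assumes that reorders of generally available items decline linearly to zero over the three weeks, which the interviews do not state, and the function and constant names are hypothetical. It is not the spreadsheet in Appendix A of Waldman, and it does not model new orders.

```python
PROPRIETARY_SHARE = 0.65  # share of reorders that are proprietary items (from the text)
DECLINE_WEEKS = 3         # generally available reorders stop over three weeks (from the text)


def reorder_retention(weeks_down: float) -> float:
    """Fraction of normal reorder volume still retained after `weeks_down` weeks.

    Assumption (not from the source): reorders of generally available
    items fall off linearly over the three-week period.
    """
    if weeks_down < 0:
        raise ValueError("weeks_down must be non-negative")
    # Fraction of generally available reorders that has not yet been lost.
    remaining_general = max(0.0, 1.0 - weeks_down / DECLINE_WEEKS)
    return PROPRIETARY_SHARE + (1.0 - PROPRIETARY_SHARE) * remaining_general


if __name__ == "__main__":
    for week in range(5):
        print(f"week {week}: {reorder_retention(week):.0%} of reorders retained")
```

Under these assumptions the curve starts at full retention and levels off at the 65% proprietary share once the three weeks have passed.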
Step 3: Estimated Order Rates
Figure 2, Projected Order Recovery Rate After a Disaster, presents the estimated rates of recovery of orders following an interruption due to lack of IT capability after a disaster. The key marketing and management personnel believe the firm could recover approximately 5% of its former order rate per month after the return of full IT capability. However, recovery from the vendor hot/cold scenario is slower, since all lost orders were new orders. Figure 2 is also derived in the spreadsheet included in Appendix A of Waldman [10].
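The 5%-per-month recovery estimate implies a simple bound on how long it takes to win back lost orders. This is a minimal sketch, assuming recovery is linear at that rate and begins once full IT capability returns. The function name and the example loss figures are hypothetical, and the slower recovery noted for the vendor hot/cold scenario is not modeled.

```python
def months_to_recover(lost_pct: int, recovery_pct_per_month: int = 5) -> int:
    """Whole months needed to regain `lost_pct` percentage points of the former order rate.

    Assumption (not from the source): recovery proceeds linearly at the
    stated monthly rate until the former order rate is reached.
    """
    if not 0 <= lost_pct <= 100:
        raise ValueError("lost_pct must be between 0 and 100")
    if recovery_pct_per_month <= 0:
        raise ValueError("recovery_pct_per_month must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-lost_pct // recovery_pct_per_month)


if __name__ == "__main__":
    for lost in (10, 35, 100):  # hypothetical loss levels, in percentage points
        print(f"{lost}% of orders lost: about {months_to_recover(lost)} months to recover")
```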
Step 4: Forecast Economic Impacts
Table 2, Economic Analysis of Scenarios, computes the impact of the scenarios under both the Probability and the Fiduciary Responsibility approaches. The result of the analysis shown in Figure 2 is the estimated weeks of lost sales, shown as the first data line in Table 2. From a fiduciary responsibility view, the scenarios have the following impact.
Fiduciary Responsibility Analysis Approach
The recommended fiduciary responsibility approach is illustrated below for each of the defined scenarios.
[Figure: Projected Order Recovery Rates (image not recovered). Axes: Months after Data Center Down-Time; Average Order Retention Percentage. Series: Hot Site Recovery; Cold Site Recovery; No Backup Recovery.]
Dual Center Alternative
There are no losses associated with the dual center approach, since the impact of a disaster is no different from a routine interruption due to hardware, software, telecommunication, or utility failures. The cost of this alternative approximates 6% of the total IS budget, and 1.2% of the firm's operating expenses. The firm, as is typical of most distributors, did not believe the cost of dual data centers was worth saving a day's down-time.
Vendor Hot/Cold Site Alternative
Losses from a disaster using this alternative will be approximately $500,000. This represents, as shown on the third data line of Table 2, about 1% of annual profits. This level of loss is acceptable, given the minimal probability of a disaster to the data center. The cost of approximately $250,000 for this alternative is 2% of data center costs. This is the alternative selected by the firm, an action typical of most business organizations.
Own Cold Site Alternative
This approach would lead to an estimated loss of approximately $15,000,000, which represents 40% of annual profits and would have a major impact on the company. Management thought this type of loss would cause the board to totally replace the management of the firm, and might result in selling it to a competitor. This level of loss was simply not acceptable.
No Backup Alternative
This approach would lead to a loss of approximately $30,000,000. This represents a loss of 70% of annual profits, which would have a disastrous impact on the company. The board would immediately have to sell the firm or cease operations. This level of loss was totally unacceptable.
Probability Analysis Approach
The data for a probability analysis of the distributor's data center contingency planning is included in Table 2.
[Table 2: Economic Analysis of Scenarios (values not recovered). Row labels: Long Term Order Loss; Probability of Disaster; Annual Averted Loss; Return on Investment (ROI).]
The result of a typical analysis of the data follows.
Dual Data Center Alternative
This alternative averts all losses, since recovery takes only a matter of hours or shifts. Using the typical one percent probability, the averted annual cost is approximately $270,000. This potential gain is balanced against the annual additional expenses of approximately $760,000, a negative ROI of almost 65%. This alternative would be considered impractical for firms with this level of down-time sensitivity.
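The ROI quoted above can be reproduced by comparing the annual averted loss with the annual expense. The sketch below uses only the rounded figures from the text. The function name is hypothetical, and the formula (net gain divided by expense) is an assumption about how the figure was derived, chosen because it matches the stated result.

```python
def insurance_roi(annual_averted_loss: float, annual_expense: float) -> float:
    """Return on investment of a backup alternative under the probability approach.

    Assumed formula (not spelled out in the source):
    (averted loss - expense) / expense.
    """
    if annual_expense <= 0:
        raise ValueError("annual_expense must be positive")
    return (annual_averted_loss - annual_expense) / annual_expense


# Dual data center figures as rounded in the text.
roi = insurance_roi(annual_averted_loss=270_000, annual_expense=760_000)
print(f"{roi:.1%}")  # prints -64.5%
```

A result of zero corresponds to a break-even alternative, where the averted loss just covers the annual expense.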
Vendor Hot/Cold Site Alternative
The annualized allocated cost and annual expenses of this alternative are approximately equal. Therefore this alternative is a break-even option under the insurance-based probability analysis approach. From a study of the literature it appears that many other firms have also reached this conclusion. The popularity of this contingency planning alternative, as shown by the success of many firms in the backup site business, appears to be based on the decision that fiduciary duties can be met without it costing anything (i.e., a break-even investment with a low initial cost).
Own Cold Site Alternative
This alternative clearly leads to a significant negative ROI for this firm. It is normally not authorized under this approach, except when an old data center is available, thereby eliminating initial costs and creating a break-even situation.
No Backup Alternative
This alternative involves almost no expenditure, but leaves the organization open to potentially disastrous losses from losing its data center. This alternative's popularity is probably based on the belief that the data center is well protected and therefore will not be destroyed, as well as on the reality that the manual information systems of the organization are in this same condition and are just as critical to the organization's operations.
Summary of BRP Case Study
This case study is not unusual in that the two methods result in the same recommended decision. The fiduciary responsibility approach normally leads to selection of an acceptable backup plan, while the probability approach, as described in Ozier [5], may sometimes lead to the high-risk, no-backup approach.
"Threat events having a low-frequency, high-impact risk ... may have a low probability of loss that encourages management to take risks unduly."
This concern about possible high-risk approaches is also illustrated by the case study described in Engemann & Miller [2, p. 143].
"Finally, management felt that qualitative factors related to the marketplace reaction to a severe loss that resulted from inadequate contingency plans had to be factored into the analysis, even if such losses were eventually covered by insurance."
Contents of a Usable Business Resumption Plan
The critical elements in a usable BRP are the team organization and procedures needed to efficiently move to the backup location and resume productive work, and the backup facilities and equipment that can actually be used to perform the critical business functions affected by the disaster (Rosenthal and Himel [8]).
Functions of BRP Teams
Most organizations with mature business resumption plans have a three-tier BRP team organization structure including:
Top tier - Policy Group
Second tier - Disaster Management Team (DMT)
Third tier - Emergency Response Teams (ERT)
The top tier Policy Group consists of upper-level executives who are available for approving major DMT decisions involving customer service impact, major expenditures or major potential liabilities. For example, after the Bay Area earthquake a major bank opened their branches the next day without power and full cleanup and repairs. The ability to provide much needed cash to customers was deemed more important than the potential for accidents or robberies.
The middle tier DMT includes representatives of key departments and functions involved in life-safety and business contingency planning. The following table lists the functional organizations often represented on a DMT.
Selecting the chairperson of the DMT is often a difficult and politically sensitive decision. The pressure to appoint a senior executive should be resisted. Senior executives belong in the Policy Group among their peers. The chair of the DMT, and therefore the coordinator of the EOC, should be an extremely knowledgeable peer of the other members of the DMT. The chair should not, however, be associated with any ERT. The chair is frequently the supervisor of the Project Head, Business Continuity Planning.
The third tier consists of a large number of Emergency Response Teams (ERT). For example, the data processing area might have specialized logistics, backup data center operations, network operations, and user support ERTs. The safety area might include a dozen or more ERTs with first aid and evacuation responsibilities, each headed by a floor warden.
Functions of a Policy Group
The responsibility of the policy group is to authorize out of the ordinary expenditures required by emergency operations, as well as to set policies primarily impacting stockholders and the public.
They must take the time to carefully consider the long-term impact of the operational decisions being made by the DMT and the ERTs. Therefore, the team is made up of a variety of company executives, including legal, public relations, human relations, and financial experts, and is normally the only team not staffed completely by personnel with primarily day-to-day operations responsibilities.
Functions of a Disaster Management Team (DMT)
During a disaster the DMT has three primary functions:
Coordinating the efforts of emergency response teams to assure the safety of personnel and to minimize the damage to their facilities following a disaster. A life-safety DMT normally is organized for every major facility or campus.
Business Continuity Planning
Planning and coordinating emergency operations and restoration of normal operations following a disaster. A business continuity DMT is normally responsible for a total business unit, frequently involving multiple and wide-spread facilities.
Operating the EOC
The DMT performs its functions from the organization's Emergency Operations Center (EOC). EOCs observed by the author are of two basic structural types: the single conference room approach and the dual room approach.
Conference Room EOC Approach
The most common and least expensive approach is the converted conference room. Large conference rooms at two or more widely separated locations are converted to EOCs. Furnishings and equipment required include:
Telephone consoles for each participant, including an EOC rotary line, a dedicated incoming line for each function, and a line for outgoing calls.
TVs and radios to monitor news and public announcements.
White boards, tack boards, and flip charts.
Facility maps and area maps with medical and emergency service facilities identified.
Multiple radios with multiple channels for use in communicating with emergency response teams and the outside world. At least one of the EOCs should house a portable satellite communication unit.
Room power connected to the building's emergency power system.
Food, water, and rest facilities for primary and alternate DMT members.
California firms often have Los Angeles and San Francisco EOCs and DMTs because of the possibility of an area wide disaster due to a major earthquake. Other areas of the world may not need this much separation between locations.
Dual Room EOC Approach
The dual room EOC approach provides contiguous space for both the Policy Group and the DMT. A glass wall between the two rooms permits the Policy Group to monitor DMT activities and observe status boards and displays. Parallel decision making is enhanced, permitting continuous emergency operations control while significant policy decisions are being made.
The dual room EOC is normally used by organizations with frequent operational emergencies, such as utilities exposed to power outages or pipeline breaks. The EOC is used for both operational emergencies and for disasters affecting non-operational facilities and personnel. A second conference room type EOC is also normally available at a site remote from the primary EOC.
EOC testing involves two functions: a periodic walk-through of all equipment by the Project Head, Business Continuity Planning, and periodic DMT simulations in the EOC.
Functions of Emergency Response Teams (ERT)
The activities of emergency response teams following an emergency must be closely coordinated and adapt swiftly to the type of disaster and its evolving impacts. Emergency response teams can include such areas as: policy, emergency operations center management (DMT), facility acquisition and management, site and equipment recovery, backup data center operations, logistics and transportation, off-site storage coordination, floor wardens, assembly site coordinators, public/employee communications, telecommunications, finance and insurance, etc.
Staffing these teams is a significant problem. Each of the functions of the team (including team leadership and around the clock coverage) must have a primary and backup person assigned.
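The staffing rule above lends itself to a mechanical check. This is a minimal sketch, assuming the roster is kept as a simple nested mapping. The class and function names, the team functions, and the people shown are placeholders, not taken from any actual plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assignment:
    """People assigned to one function of a team (hypothetical structure)."""
    primary: Optional[str] = None
    backup: Optional[str] = None


def staffing_gaps(roster: dict[str, dict[str, Assignment]]) -> list[str]:
    """Return one message per function that lacks a primary or a distinct backup."""
    gaps = []
    for team, functions in roster.items():
        for function, assigned in functions.items():
            if not assigned.primary:
                gaps.append(f"{team}/{function}: no primary assigned")
            if not assigned.backup:
                gaps.append(f"{team}/{function}: no backup assigned")
            elif assigned.backup == assigned.primary:
                gaps.append(f"{team}/{function}: backup is the same person as primary")
    return gaps


# Placeholder roster for illustration only.
example_roster = {
    "IS Logistics & Supplies Team": {
        "team leader": Assignment(primary="Person A", backup="Person B"),
        "night shift coverage": Assignment(primary="Person C"),
    },
}

for gap in staffing_gaps(example_roster):
    print(gap)
```

Running a check of this kind whenever the roster changes is one way to catch functions that have lost their backup person through turnover or reassignment.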
Work locations must be assigned, and intra-team and inter-team communications planning must be assured. Some typical problems follow. Does your plan really define who has been assigned the responsibility for handling each one?
Who has the authority to declare a disaster and authorize expenditure of funds?
Who decides what to tell BRP team members, other employees, customers, and the media?
Is there an inventory of available space and equipment?
Have all business functions been prioritized so that the facility acquisition and management team can quickly assign space to displaced organizations?
Are the teams staffed and led by persons with the day-to-day operations knowledge required for effective emergency operations?
Are all sites stocked with emergency food, water, medical supplies, and other equipment needed following a disaster?
Are realistic life-safety and assembly drills periodically conducted at all sites?
Are there adequate security arrangements for damaged or evacuated sites?
Is there a HELP desk planned with sufficient telephone capacity to properly forward calls from media, employees, family of employees, BRP team members, customers, and suppliers?
Are the auditors assuring that up-to-date copies of all critical records and data are stored off-site at a secure facility?
Do you really know what your insurance coverage is for damage, injuries, and business interruption?
Is there an organization responsible for assuring that all business functions and locations have developed a realistic BRP and are adequately testing both the operational and management aspects of the plan?
The determination of emergency team functions and reporting structure is dependent on individual firm and site characteristics. The team structure described is typical of a large operational facility housing several clerical organizations and a major data center with a distant commercial backup data center.
Operations Center BRP Teams
Damage Assessment and Recovery Coordination Team
This team evaluates the extent of damage to the facility and informs the DMT of the estimated time required to rebuild the damaged facility. The team then assumes the responsibility for restoring the current facility or creating a new facility.
Public/Employee Relations and Communications Team
This team consists of personnel and public relations staff with responsibility for collecting information on the status of operations, facilities, and personnel and communicating relevant information to the media, employees, customers, and vendors.
Operations Coordination (Help and Scheduling) Team
This team is made up of representatives from each of the functions occupying the damaged facility as well as members from each data processing application support group impacted by the disaster. Their role is to schedule and coordinate initial and continuing emergency operations.
Administrative Support Team
Responsibilities include providing emergency cash and payments, physical security at damaged and backup sites, commuting & lodging support, handling insurance claims, and keeping records of emergency costs & expenditures.
Operations Center Life/Safety Teams
These teams are responsible for personnel evacuation or lodging following a localized or area-wide disaster. They often include:
Staffed by volunteer employees trained in first aid and in evacuation methods. Responsible for coordinating evacuation or lodging of occupants in a specific floor or area, as well as performing first aid and communicating with the EOC.
Facility Management Team
Staffed by physical plant operations personnel. Responsible for operating or shutting down the facility after a disaster.
Physical Security Team
Responsible for maintaining security at damaged and at temporary locations.
Information Systems Emergency Operations Teams and Positions
These types of teams are responsible for business resumption of critical functions occupying the impacted facility. The data center emergency operations teams described are also typical of the type of teams and positions often needed by other functions occupying a typical operations center.
IS BRP Coordinator
Responsible for coordinating the IS recovery and supervising all other IS BRP teams. Normally is a member of the DMT and is located in the EOC.
IS Backup Center Operations Team
Responsibilities include computer, data communications, and peripheral operations; establishing the data processing schedule during catch-up; disseminating processing output; and providing the Operations Coordination Team with timely status reports.
IS Logistics & Supplies Team
Responsibilities include transportation, courier, shipping & receiving, and library & warehousing during emergency operations. This includes retrieval of data, software, and documentation from off-site locations.
IS Operations Support Team
This multi-discipline team's responsibility is to support emergency IS operations. Staff includes technical (systems software), applications development, and data & voice communications support professional personnel.
IS Specialized Resources Operation Teams
These teams interface with or operate sites with specialized IS resources, such as page printers, micrographics, and check sorters.
Data Center Backup Architectures
An organization's IS architecture must assure near continuous availability of both data centers and telecommunication networks. Both internal and external resources are available to offer the backup resources needed to assure the high availability required by most business resumption policies.
Data Center Backup Approaches
There are three major approaches to Information Systems (IS) Architectures for protecting critical IS processing from interruptions or disasters. They include the use of a commercial backup data center, the use of multiple in-house data centers, and the distribution of processing to multiple user locations (Rosenthal, 1994).
Using commercial backup data centers
Commercial backup data centers offer facilities that permit reactivation of critical processing within 24-36 hours using their hot site, and reactivation of non-critical processing within 1-2 weeks using their cold site. Organizations with a single data center that can tolerate this type of delay find the use of a commercial backup site both cost effective and practical.
Using multiple in-house data centers
Organizations with a small number (normally two to four) of large decentralized data center locations can often use, within 12-24 hours, development and non-critical processing capacity as backup hot site resources. Rapid upgrading of equipment can be implemented in place of a cold site.
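The reactivation times quoted for the first two approaches suggest a simple screening step: compare the outage a critical business function can tolerate against the worst-case hot-site time for each approach. This is a minimal sketch using only the ranges given above. The dictionary layout and function name are illustrative assumptions, and the distributed processing approach is omitted because no reactivation time is quoted for it.

```python
# Worst-case hours to reactivate critical processing, taken from the ranges in the text.
HOT_SITE_WORST_CASE_HOURS = {
    "commercial backup data center": 36,   # hot site: 24-36 hours
    "multiple in-house data centers": 24,  # development capacity: 12-24 hours
}


def feasible_approaches(tolerable_outage_hours: float) -> list[str]:
    """Approaches whose worst-case reactivation time fits within the tolerable outage."""
    if tolerable_outage_hours < 0:
        raise ValueError("tolerable_outage_hours must be non-negative")
    return sorted(
        name
        for name, worst_case in HOT_SITE_WORST_CASE_HOURS.items()
        if worst_case <= tolerable_outage_hours
    )


if __name__ == "__main__":
    for hours in (8, 24, 48):  # hypothetical tolerances for a critical function
        print(f"{hours} h tolerable: {feasible_approaches(hours) or 'neither approach'}")
```

Screening on the worst case is deliberately conservative; a function that can tolerate only a few hours of outage passes neither test and would need a faster arrangement than either quoted range.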
A HIGH PROTECTION BACKUP ARCHITECTURE
A typical architecture for dual data centers using electronic archival is shown in Figure 3. The production data center normally will contain an online and an information center (MIS/DSS) system. The backup (development) center would then contain the development system and space to quickly add an additional system. Recovery after a disaster or major interruption at the production center consists of posting today's transactions from the log tape and activating communication lines terminating at the backup (development) center.
The problem in using multiple in-house data centers to back up each other is maintaining compatible configurations and systems software versions. Very rigid centralized control of data center configurations and standards is required.
Using a distributed processing architecture
Many organizations have dozens to hundreds of similar function facilities. When a data center suffers a disaster, the total facility that it supports is normally also affected. The BRP policy is frequently to shut the facility until repaired, and transfer operations to neighboring locations.
TYPICAL DUAL DATA CENTER BRP ARCHITECTURE
Telecommunication Network Backup Approaches
Historically, many organizations have leased voice grade multi-drop telephone lines to support an individual application's data communication requirements. Implementing BRP for networks of multi-drop data lines is often performed by adding an additional drop at the backup data center to each line. When this approach is infeasible because of the distance to the backup center, the dial backup capability of their modems is used to connect both to their data center in the event of a line outage and to the backup center in the event of a disaster.
The recent availability of inexpensive multiplexers and concentrators, and of a wide variety of cost-effective high-speed lines, has increased the use of trunk connections linking multiple user locations to their data center. Multiple user locations are now being interconnected to data centers through a high-speed backbone network that requires a high level of protection from interruptions and disasters. There are two major approaches to assuring high levels of availability for these backbone telecommunication networks: building redundancy into the network, and using switched digital circuits from a common carrier. The two can also be combined.
Using telecommunications network redundancy
High speed trunk oriented data networks based on regional or major site controllers should be configured to include route redundancy. The redundancy is valuable, not just for BRP purposes, but also to handle anticipated load variations and to permit maintenance of equipment and circuits without interrupting service.
Using a common carrier's switched broadband circuits
All of the commercial backup data centers have switched circuit capability for connecting the backup center to a customer's regional or site communication controllers. In under an hour, several common carriers can reconfigure a client's network, switching the client data center out of the network and the backup center into the network.
An example of a network using dial backup, network redundancy, and switched broadband circuits is shown in Figure 4. Remote sites are connected to regional concentrators with multiple routes to the data center. These concentrators also have switched broadband capability to connect to the backup center for BRP purposes. Sites or terminals close to the data center have voice grade dial backup capability to reach the backup center.
The economics of implementing this type of backbone architecture as part of a BRP program are very favorable. Broadband digital links are highly reliable and are starting to be priced at rates highly competitive with multi-drop voice grade lines. Many firms have achieved slight reductions in cost by consolidating their various application-oriented networks while simultaneously adding redundancy and/or switched capability to meet BRP requirements.
Manual System Backup Approaches
Backup methods for manual records and information systems tend either to be expensive and to require specialized equipment, or to be not very safe. This problem may explain, but does not excuse, the lack of effective BRP arrangements for most critical manual systems. The following types of backup methods are only representative of the multitude of architectures available when creative managers are faced with the executive demand for a realistic BRP for all critical business functions. These backup methods can be categorized according to whether the manual processing will continue to be performed on paper or will use other media (primarily micrographic or IT image systems).
Paper-Based Processing Backup Alternatives
Paper-based processing seldom survives a quality business process reengineering (BPR) study. However, a BPR is seldom performed unless extensive automation has already occurred in that business unit. Therefore, the following approaches are the most common result of a demand for a BRP.
Secure/Fire-Proof File Room or Safe
Only paper records currently in use are to be removed from the file room or safe. This approach gives good protection during non-working hours. However, paper records are seldom removed and returned individually because of the inefficiencies involved. Also, in the event of a fire, earthquake, bomb scare, etc., staff do not return current records to the secure area and, in fact, seldom close the rooms or safes. These approaches give only fair protection, and should not pass audit when the records are critical to the survival of the organization.
Off-Site Storage of Micrographic Copies of Records
Few business processes do not update the majority of records accessed. This approach, therefore, is seldom used. It is, however, very effective and safe when feasible.
Archiving Off-Site the Original Paper Records and Transactions
This type of processing involves the use of non-computer based storage for processing media.
The most common types of media are microfilm/microfiche and image mass storage systems.
Micrographic media for use in processing is very common when most activity is requests for information, and all actions generate new records that can be filmed and archived. This approach, when applicable, is very effective and safe.
Image systems are normally used for the same type of applications as micrographic media. They can, however, automatically index new transactions affecting a master record. This permits their use in more applications than micrographic systems. This approach, when applicable, is also very effective and safe.
Business resumption planning should be an integrated portion of a total security program. The security program should cover physical security of facilities and equipment, data security of automated files and manual records, protection of all levels of personnel, and business resumption planning. Business resumption planning needs to be an integral part of doing business. For example, IBM internal policy, as stated in their Corporate Disaster Recovery Planning Standard (Policy Number 209), directs all operating and staff units of the company to develop plans for any emergency that results in either a significant loss of assets or revenue flow, or renders the organization unit unable to meet customer commitments or protect the interests of stockholders and employees.
Executives of all organizations have a fiduciary responsibility to take prudent steps to assure the survival of their organization following a natural or man-made disaster. Providing the necessary funds and leadership for a quality business resumption planning program for all critical business functions, both IS and manually oriented, is a key portion of that responsibility.
1. Andrews, W.C. "Contingency Planning for Physical Disasters", Journal of Systems Management, 41:7, 28-32, July 1990.
A short but comprehensive description of the why and how of justifying and producing a data center BRP.
2. Engemann, Kurt J., and Holmes E. Miller. "Operations Risk Management at a Major Bank", Interfaces, 22:6, 140-149, November-December 1992.
Presents a decision analysis framework for making risk management decisions.
3. Lamond, B.J. "An Auditing Approach to Disaster Recovery", Internal Auditor, 47:5, 38-48, October 1990.
A survey of the DRP preparation cycle including an introduction to operational testing and plan maintenance.
4. Metzger, Michael B., et al. Business Law and the Regulatory Environment: Concepts and Cases. Chicago: Richard D. Irwin, Inc.: 867-69, 1995.
This book defines the duty of care and fiduciary responsibility of officers and directors of corporations. It states that the Model Business Corporation Act requires officers to act in good faith, with the care a prudent person in a like position would exercise under similar circumstances, and in a manner they reasonably believe to be in the best interests of the corporation.
5. Ozier, Will. "Issues in Quantitative vs. Qualitative Risk Analysis," Managing IT/IT Solutions. Delran: Datapro Information Services Group, report 6055 (1994): 1-7.
A detailed comparison of the quantitative (probability) and qualitative (fiduciary responsibility) approaches and their impact on managerial decisions.
6. Rohde, R. and Haskett, J. "Disaster Recovery Planning for Academic Computing Centers", Communications of the ACM, 33:652-657, 1990.
A step by step description of producing a BRP for a university data center.
7. Rosenthal, P. "The Emerging Enterprise Systems Architecture", Journal of Systems Management, 45:2;16-21, February 1994.
8. Rosenthal, P. and Himel, B. "Business Resumption Planning: Exercising Your Emergency Response Teams", Computers & Security, 10:497-514, 1991.
A detailed description of simulation testing of a data center disaster plan, including a complete script of an actual exercise.
9. Rosenthal, P, and Sheiniuk, G. "Exercising the Business Disaster Team", Journal of Systems Management, 38:4;12-16 & 38-42, 1993.
A detailed description of simulation testing of a business continuity and life-safety disaster plan, including a complete script of an actual exercise.
10. Waldman, Jan I. A Methodology for Justification of Business Resumption Planning Based on Fiduciary Responsibility Considerations, Unpublished masters thesis, California State University, Los Angeles, 1995.
A detailed description, with examples, of the use of the prudent person BRP justification approach.
11. Wong, K. K. Risk Analysis and Control - A Guide for DP Managers, Hayden Book Company Inc., 1977.
The classic presentation of the quantitative approach to risk analysis. Contains a description of the statistical, IBM, and NCC [National Computing Center] approaches to risk evaluation, as well as a good description of risk control.
Paul H. Rosenthal is a Professor of Information Systems at California State University, Los Angeles. Dr. Rosenthal teaches a wide variety of courses encompassing information systems technology, management, political economy, and systems audit and assessment. He received a BS in Ed and an MA in Applied Mathematics from Temple University, an MBA from UCLA, and a DBA from USC. Prior to joining CSULA, he spent thirty-six years in industry as a professional, a manager, and a consultant. His recent research interests involve business continuity management, IS/IT education assessment, IS/IT infrastructure planning, and technology systems assessment.
© Copyright The Business Forum Institute - 1982 - 2015 ** All rights reserved.