Disaster Recovery Plan
Provide an overview of the organization that will be delivered to senior management, defining the business goals and objectives and the size, layout, and structure of the organization.
TechWidgets Inc. is an e-commerce company that sells merchandise to its customers through a web store. The core infrastructure consists of 10 web servers in a single cluster that handles browsing requests, 5 servers in the web store cluster that handle transactions and processing, and 5 servers in a data cluster backed by a storage area network (SAN). The core network is connected to the internet by two high-speed T-3 connections from two different providers. This infrastructure is replicated at the organization’s alternate hot site, which provides immediate failover in the event of a disaster to prevent unscheduled downtime and also load-balances spikes in activity that would otherwise degrade the shopping experience for customers. The primary data center is located in Los Angeles, California, and the hot site data center is located in Atlanta, Georgia. Although this configuration is costly, it is the best way to maintain business continuity after a catastrophic event such as a fire, flood, or earthquake.
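The inventory described above can be written down as data, which makes it straightforward to check that the hot site still mirrors the primary site. The following is only a minimal sketch under stated assumptions: the names `Site`, `clusters`, `isp_links`, and `mirror_gaps` are hypothetical, the data cluster is assumed to hold 5 servers, and the Atlanta figures simply copy the Los Angeles figures because the plan says the infrastructure is replicated.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One data center. Field names are illustrative, not taken from the plan."""
    name: str
    location: str
    clusters: dict  # cluster name -> number of servers
    isp_links: int  # number of T-3 links from separate providers

    def total_servers(self) -> int:
        return sum(self.clusters.values())


# Server counts from the overview; the data cluster size is an assumption.
CLUSTERS = {"web_front_end": 10, "web_store": 5, "data": 5}

primary = Site("primary", "Los Angeles, California", dict(CLUSTERS), isp_links=2)
hot_site = Site("hot site", "Atlanta, Georgia", dict(CLUSTERS), isp_links=2)


def mirror_gaps(primary: Site, standby: Site) -> list:
    """List every shortfall that would prevent a like-for-like failover."""
    gaps = []
    for cluster, needed in primary.clusters.items():
        available = standby.clusters.get(cluster, 0)
        if available < needed:
            gaps.append(f"{cluster}: {available} of {needed} servers")
    if standby.isp_links < primary.isp_links:
        gaps.append(f"isp_links: {standby.isp_links} of {primary.isp_links}")
    return gaps


if __name__ == "__main__":
    print(primary.total_servers())
    print(mirror_gaps(primary, hot_site))
```

An empty list from `mirror_gaps` means the standby site has at least the capacity of the primary; any entry names a resource that would need attention before the hot site could take the full load.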
Diagram of the organization’s network architecture and the proposed network architecture of an alternate computing facility in the event of a disaster.
Develop the DRP Policy
Disaster Declaration
Assessment of Security
Security and control within an organization are a continuing concern. From an economic and business strategy perspective, it is preferable to concentrate on activities that reduce the possibility of a disaster occurring, rather than concentrating primarily on minimizing the impact of an actual disaster. This phase addresses measures to reduce the probability of occurrence. The security assessment covers the computing and communications environment, including personnel practices; physical security; operating procedures; backup and contingency planning; systems development and maintenance; database security; data and voice communications security; systems and access control software security; insurance; security planning and administration; and application controls. An accurate security assessment will enable the project team to improve existing emergency plans and disaster prevention measures and to implement them where none exist. After the assessment is complete, its findings and recommendations are presented to management so that corrective actions can be initiated in a timely manner.
Potential Disaster Scenario and Methods of Dealing with the Disaster
The risks of running an e-commerce business are not much different from those of running a traditional business. The organization has personnel, building assets, and technology assets, so the potential disasters are similar. For example, a fire may result from sabotage, an electrical malfunction, or a wildfire, and several measures are available for dealing with it.
Preventative and detective measures include fire and smoke alarms throughout the facility, a halon system to extinguish any fire that breaks out in the data center computing areas, fire drills to train users, and documented building diagrams showing all fire paths to the exits.
Disaster Recovery Procedures
Should a fire be detected in the data center computing area (server room), the alarm will sound and all personnel should immediately exit the building along the routes determined by supervisors. This applies especially to anyone in the server room, because exposure to a halon discharge in a sealed room is dangerous. The halon extinguishing system will seal the server room and activate, eliminating the fire within seconds. If fire is detected in any other zone, the sprinkler system will activate and send a deluge of water throughout the affected area to extinguish the fire. Emergency response and fire personnel have an expected on-site arrival time of 15 minutes after the alarm, and organizational personnel are not to reenter the building until the fire department gives the all clear. Once the all clear has been given, and if there was an actual fire, the IRT should begin dealing with the incident.
Develop an Incident Response Team (IRT) charter
Executive Summary
The elements of a traditional agency computer security effort continue to be important and useful. Several reasons necessitate an incident response plan, among them that computers are widespread throughout the agency and that the business relies heavily on them and cannot afford for them to be down for any significant amount of time. The organization’s computer systems and networks are at high risk from threats such as computer viruses, intrusions, and denial-of-service attacks, as well as from the risks that traditional businesses face, such as fires, earthquakes, and floods.
These events can cost the company unnecessary expense and lost productivity, significantly damage its systems, and harm its reputation. Failing to address these risks is not an option, so action must be taken now, before the organization suffers the consequences of a serious computer security problem.
Mission Statement
Improve the security of the TechWidgets Inc. information infrastructure, minimize the threat of damage resulting from an incident, and promote the prevention of such incidents in the future.
Incident Declaration
A disaster can be declared, and the emergency response implemented, only when senior management calls for it. However, the following events have been prequalified as disasters: damage to, destruction of, or an outage of any two of the following resources: Web Front End Cluster, Web Store Cluster, Data Cluster, Core Router 1, Core Router 3, Firewall 1, Firewall 2, ISP Link 1, and ISP Link 2. Any other events, including but not limited to fires, worms, viruses, and other malware, are subject to classification by management and are to be handled appropriately.
Organizational Structure
* Incident Response Management
* Incident Response Coordinator
* Technical Support Team
* Technical Assessment Team
* Communications Team
* Incident Response Support
Roles and Responsibilities and Information Flow and Methods of Communication
* The technical assessment team is responsible for monitoring all sources of alerts, logs, and other warnings in the environment. In the event of an incident, the team is responsible for determining whether a response is necessary and for notifying the coordinator.
* The incident response coordinator tracks all reported potential threats, notifies management of potential threats with an appropriate recommendation for action, and alerts the communications team to threats and potential disasters.
* The communications team is responsible for informing the organization’s employees of the IR team’s activities and for informing the other members of the IR team of the decisions made by IR management.
* IR management’s role is to determine when the risk has been mitigated to a reasonable level, give updates to upper management when needed, estimate the level of damage or impact of the incident, and document lessons learned for the disaster recovery process.
* The technical support team provides the expertise needed to carry out the functions required to get the business back up and running. This includes technical ability from I.T., facilities, infrastructure, HVAC, etc.
Methods and Services Provided by the IRT
To respond adequately to an incident, predetermined teams will participate depending on the incident’s characteristics. As the situation develops and the impact becomes more significant, the various teams will be called in to participate. Regardless of the type of incident, a six-step method is generally followed:
1. Preparation. One of the most important parts of a response plan is knowing how to use it once it is in place. The company has documentation for incidents of several types and levels of impact. Please see the risk assessment plan for details.
2. Identification. Determine whether or not an incident has occurred. Proper analysis of log files and alerts will provide the required information for this step.
3. Containment. Limit the scope and magnitude of the incident.
4. Eradication. Remove the cause of the incident. This can be done by restoring services and configurations to the last known good state or by completely rebuilding them from the configuration information that was previously documented.
5. Recovery. Restore the system to its normal business status.
This assumes that the eradication process was successful and that the services and configurations have been validated and verified as good.
6. Follow-up. Lessons learned are documented, and modifications to processes and procedures are evaluated. This can be a very difficult process if the incident being recovered from had a particularly large impact.
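The six-step method above can be sketched as a small state tracker that moves an incident through the phases in order. This is an illustrative sketch only, not part of the plan: the names `Phase`, `Incident`, and `advance` and the `validated` flag are assumptions. The flag models the stated condition that recovery presumes a successful, verified eradication.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six response phases, numbered as in the list above."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    FOLLOW_UP = 6


class Incident:
    """Tracks one incident through the phases (hypothetical helper class)."""

    def __init__(self, description):
        self.description = description
        self.phase = Phase.PREPARATION
        self.log = []  # (phase name, note) pairs, kept for the follow-up review

    def advance(self, note, validated=True):
        """Close the current phase with a note and move to the next one.

        Phases are taken strictly in order. Leaving ERADICATION requires
        validated=True, because recovery assumes eradication succeeded
        and the services and configurations were verified as good.
        """
        if self.phase is Phase.FOLLOW_UP:
            raise ValueError("incident is already in follow-up")
        if self.phase is Phase.ERADICATION and not validated:
            raise ValueError("eradication not validated; cannot begin recovery")
        self.log.append((self.phase.name, note))
        self.phase = Phase(self.phase + 1)
        return self.phase


if __name__ == "__main__":
    incident = Incident("fire in server room")
    incident.advance("response plan located and reviewed")
    incident.advance("alarm and logs confirm an actual fire")
    incident.advance("server room sealed, traffic moved to hot site")
    incident.advance("damaged hardware replaced, configs restored")
    incident.advance("services verified and returned to normal")
    print(incident.phase.name)
    print(len(incident.log))
```

The notes collected in `log` are one possible input to the follow-up step, where lessons learned are documented.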