Cloud Disaster Recovery
A cloud disaster recovery plan (DRP), also known as a business continuity plan (BCP) or business process contingency plan (BPCP), details how an organization will deal with potential disasters. A disaster is an event that makes normal functioning of the organization impossible, so the DRP consists of measures that minimize the effects of a disaster and allow the organization to maintain or quickly resume essential (mission-critical) functions. Typically, a DRP requires analysis of business processes and continuity needs, and may include a focus on disaster prevention.
For many businesses, the entire business is in one place; consequently, any catastrophe with the potential to destroy a company’s physical property also has the potential to destroy its business-critical documents and data. Even if the company data isn’t destroyed, restoring it can be a lengthy, expensive process, so the catastrophe can still force the business to cease trading.
Many businesses assume that a DR plan concerns only the business’s IT infrastructure. This is not true: other physical business processes must be taken into consideration. For example, if a company holds physical files in its office, what happens if a fire destroys them? Does the company have another physical copy at a second location? Similarly, in a factory, what happens if a vital machine breaks? How would the company continue to manufacture the product? Any process whose failure could hinder the business must also be covered by the DRP.
A DRP begins by identifying and prioritizing applications, services and data, and determining for each the downtime that is acceptable before there is a significant business impact. These figures set the priority of each application or service along with its recovery time objective (RTO), and together they determine the disaster recovery tactics that make up the DRP. Once applications are identified and prioritized and RTOs defined, the best and most cost-effective method of achieving each RTO is determined application by application and service by service.
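The last step, matching each application to the most cost-effective method that meets its RTO, can be sketched in Python. This is only an illustration under stated assumptions: the application names, RTO values, tactic names, recovery times and costs are all hypothetical figures invented for the example, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    rto_hours: float  # maximum acceptable downtime (hypothetical values)


@dataclass
class Tactic:
    name: str
    recovery_hours: float  # assumed typical time to recover
    monthly_cost: float    # assumed cost in arbitrary currency units


def cheapest_tactic(app, tactics):
    """Return the lowest-cost tactic whose recovery time meets the RTO, or None."""
    candidates = [t for t in tactics if t.recovery_hours <= app.rto_hours]
    return min(candidates, key=lambda t: t.monthly_cost) if candidates else None


def plan(apps, tactics):
    """Order apps by RTO (tightest first) and pair each with its cheapest viable tactic."""
    ordered = sorted(apps, key=lambda a: a.rto_hours)
    return [(a.name, cheapest_tactic(a, tactics)) for a in ordered]


# Hypothetical catalogue of tactics and applications.
TACTICS = [
    Tactic("tape off-site", 72.0, 100.0),
    Tactic("cloud backup, on-premises restore", 24.0, 300.0),
    Tactic("cloud backup, cloud restore", 4.0, 800.0),
    Tactic("replication to cloud VMs", 0.25, 2000.0),
]
APPS = [App("file archive", 96.0), App("email", 8.0), App("order processing", 0.5)]

if __name__ == "__main__":
    for name, tactic in plan(APPS, TACTICS):
        print(name, "->", tactic.name if tactic else "no tactic meets the RTO")
```

In practice the recovery times and costs would come from testing and supplier quotes rather than fixed constants; the point is only that the RTO acts as a filter and cost as the tie-breaker.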
A central part of a DRP is data backup, so it must be examined as part of the planning process. The “traditional” approach was to make initial backups to disk-based storage elsewhere in the building and further backups to tape. Tapes are stored in a tape library and taken to off-site storage, either daily or weekly. This policy means the backup data is at another location, limiting exposure to data loss; the drawbacks are that it assumes the tapes are good and not lost in transit (not always the case), and that recovery can take weeks. This approach was superseded by real-time or near-real-time replication; however, off-site data replication is limited by available bandwidth and possibly by the distance (number of Internet hops) between sites. A central issue here is that the replication site must be far enough from the main site to avoid being affected by any disaster that overtakes the main site. Bandwidth issues may be overcome simply by buying more, but this illustrates that any DRP must be affordable!
Cloud-based backup and disaster recovery is attractive for a number of reasons, the first being that the “cloud” is physically separate from your current location; i.e. it’s not where you are. It is attractive to SMEs because the cloud is flexible, elastic (it grows and shrinks on demand), low cost and priced per use. Disk storage and bandwidth have fallen dramatically in cost over recent years, so disaster recovery solutions previously available only in the enterprise space can now be delivered to the SME market.
The following are the approaches that are available:
Managed applications and managed DR.
Here both primary production and disaster recovery are in the cloud, and both are handled by a managed service provider (MSP). This confers all the benefits of cloud computing, including eliminating on-premises infrastructure and a usage-based cost model. A significant element is the choice of service provider and the negotiation of service-level agreements (SLAs). The critical piece is the provider’s capacity to deliver uninterrupted service within the defined SLAs. A cloud-only solution is becoming popular for email and some business applications, such as customer relationship management (CRM), with Salesforce.com and Workforce.com being good examples.
Backup to and restore from the cloud.
Here applications and data are on-premises, with data backed up into the cloud and restored to on-premises hardware when a disaster occurs. Cloud backup is thus a substitute for tape-based off-site backups, which makes it the least disruptive approach for most businesses to adopt.
When considering cloud backup and recovery, it is crucial to understand clearly both the backup and the potential complications posed by restore. Backing up into the cloud and keeping on-premises data and cloud data in sync is straightforward. The challenging part of cloud-based backup for DR is recovery. Bandwidth is by definition limited, and with potentially terabytes of data to be recovered, the challenge is recovering data within defined RTOs. One option is to restore data to disks, which are then sent to the customer for recovery; another is to keep an on-premises cache of recent backups for local restore. Features such as compression and data de-duplication can also make restores from the cloud to on-premises infrastructure viable, although how much they help depends on the nature of the data.
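The restore problem is essentially arithmetic, and a rough estimate can be sketched in Python. This is a simplified model, not a sizing tool: the function names, the 5 TB data set, the 100 Mbit/s link, the 24-hour RTO and the reduction ratios are all assumptions made for illustration, and the calculation ignores protocol overhead and contention on the link.

```python
def restore_hours(data_tb, link_mbps, reduction_ratio=1.0):
    """Estimate hours to pull data_tb terabytes down a link of link_mbps megabits/s.

    reduction_ratio is the assumed combined compression/de-duplication factor
    (2.0 means half the bytes cross the wire).
    Decimal units are used: 1 TB = 8,000,000 megabits.
    """
    if link_mbps <= 0 or reduction_ratio < 1:
        raise ValueError("link_mbps must be > 0 and reduction_ratio >= 1")
    megabits = data_tb * 8_000_000 / reduction_ratio
    return megabits / link_mbps / 3600


def meets_rto(data_tb, link_mbps, rto_hours, reduction_ratio=1.0):
    """True if the estimated restore time fits within the RTO."""
    return restore_hours(data_tb, link_mbps, reduction_ratio) <= rto_hours


if __name__ == "__main__":
    # Hypothetical case: 5 TB to restore over a 100 Mbit/s link, 24-hour RTO.
    for ratio in (1.0, 4.0, 8.0):
        hours = restore_hours(5, 100, ratio)
        ok = meets_rto(5, 100, 24, ratio)
        print(f"reduction {ratio}: {hours:.1f} h, meets RTO: {ok}")
```

With these assumed figures the unreduced restore takes roughly 111 hours, far beyond a 24-hour RTO, and only the 8:1 reduction brings it under the target. That gap is why shipped disks or a local cache are worth considering when the data does not compress or de-duplicate well.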
Backup to and restore to the cloud.
Here data is restored to virtual machines (VMs) in the cloud rather than to on-premises infrastructure. This requires both cloud storage and cloud compute resources, and the restore is done either when a disaster is declared or on a continuous (pre-staged) basis. In Windows Deployment Services, pre-staged clients are computer account objects created within Active Directory Domain Services (AD DS) before the operating system is installed, corresponding to physical devices that boot from the network; in this DR context the pre-staged objects are VMs. Pre-staging relatively up-to-date DR VMs through scheduled restores is crucial where aggressive RTOs are required. Some cloud service providers offer bringing up cloud VMs as part of their DR offering.
Replication using virtual machines in the cloud.
For applications that need short recovery times and tight recovery point objectives (RPOs), the data movement option of choice is replication. Under this regime, replication to cloud virtual machines is used to protect both cloud and on-premises production, so it suits both on-premises-to-cloud-VM and cloud-VM-to-cloud-VM data protection. Such replication products are based on continuous data protection (CDP).
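The idea behind CDP is that every change is captured as it happens, so recovery can target any point in time rather than only the last scheduled backup. The Python sketch below is a toy in-memory model of that idea and not how any particular product works; the class name, method names and sample keys are hypothetical.

```python
import bisect


class CdpJournal:
    """Toy continuous-data-protection journal: every write is logged with its time."""

    def __init__(self):
        self._times = []   # timestamps, kept in ascending order
        self._writes = []  # (key, value) pairs, parallel to _times

    def record(self, timestamp, key, value):
        """Log one write. Timestamps must be non-decreasing in this sketch."""
        if self._times and timestamp < self._times[-1]:
            raise ValueError("writes must be recorded in time order")
        self._times.append(timestamp)
        self._writes.append((key, value))

    def state_at(self, timestamp):
        """Rebuild the data as it stood at the given time by replaying the journal."""
        end = bisect.bisect_right(self._times, timestamp)
        state = {}
        for key, value in self._writes[:end]:
            state[key] = value  # later writes overwrite earlier ones
        return state


if __name__ == "__main__":
    journal = CdpJournal()
    journal.record(10, "invoice-1", "draft")
    journal.record(20, "invoice-1", "final")
    journal.record(30, "invoice-2", "draft")
    print(journal.state_at(25))
```

Because the journal holds every write, the achievable recovery point is limited by how quickly changes reach the replica rather than by a backup schedule, which is what makes this approach suited to tight RPOs.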
Updated options, same principles.
The cloud confers many more options, but DR fundamentals are the same as ever: you need a solid DRP that is regularly tested, with users trained and prepared for any eventuality.