Understanding your DR from your BC
This is a topic I covered before when I worked at PeaSoup Hosting. As it is very relevant to the purpose of this blog, I decided to reuse it here.
Over the last few months, I have had many discussions on the importance of Disaster Recovery (DR) planning and on how to choose the most suitable location. It became apparent that many people confuse DR with business continuity (BC) and do not understand the differences.
Disaster recovery is the process of recovering from a business disaster that involves downtime of an organisation’s IT systems; the aim of any DR plan is to keep that downtime to a minimum. Business continuity, on the other hand, is a solution that does not involve any downtime, even when there is a disaster. This kind of IT architecture is often used in large enterprises with 24/7 requirements that can’t afford any downtime, and it is both complex and costly for most organisations.
Gartner analysts Witty and Scott stated back in 2001 that two out of five businesses that experience a disaster go out of business within five years, but that DR and BC planning ensures continued viability. In today’s fast-paced economy this appears to be even more prevalent. What impact does a disaster have on your company? You not only lose revenue, you also lose credibility, which in the long term can cause more harm to your company. Would you do business with someone who can’t protect their own business?
How do you cope with an outage where your staff cannot access their datacentre services or receive any orders via web, email, phone and so on? You need a plan to get your essential services up and running as fast as possible, or the outage could result in you losing your business.
Businesses need to move away from a heavy reliance on traditional backup in order to provide a well-structured DR plan. Restoring from backup for DR is too time-consuming and does not keep pace with the rate at which data is generated today. But before I go into today’s DR planning and recovery times, it’s best to explain the impact of an outage in terms of potential costs.
A few years ago, whilst working for a company specialising in DR, I found a formula to calculate the cost of an outage. The formula assumes traditional backup to tape, which is the most widely used method for DR planning.
DR calculation formula:
Cost per incident = (Duration of incident in hours + Working hours per day x Days since last backup) x ((Hourly tariff employees x Number of employees) + Lost revenue per hour)
For example, assume the following: an incident takes 24 hours and your last full backup is 1 day old. In total there are 75 employees and the average wage is £13 per hour. The revenue per hour is £1,000. This would give you the following sum:
Cost per incident = (24 hours + 8 hours x 1 day) x ((£13 x 75 employees) + £1,000) = £63,200
£63,200 for a day’s outage is a substantial loss, and this is just the initial loss. Add the loss of new and existing customers and the fast replacement of any hardware, software etc. for the IT department, and you can only imagine the impact this will have on any business!
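The formula and worked example above can be sketched as a small Python function. This is only an illustration: the function and parameter names are my own, and the 8-hour working day is taken from the example figures.

```python
def cost_per_incident(incident_hours, working_hours_per_day, days_since_backup,
                      hourly_wage, employees, lost_revenue_per_hour):
    """Estimate the cost of an outage using the formula from this post."""
    # Hours lost: the outage itself plus the working hours of data
    # created since the last backup, which has to be redone.
    lost_hours = incident_hours + working_hours_per_day * days_since_backup
    # Cost of one lost hour: idle staff wages plus missed revenue.
    hourly_cost = hourly_wage * employees + lost_revenue_per_hour
    return lost_hours * hourly_cost


# Figures from the worked example: 24-hour incident, 1-day-old backup,
# 75 employees at £13 per hour, £1,000 revenue per hour.
example_cost = cost_per_incident(24, 8, 1, 13, 75, 1000)
print(f"£{example_cost:,}")  # £63,200
```

Changing a single input, such as the age of the last backup, quickly shows how sensitive the total is to your backup regime.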
With this in mind, are you confident your IT department has the necessary plans and procedures in place for a recovery that is acceptable to all departments across the business? Do you know the key factors you need to address for your DR plan?
You need to think about the following key factors:
– Your service level agreement (SLA) with your customers / management / departments
– Recovery Time Objective (RTO)
– Recovery Point Objective (RPO)
By addressing these key factors, you will be able to create a robust DR design for your company that is suitable for your business. I shall discuss each one in more detail for you.
SLAs – They are important to define from the outset, as the needs of each department in your business may differ. Some departments may have a lower SLA than others, depending on how reliant they are on IT, for example a sales department vs a manufacturing department. RTO and RPO are the main discussion points during any DR conversation, and they define the SLA and the DR architectural design suitable for your business. Let’s look at them in more detail.
The “Recovery Time Objective” or “RTO”, is the duration of time and a service level within which a business process must be restored after a disaster (or disruption) to avoid unacceptable consequences associated with a break in business continuity.
In other words, this is the total time needed to recover your business processes completely, so that the business can continue working as usual. An acceptable RTO is determined by senior management, as more downtime means a greater loss of day-to-day revenue.
A “recovery point objective” or “RPO”, is defined by business continuity planning. It is the maximum tolerable period in which data might be lost from an IT Service due to a Major Incident.
In simpler words, it is the amount of data, measured in hours, that you will lose as a business. Here’s a scenario: your backup runs every night, starting at 7pm and finishing at 10pm. The next day at 3pm an incident occurs, which means you lose all data created between 10pm and 3pm the next day. In this case you would lose 17 hours’ worth of data and potential revenue.
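The scenario above can be checked with a few lines of Python. This is a minimal sketch: the calendar dates are arbitrary placeholders, and the 4-hour RPO target at the end is a made-up value used only to show the comparison.

```python
from datetime import datetime


def data_loss_hours(last_backup_completed, incident_time):
    """Hours of data created since the last completed backup."""
    delta = incident_time - last_backup_completed
    return delta.total_seconds() / 3600


def exceeds_rpo(loss_hours, rpo_hours):
    """True when the actual data loss is larger than the agreed RPO."""
    return loss_hours > rpo_hours


# Backup finishes at 10pm; the incident occurs at 3pm the next day.
# The dates themselves are arbitrary, only the times matter.
backup_done = datetime(2024, 1, 1, 22, 0)
incident = datetime(2024, 1, 2, 15, 0)

loss = data_loss_hours(backup_done, incident)
print(loss)  # 17.0

# Hypothetical RPO target of 4 hours, purely for illustration.
print(exceeds_rpo(loss, 4))  # True
```

With a nightly backup the worst case is always close to a full day of lost data, which is why the RPO you agree with the business drives the choice of technology.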
Every business should create a DR plan with defined RTO, RPO and SLAs. Ask yourself questions: how much downtime can your business afford? How much data loss can you suffer as a business? How much revenue are you willing to lose?
So far we have explained the difference between DR and BC and the key factors required for DR planning. The next step is to divide applications into a hierarchy of SLA tiers, in terms of importance. Most companies use a range of business processes and applications, and I always advise companies to create a list of applications and processes, sit down with the department managers and determine the SLA for each individual application.
Also involve the Finance director directly in these conversations, as a budget needs to be set for your DR plan and solution. By creating a steering group you will be able to generate healthy discussions and obtain the ‘buy in’ necessary for creating a realistic budget.
During your SLA assessment, create a simple table of the applications used in the business. It will help you understand them, rank them in terms of importance for each department, and create the design architecture for your company’s DR solution.
Here is an example of an SLA Tiered table with agreed RTO/RPO tiered times
There are no references to any physical or virtual servers in the example as each process/application could use a variety of servers and devices.
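As a rough sketch of how such a list could be ranked, the Python below sorts applications by their agreed RTO and then RPO. Everything in it is hypothetical: the application names, departments and hour values are placeholders I made up for illustration, not recommendations and not taken from a real assessment.

```python
from dataclasses import dataclass


@dataclass
class AppSla:
    name: str
    department: str
    rto_hours: float  # agreed maximum time to restore the service
    rpo_hours: float  # agreed maximum amount of data loss


def rank_by_sla(apps):
    """Order applications from most to least critical.

    The tightest RTO comes first, with RPO used as the tiebreaker.
    """
    return sorted(apps, key=lambda a: (a.rto_hours, a.rpo_hours))


# Hypothetical entries, for illustration only.
apps = [
    AppSla("Email", "All", 4, 1),
    AppSla("Order processing", "Sales", 1, 0.25),
    AppSla("File archive", "Finance", 48, 24),
]

for rank, app in enumerate(rank_by_sla(apps), start=1):
    print(f"{rank}. {app.name} ({app.department}): "
          f"RTO {app.rto_hours}h, RPO {app.rpo_hours}h")
```

In practice the ordering is agreed with the department managers rather than calculated, but a sorted list like this is a useful starting point for that discussion.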
Once you have determined your SLAs, RTO and RPO, you can address your infrastructure requirements. Start by reviewing your application list and identifying how many applications sit on physical hardware, how many are virtualised, what virtualisation technology is in use and so on.
Lastly, where do you want your DR solution to be located? Here are some key questions you need to think about:
– Can we provide DR in-house at another location?
– Can we use the Cloud?
– Where are the Cloud data centres located?
– What security do we require from a Cloud Service Provider?
– What technology will we require?
– What is our budget?
In this personal blog I will explain scenarios where Arcserve products can help customers achieve a simplified DR and high availability (HA) solution.