Triware Networld Systems 

19 Years Of Around The Clock Superior Network Systems Service & Support!

 


It Is Not About What You Restore But What You Recover!

The most common thinking when it comes to Backup vs. Recovery is that the backup process is what matters most. In reality, the ability to recover fully is paramount. Unfortunately, IT professionals rarely fully test their systems’ ability to recover. Many spend most of their effort on the guidelines and implementation of the backup process, and on determining how long it takes, whether errors were reported in the process, etc. Then, when a recovery is needed, they find out that their system has errors or in other ways does not fully recover the state of their business.

First let’s define the difference between restore and recovery. To restore is to bring back the data from a replicated copy. To recover is to go back to a functional point before the disaster or failure occurred. These are two very distinct points. Restore does not equal recovery; you may get back every bit of data, but it may be in a state that is not usable, since the interrelationship between the information was lost. This is why, for example, nearly all databases and mail systems require a recovery process in addition to the restore process. Even the time required for backup, restore and recovery activities may be very different for each. A session that took one hour to store to tape may take four hours to restore and several days to recover.

Under most backup methods, the time that it takes to stream backup data to the new media (disk, tape, etc.) is all that is counted in terms of time required, and it is assumed that the recovery time will be similar. However, what is involved goes beyond simply streaming data back from the media. For example, if the media is stored offsite, time will be required to get the media out of storage and back on site. There may be time needed to access the index or catalog to determine the location of the correct session. In some cases, the catalog or index itself may need to be restored before attempting to retrieve the data, and this may impact the file system, since a write to disk is required.

Generally speaking, even a straight block image restore takes more time than the backup did, and if done at a file level, it’s even slower. Recovery takes longer still because a data organization tool may need to be run either at the file system level or through the application tools themselves. Finally, there may be interrelationships in the data at the application level that need to be reconstructed.
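The components described above can be added up to get a rough Recovery Window. The sketch below is only an illustration: the class and field names are hypothetical, the one-hour backup, four-hour restore and three-day media retrieval figures are taken from the examples in this article, and the remaining hours are assumed placeholders that you would replace with numbers measured in your own practice recoveries.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Hypothetical breakdown of a Recovery Window, all values in hours."""
    media_retrieval: float      # getting offsite media out of storage
    catalog_restore: float      # restoring the index/catalog, if needed
    data_restore: float         # streaming the data back from the media
    consistency_checks: float   # file-system or application repair tools
    application_rebuild: float  # reconstructing application-level relationships

    def recovery_window(self) -> float:
        """Total elapsed hours, assuming the steps run one after another."""
        return (
            self.media_retrieval
            + self.catalog_restore
            + self.data_restore
            + self.consistency_checks
            + self.application_rebuild
        )


BACKUP_HOURS = 1.0  # the one-hour tape session used as an example in the article

estimate = RecoveryEstimate(
    media_retrieval=72.0,      # three days to get a tape out of storage
    catalog_restore=0.5,       # assumed placeholder
    data_restore=4.0,          # the four-hour restore from the article
    consistency_checks=6.0,    # assumed placeholder
    application_rebuild=12.0,  # assumed placeholder
)

window = estimate.recovery_window()
print(f"Backup window:   {BACKUP_HOURS:.1f} hours")
print(f"Recovery window: {window:.1f} hours")
```

Even with modest placeholder values, the streaming restore is a small part of the total, which is why the backup time alone is a poor predictor of recovery time.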

Most IT departments do not practice recovery due to the cost involved, although it’s well known that the best way to validate a Disaster Recovery process is to test it to full recovery. At a minimum, testing means having the equipment or space to which to restore the data, and this may mean that a duplicate environment will need to be built. Personnel time to complete the process must also be allocated. In general, backups are easier to control, and limited restoration is fairly easy. Full recovery testing, on the other hand, is much more difficult, and there are always some unknown factors. But it is these very unknown factors that make it worthwhile.

For example, finding out ahead of time, when the situation is not critical, that it takes three days to get a tape out of storage gives you the opportunity to plan how to mitigate the impact of this delay when it is critical. Performing an untested full recovery during a critical situation can put the business in jeopardy. Better to determine the parameters and limitations in a test environment than during a genuine crisis.

So, stop concentrating on the “Backup Window” and start concentrating on the “Recovery Window.” Inevitably, the “Recovery Window” discussions will lead to items that affect the Backup Window. Look at the uses of the data in order to justify costs. In general, data has several cost factors, the most important being the cost of the data’s unavailability. Most IT departments look only at the benefit of availability: what the data is producing by being online. Any individual dataset usually accounts for only a part of overall profitability, but the negative impact of losing it can be much greater. Just having one critical dataset offline can affect all of a company’s activities. The time to judge the data’s value, and what impact its unavailability will have, is not during an emergency. Unfortunately, from our experience, that is when most companies do it. And when this is done during an emergency, they may not fully assess what the data being offline ultimately costs the company.
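One simple way to put a number on the cost of unavailability is to multiply an estimated hourly cost of a dataset being offline by its Recovery Window. The sketch below does only that; the dataset names, hourly costs and recovery hours are all made-up placeholders, and a real assessment would also account for knock-on effects on other systems.

```python
def downtime_cost(hourly_cost: float, recovery_window_hours: float) -> float:
    """Estimated cost of a dataset being offline for its whole Recovery Window."""
    return hourly_cost * recovery_window_hours


def rank_by_exposure(datasets: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (name, estimated cost) pairs, most expensive outage first."""
    costs = {
        name: downtime_cost(hourly_cost, hours)
        for name, (hourly_cost, hours) in datasets.items()
    }
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical figures: (cost per hour offline, Recovery Window in hours).
datasets = {
    "order database": (5000.0, 24.0),
    "mail system": (800.0, 48.0),
    "file archive": (50.0, 94.5),
}

ranking = rank_by_exposure(datasets)
for name, cost in ranking:
    print(f"{name}: {cost:,.0f}")
```

Working through a table like this before an emergency makes it easier to justify spending on the datasets whose outage would hurt most.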

Build simple Disaster Recovery Systems. Wherever possible, invest in products that help you simplify the process, and test them thoroughly. Look at forming partnerships with others who have the required expertise and facilities. Wouldn’t it be great to go through a total disaster recovery of a whole system at least once a year? At a minimum, on a quarterly basis, you should conduct a practice recovery on at least some of your most important systems that can be offline for a few days, and no less than monthly, you should practice recovery on the ones that are mission-critical. Better to discover errors in the process when there is little impact than when there is no safety net at all.
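The schedule suggested above (monthly for mission-critical systems, quarterly for important ones, yearly for a whole-system recovery) can be tracked with a short script. This is a minimal sketch under stated assumptions: the tier names, system names and dates are hypothetical, and the intervals are approximated as 30, 91 and 365 days.

```python
from datetime import date

# Approximate practice-recovery intervals in days (assumed values).
PRACTICE_INTERVAL_DAYS = {
    "mission_critical": 30,  # no less than monthly
    "important": 91,         # quarterly
    "full_system": 365,      # total disaster recovery once a year
}


def overdue_systems(last_tested: dict[str, tuple[str, date]], today: date) -> list[str]:
    """Return the names of systems whose last practice recovery is too old."""
    overdue = []
    for name, (tier, tested_on) in last_tested.items():
        days_since_test = (today - tested_on).days
        if days_since_test > PRACTICE_INTERVAL_DAYS[tier]:
            overdue.append(name)
    return sorted(overdue)


# Hypothetical systems and last practice-recovery dates.
last_tested = {
    "billing": ("mission_critical", date(2010, 5, 15)),
    "mail": ("mission_critical", date(2010, 4, 1)),
    "intranet": ("important", date(2010, 1, 15)),
    "whole site": ("full_system", date(2009, 9, 1)),
}

overdue = overdue_systems(last_tested, today=date(2010, 6, 1))
print("Overdue for practice recovery:", ", ".join(overdue))
```

Running a check like this regularly turns the practice schedule into something that is actually followed rather than a good intention.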

It is IT’s responsibility to inform management of potential exposure to the business and to communicate it clearly. The best proof of planning is to go through a disaster and have it be a non-event. For example, in 1999 everyone was concerned about the impact of Y2K and clearly saw it as an IT priority. Therefore, resources were allocated far ahead of the event, and nearly everyone tested the results. The result of this planning and forethought was that when the new millennium arrived, it was a non-event. In contrast, look at the Daylight Saving Time change in 2007 and the impact it had on unprepared IT departments. For many IT departments, it was days, and in some cases weeks, before all the coordinated patches required were put in place and calendars were synchronized properly across devices and users. Some vendors were still releasing patches only days before the change.

If you do not know whether you can fully recover until you actually have a disaster, that in itself becomes a potential additional disaster, but one that can be avoided, albeit with a bit of effort. Practice recovery, not just backup. Backup is really step one of a future recovery. Take the other steps before circumstances force them upon you. This way, you can approach nearly inevitable disasters, both large and small, without additional self-induced stress. The worst thing that can happen is discovering that a disaster is an unrecoverable event. Practice recovery beforehand, and you greatly reduce this possibility.

By Jerry Ware, IT Solutions Fellow

Jerome Ware Biography

Jerome Ware, CNE, EMC Certified, MCP, has over 20 years of experience in the high tech and financial industries. Some of the organizations he has served are Computer Associates, Desktop Products, EMC Corporation, Montgomery Securities, and Robert Quinn and Associates. Mr. Ware has a B.A. degree in Fine Art from San Jose State University.


© Copyrights Triware Networld Systems, L.L.C. ® 1991-2010