It Is Not About What You Restore But What You Recover!
The most common
assumption about backup versus recovery is that the backup
process is what matters most. In reality, having the ability to
recover fully is what is paramount. Unfortunately, IT professionals
rarely fully test their systems’ ability to recover. Many spend most
of their effort on the guidelines and implementation of the backup
process and on determining how long it takes, whether errors were
reported, and so on. Then, when a recovery is actually needed, they
find that their system has errors or otherwise does not fully
recover the state of their business.
First let’s define the
difference between restore and recovery. To restore is to bring back
the data from a replicated copy. To recover is to go back to a
functional point before the disaster or failure occurred. These are
two very distinct points. Restore does not equal recovery; you may
get back every bit of data, but it may be in a state that is not
usable, since the interrelationship between the information was
lost. This is why, for example, nearly all databases and mail
systems require a recovery process in addition to the restore
process. Even the time required for backup, restore and recovery
activities may be very different for each. A session that took one
hour to store to tape may take four hours to restore and several
days to recover.
Under most backup
methods, the time that it takes to back up streaming data to the new
media (disk, tape, etc.) is all that is counted in terms of time
required, and it is assumed that the recovery time will be similar.
However, what is involved goes beyond simply streaming data back
from the media. For example, if the media is stored offsite, time
will be required to retrieve the media from storage.
There may be time needed to access the index or catalog to determine
the location of the correct session. In some cases, the catalog or
index itself may need to be restored before attempting to retrieve
the data, and this may impact the file system, since a write to disk
is required.
Generally speaking,
even a straight block-image restore takes more time than the backup
did, and a file-level restore is slower still. Recovery
takes longer because a data organization tool may need to be run
either at the file system level or through the application tools
themselves. Finally, there may be interrelationships in the data at
the application level that need to be reconstructed.
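The phases described above can be added up to give a rough recovery window. The following is a minimal Python sketch, not a definitive model: the class name, the field names, and the example hour figures are illustrative assumptions rather than measurements from any real system.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Hypothetical breakdown of the time needed to return to a working state."""
    media_retrieval_hours: float  # getting media back from offsite storage
    catalog_hours: float          # locating or restoring the index/catalog
    restore_hours: float          # streaming the data back from the media
    recovery_hours: float         # file-system or application-level recovery

    def total_hours(self) -> float:
        """Sum every phase; the restore itself is only one part of the total."""
        return (self.media_retrieval_hours + self.catalog_hours
                + self.restore_hours + self.recovery_hours)

    def fits_window(self, recovery_window_hours: float) -> bool:
        """Return True if the whole recovery fits in the agreed recovery window."""
        return self.total_hours() <= recovery_window_hours


# Assumed example figures, for illustration only.
estimate = RecoveryEstimate(
    media_retrieval_hours=24.0,
    catalog_hours=2.0,
    restore_hours=4.0,
    recovery_hours=48.0,
)
print(estimate.total_hours())
print(estimate.fits_window(48.0))
```

Keeping the phases as separate fields makes it obvious which one dominates the total, which is usually not the restore step that backup reports measure.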
Most IT departments do
not practice recovery due to the cost involved, although it’s well
known that the best way to validate a Disaster Recovery process is
to test it to full recovery. At minimum, it means having the
equipment or space to which to restore the data, and this may mean
that a duplicate environment will need to be built. Personnel
time to complete the process must also be allocated. In general,
backups are easier to control and limited restoration is fairly
easy. Full recovery testing, on the other hand, is much more
difficult, and there are always some unknown factors. But it is
these very unknown factors that make it worthwhile.
For example, finding
out ahead of time, when the situation is not critical, that it takes
three days to get a tape out of storage gives you the opportunity to
plan how to mitigate the impact of this delay when it is
critical. Attempting a full recovery for the first time during a
critical situation can put the business in jeopardy. It is better to
determine the parameters and limitations in a test environment than
during a genuine crisis.
So, stop concentrating
on “Backup Window” and start concentrating on “Recovery Window.”
Inevitably the “Recovery Window” discussions will lead to items that
affect the Backup Window. Look at the uses of the data in order to
justify costs. In general, data has several cost factors, the most
important being the cost of the data’s unavailability. Most IT
departments look at the benefit of availability: what the data
produces by being online. Any individual dataset usually accounts for
only a fraction of overall profitability, but the negative impact of
losing it can be much greater.
Just having one critical dataset offline can affect all of a
company’s activities. The time to judge the data’s value – and what
impact its unavailability will have – is not during an emergency.
Unfortunately, in our experience, that is when most companies do
it. And when this is done during an emergency, they may not fully
assess what the data being offline ultimately costs the company.
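One way to put a number on unavailability before an emergency is a simple back-of-the-envelope calculation. The sketch below is only an assumption about how such an estimate might be structured: the function name, its parameters, and the figures are hypothetical, and a real assessment would include costs beyond lost revenue.

```python
def unavailability_cost(hours_offline: float,
                        hourly_revenue: float,
                        affected_share: float) -> float:
    """Estimate revenue lost while a dataset is offline.

    affected_share is the fraction of company activity that depends on
    the dataset, from 0.0 (nothing) to 1.0 (everything).
    """
    if not 0.0 <= affected_share <= 1.0:
        raise ValueError("affected_share must be between 0.0 and 1.0")
    return hours_offline * hourly_revenue * affected_share


# Assumed figures: three days offline, with every activity affected.
print(unavailability_cost(hours_offline=72.0,
                          hourly_revenue=1000.0,
                          affected_share=1.0))
```

Setting `affected_share` close to 1.0 for a single critical dataset reflects the point above that one dataset being offline can affect all of a company's activities.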
Build simple Disaster
Recovery Systems. Wherever possible, invest in products that help
you simplify the process, and test them thoroughly. Look at forming
partnerships with others who have the required expertise and
facilities. Wouldn’t it be great to go through a total disaster
recovery of a whole system at least once a year? At a minimum, on a
quarterly basis, you should conduct a practice recovery on at least
some of your most important systems that can be offline for a few
days, and no less than monthly, you should practice recovery for the
ones that are mission-critical. Better to discover errors in the
process when there is little impact than when there is no safety net
at all.
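The practice schedule suggested above can be tracked with very little tooling. The following Python sketch flags systems whose last practice recovery is older than the suggested interval. The tier names, the day counts used to approximate "monthly", "quarterly", and "yearly", and the example systems are all assumptions made for illustration.

```python
from datetime import date

# Assumed maximum number of days between practice recoveries per tier.
MAX_DAYS_BETWEEN_DRILLS = {
    "mission_critical": 31,  # no less than monthly
    "important": 92,         # at least quarterly
    "full_system": 366,      # at least once a year
}


def overdue_drills(last_drills: dict, today: date) -> list:
    """Return the names of systems whose last practice recovery is too old.

    last_drills maps a system name to a (tier, date_of_last_drill) tuple.
    """
    overdue = []
    for name, (tier, last_drill) in last_drills.items():
        if (today - last_drill).days > MAX_DAYS_BETWEEN_DRILLS[tier]:
            overdue.append(name)
    return sorted(overdue)


# Hypothetical systems, both last drilled on the same day.
systems = {
    "mail": ("mission_critical", date(2024, 1, 1)),
    "archive": ("important", date(2024, 1, 1)),
}
print(overdue_drills(systems, today=date(2024, 3, 1)))
```

In this example only the mission-critical system is reported, because 60 days have passed, which exceeds the monthly interval but not the quarterly one.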
It is IT’s responsibility to inform management of potential exposure
to the business and communicate clearly. The best proof of planning
is to go through a disaster and have it be a non-event. For example,
in 1999 everyone was concerned about the impact of Y2K and clearly
saw it as an IT priority. Therefore, resources were allocated far
ahead of the event, and nearly everyone tested the results. The
result of this planning and forethought was that when the new
millennium arrived, it was a non-event. In contrast, look at the
Daylight Saving Time change in 2007 and the impact it had on
unprepared IT departments. For many IT departments, it was days, and
in some cases weeks, before all the coordinated patches required were
put in place and calendars were synchronized properly across devices
and users. Some vendors were still releasing patches just days before
the change.
If you do not know whether you can fully recover until you actually
have a disaster, that in itself becomes a potential additional
disaster, but one that can be avoided, albeit with a bit of effort.
Practice recovery, not just backup. Backup is really step one of a
future recovery. Take the other steps before circumstances force
them upon you. This way, you can approach nearly inevitable disasters,
both large and small, without additional self-induced stress. The
worst thing that can happen is discovering that a disaster is an
unrecoverable event. Practice recovery beforehand, and you eliminate
this possibility.
By Jerry Ware, IT Solutions Fellow