Here’s a scary thought that keeps me awake at night: even after decades of preaching and teaching the importance of a sound backup procedure with regular testing, many organizations still cut corners on their backups and end up coming to me for help with iSeries disaster recovery.
Don’t get me wrong, I love to help! And there are many examples of organizations that, despite following best practices, have been unfortunate victims of natural disasters.
Still, when you’ve seen as many close calls and unhappy endings as I have, you’ll understand why I always say that the three rules for iSeries disaster recovery are:
If you have no disaster recovery plan, have never tested a recovery, or are thinking it could never happen to your organization, you could be in for some scary consequences. Here are just a few true stories from customers I’ve encountered over the years. Let them serve as a warning: there’s something to learn from each situation.
Moving Tapes Out of an Evacuation Zone
Hurricane season is serious business. Hurricanes Harvey, Irma, and Maria are just a few recent examples of the devastating impact these storms can have on communities and the businesses that operate there.
When you’re fortunate enough to receive warning that a hurricane may make landfall in your region, you can prepare by running a full system backup with Save Option 21, Robot Save, or BRMS before the storm arrives. But that’s only part of the story.
I have encountered many customers who dutifully perform these full system backups. But when a hurricane is churning and you need your offsite storage vendor to pick up the tapes, things change. Many vendors are not allowed to travel into an evacuation zone—and for good reason.
In one particular hurricane, I only saw tapes arrive from those customers that had their own private Learjet to transport the tapes to the disaster recovery center. Try telling that one to your boss: “We need a private jet to transport our most current backup tapes, just in case!”
For organizations that operate in a hurricane zone, I recommend a virtual tape library (VTL) and real-time data replication to a remote backup location with Robot HA or PowerHA. Learn more about this powerful combination in our Recovery Without Disaster guide.
Recovering after a Flood
I once received a call asking, “Can you recover from tapes that are muddy and wet?”
This call was from a small customer whose business had been flooded. They had what I call the “two-tape” strategy:
- The first tape went into the tape drive at night for the backup.
- The second tape was stored on top of the system. (There was no offsite tape storage.)
I asked the customer to send in their tapes so I could have a look. The tapes arrived in a Ziploc baggie—mud and water included. We used a hair dryer to dry them off!
Believe it or not, this customer was very lucky. We were able to read the data from one of the two tapes.
If you are relying on tape backups, it is critical that the tapes are stored offsite in a safe and secured location. A better solution is to do both tape backup and real-time data replication. Again, our Recovery Without Disaster guide can help you see why.
Putting Save Option 21 to the Test
IBM i users are always told to run Save Option 21 to ensure a full system backup with simple system recovery. But does Option 21 always ensure a complete backup? Not necessarily.
I was working with an organization to test their recovery using Save Option 21. The recovery ended after restoring the libraries, but none of the document library objects (DLOs) or integrated file system (IFS) objects were restored. The data was completely missing from the tape!
This organization had been dutifully testing with their Option 21 backups for years and had never encountered such a problem. What could have happened?
After much analysis, we discovered that a message reply list entry had been set up to automatically reply to any and all messages received during the backup with “Cancel”. While the libraries were being saved, a message had gone out identifying a damaged object. Subsequently, all saves after the libraries were canceled.
IBM i stores a lot of data in the IFS, as do many applications. So, without a good save of the IFS, you don’t have a usable system. It’s wise to review the backup logs after every save to ensure that everything has been saved. And, of course, be sure to test your recovery on a regular basis, as the team in this story did, so that you catch issues when the stakes are low.
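Part of that log review can be automated. Option 21 runs four save commands in sequence (SAVSYS, SAVLIB, SAVDLO, and SAV), so a simple check is to confirm that all four finished. The Python sketch below is only an illustration: the function name and the input format (a plain list of command names that completed normally) are assumptions, and how you extract that list from your own job log or backup report will depend on your site.

```python
# The four save commands that Save menu Option 21 runs, in order.
OPTION_21_SAVE_STEPS = ["SAVSYS", "SAVLIB", "SAVDLO", "SAV"]


def missing_save_steps(completed_steps):
    """Return the Option 21 save steps that are absent from completed_steps.

    completed_steps is a list of command names that finished normally.
    Extracting that list from a job log or backup report is site-specific
    and not shown here (hypothetical input format).
    """
    done = {step.strip().upper() for step in completed_steps}
    return [step for step in OPTION_21_SAVE_STEPS if step not in done]


if __name__ == "__main__":
    # Mirrors the story above: the libraries were saved, then the rest was canceled.
    gaps = missing_save_steps(["SAVSYS", "SAVLIB"])
    if gaps:
        print("Backup incomplete, missing:", ", ".join(gaps))
```

A check like this would have flagged the missing SAVDLO and SAV steps the morning after the first bad backup. It does not replace reading the log or testing a real recovery.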
Identifying Bad Backup Procedures
For years, a customer regularly performed their disaster recovery test by doing a system recovery from tape at their disaster recovery center at a remote location. Every year, all the data fit on one tape and they always had a successful test.
One year, the customer arrived with two tapes in hand. After restoring the data from the first tape, the recovery failed on the second tape. When we displayed the tape on another active system, we discovered there was no data on the tape!
“What happens during the backup after you mount the second tape?” I asked.
“We get a message on the console to mount the next tape and we reply with a ‘C’ to continue the backup,” they replied.
The backup mystery had just been solved. Replying with a “C” actually cancels the backup. The correct reply would be a “G” to Go.
Fortunately, this organization was only testing their recovery, so they were able to correct their backup processes. When it comes to evaluating your backup/recovery procedure, a Backup and Recovery Assessment from a professional can really come in handy.
Cutting Corners on the Backup Budget
Despite all our planning, procedures, and best practices, sometimes those of us responsible for IBM i backups and the safety of the data can only stand by and watch as decisions beyond our control are made at a higher level.
I once worked with an organization that operated out of an area that was prone to tornadoes. To save money, the IT management team decided that tapes would be sent offsite only once a week. The daily backup tapes would remain onsite in a vault.
Doing their best to stay optimistic in the face of this poor data backup decision, the plucky IT staff suggested installing a GPS tracker inside the vault so that they would be able to find their backup tapes should a tornado fling them up into a tree in some unknown location.
Disaster recovery definitely has its challenges. Remember to test your recovery on a regular basis. And it just might help to keep a sense of humor should you find yourself starring in a scary story of your own.