Der Untergang der Titanic. Willy Stöwer (1912)

Harbinger of Things to Come?

Since 2002, I have worked for the IT department of a government agency. In 2006 or 2007, a software upgrade of the disk controllers on the principal systems went wrong, and the systems were out of operation for a week. It was probably the biggest crisis in the agency’s history. I was a database administrator working on a systems renewal project at another location, and others dealt with the issue. I knew the systems were offline but didn’t know how serious it was. After a week, the telephone rang at home. It was 9 PM.

My wife picked up the phone. It was the IT director. He said there was a situation and asked me to come to the office. His voice reflected fear. ‘As if the Titanic had hit the iceberg,’ my wife later noted. I went to the office in a hurry. I looked at the log files, found the error messages, typed them into the Google search bar, and found a document on the Internet with the remedy. I repaired the failures and brought the systems online. Board members and senior managers were standing around me, watching me type. In 2007, few people knew you could use Google for this.

Then, I learned that we had not made a backup for a week and that the mirror copy was also offline. You probably know why backups are needed, but you may not know what a mirror copy is. If you own a computer or a mobile phone, your data is on a device like a disk. If the disk fails, your data may be gone forever. If you lose some photographs of your late cat, you might feel sad about it, but after a few years, you get over it, perhaps after consulting your psychiatrist and taking a lot of pills.

Corporations can’t lose their data. That would finish them. Their business is their data. Without their data, they are out of business. If you have a backup, only the data from after the latest backup may be lost, but that may still finish you, most notably if you have not made a backup for a week. We were a government agency, so it would not have bankrupted us. But it would have caused a national political scandal.

That is why corporate computers come with multiple groups of disks on different sites. If one group catches fire or stops operating because of a failed software upgrade, the data is still available on the other groups. These groups are called mirror copies. We had two groups, one being the original and the other the mirror. You can imagine my bewilderment. We had no backup, and the mirror copy wasn’t available. So much had gone wrong that it was a miracle I could recover the data. We were on the brink.

An even greater surprise was yet to come. The managers and the board wanted to return to business as usual and run the backlog of batch jobs. Then I said, ‘This is perhaps the most important advice I will ever give in my entire career. Don’t start the batch jobs yet. We are on the proverbial edge of the precipice. Running the jobs might just push us over. Everything went wrong for a week and there is no guarantee whatsoever that it will be all right now. We should bring the mirror copy back online and make a backup first.’

They planned to ignore my advice. Bringing the mirror copy back online and taking a backup would take eighteen hours of precious time. It was a lot of data to back up, as it was everything we had. I was a low-ranking official, and the IT director had claimed there was nothing to worry about. I kept stressing that making a backup was the right thing to do. ‘If something goes wrong, we could be finished,’ I told them. It was the worst crisis the agency had ever faced. And so, I pressed for an extensive check-up to see if everything was in order. On that, they could agree.

During the check-up, I found another failure everyone had overlooked. That scared the managers and the board. They backed off and subsequently followed my advice. The operators brought the mirror copy online and made a backup before we resumed normal operations. In this way, rational decision-making prevailed. Nothing went wrong after that, but no one could have known that beforehand. And I wasn’t a psychic either.

The audit department later evaluated the crisis. The auditors noted that after a week of failures, all problems suddenly vanished. They already found that hard to believe. What they found even more difficult to fathom, and they stressed the inconceivability of it during a meeting, was that after a week of irrational decision-making, sanity suddenly took hold when we brought the mirror copy back online and made a backup. They couldn’t figure out why that happened. Our management had kept them in the dark about that.

If it had gone wrong, the agency probably would have survived. Most operations would likely have had to stop for several weeks (they had already been down for a week), and it might not have been possible to recover all the data. That would have made the headlines. It never came to that. When a local newspaper smelled a rat, the board could tell the newspaper the situation was under control and that the data was safe.

It didn’t help my career. Somewhat later, the other senior database administrators received a higher salary grade. Years later, I switched to Java programming when there were too many database administrators. My managers told me I did well and rewarded my efforts by proposing a lower salary grade. That was unusual and possibly unprecedented, so I told them to go to hell.

The incident made me worry about the impression I had made. It suggested my managers didn’t believe I was a useful employee, which angered me. I wrote a letter of complaint detailing what happened during the crisis and requested that it be added to my personnel file. I worked for a government agency and didn’t have to fear dismissal. Perhaps promotion was out of the question, but I wasn’t hoping for a higher rank.

Your career depends on having the right friends in high places. Managers had diverging views about me. Those who focused on results held me in higher regard than those who occupied themselves with people, teamwork and social events. During a crisis, when rational thinking is of the essence, our social nature can be fatal. There is too much gossip and small talk, and we ignore the elephant in the room. Anyone who points it out will not win a popularity prize. It is also why politics in democracies can fail in critical situations.

Featured image: Der Untergang der Titanic. Willy Stöwer (1912). Wikimedia Commons. Public Domain.
