The first thing I did yesterday morning when I woke up was call my parents in Kolkata and then J to let them know about the CrowdStrike outage, so they would not be surprised or stranded just because they hadn't read the news yet. One of the bad habits I have been trying to break for years is reading the headline news before I am out of bed in the morning. It is not a great way to start the day, given how such reading makes a person feel. Yesterday morning was no different: I was following my ritual, and the news was such that I had to make the calls.
Just a small mistake, such enormous consequences: that was the first thought that crossed my mind. If a business is big enough to have a large number of remotely located computers affected by this outage but lacks the financial resources to dispatch technicians to physically fix the problem, it could bleed to death from this single event. There are people who would die without seeing a loved one for the last time because there was no way for that person to fly. Patients could have died because hospitals were not able to take them in on time. The list of possible tragedies that may have come about from this event is limited only by our imagination.
It could take millions of person-hours of work by corporate IT professionals to fix all the computers that were affected, said O’Neill, the former FBI counterintelligence operative. But, he said, coming up with a firm estimate is difficult because it’s unknown how many computers were affected.
This is the Y2K of our times, and it's unclear where those resources are going to materialize from. More likely than not, whoever happens to be closest to an impacted computer will need to follow the directions of a remote sysadmin, serve as the button-pusher, and do what is needed to get the computer back up and running. I am trying to imagine a scene where one of my parents is getting instructions by phone on how to boot a computer into safe mode, and it feels surreal.
In organizations where that is not allowed for reasons of policy and bureaucracy, it may be a long while before things are back up and running.
The lack of quality control and testing before a global release is quite astounding. If the error is as basic as reported, then it appears as if a high-school intern was left in charge of the code release with no one watching. And that is assuming the intern even did their own work and did not get an LLM to write the code while they chilled.
The problem with CrowdStrike’s update was that it wasn’t formatted correctly “and causes Windows to crash every time,”