Picture: MAILEE OSTEN-TAN

Friday’s global IT outage, triggered by a faulty CrowdStrike update, sent shock waves through the tech world. As the dust settles, we in the cybersecurity industry are taking stock of the incident’s far-reaching implications.

July 19 was one of the busiest days I’ve had in the last 25 years. My first thought was that there were targeted attacks against SA businesses, but ultimately we found it was the global outage caused by the CrowdStrike update.

This incident, described as the largest IT outage in history, affected an estimated 8.5-million Windows devices worldwide. Its impact was felt across multiple sectors, grounding flights, disrupting banking and healthcare services, and causing widespread business interruptions. Early estimates suggest the costs could run into billions of dollars.

As we grappled with the fallout, my team and I were on the front lines, helping clients mitigate the impact and remediate affected systems quickly. Most of them had recovered by midmorning.

Clarifying misconceptions

However, a week after the incident, confusion still lingers. The biggest challenge we have seen since Friday is widespread misunderstanding about exactly what went wrong and who was responsible for the outage. Some are still pointing fingers at Microsoft, but that confusion helps no-one.

As an industry we need a clear understanding of the event’s root causes. This could have happened to anyone. Most major cybersecurity and software vendors have released faulty updates at some stage. This was so significant purely because of the scale of the software deployment and the fact that CrowdStrike has a Microsoft Kernel-Mode Code Signing Certificate.

Having such a certificate shows Microsoft considers the software to be genuine and secure. It allows CrowdStrike to quickly deploy applications into the core of the operating system to address cyber risks. While all IT vendors have encountered problematic files affecting users, the severity of this case was unprecedented. Usually, you simply roll back the deployment, but because this one was running in the kernel, many affected machines crashed on startup and had to be fixed by hand, which made for a tough recovery.

A catalyst for change

The unprecedented scale of the outage has sparked intense discussions about cybersecurity practices, vendor accountability and the risks associated with centralised IT services. I believe this incident could be a turning point for our industry.

Vendor accountability, testing and third-party risk management all come into play. The incident has opened a can of worms, and only in the coming weeks will we be able to answer the questions it raises.

One of the most promising developments emerging from this crisis is the possibility of a new collaborative approach to software testing and deployment. I envision a global testing alliance that could revolutionise the validation of updates before release.

A global outage affecting Microsoft services hit airlines, banks and health systems. Picture: MAILEE OSTEN-TAN/REUTERS

There is the potential for a deployment alliance, where member vendors subscribe to best practice methodologies for testing software updates before deployment. A signing authority could also validate certain procedures. This would show vendor alignment with global best practice, and give assurances to customers.

This concept aligns with our long-standing advocacy for a collaborative defence model in cybersecurity. Such an alliance could greatly reduce the risk of similar incidents in the future while fostering greater trust between vendors and their clients.

The road ahead

The incident has highlighted the delicate balance between rapid response to cyberthreats and ensuring system stability. We have been so focused on staying ahead of cyber risks that some controls may have gone out of the window.

As the industry moves forward, the lessons learnt from this incident will shape cybersecurity practices for years to come. CrowdStrike has already announced plans to improve its testing procedures and implement a staggered deployment strategy for updates.

The incident is likely to cause some post-traumatic stress disorder in the industry and drive all vendors to be more rigorous about testing. While the full ramifications of the outage are still unfolding, one thing is clear: it has irreversibly altered the cybersecurity landscape.

As organisations worldwide re-evaluate their IT strategies and vendors revamp their processes, our industry is ready for a new era of collaboration, accountability and resilience.

• Osler is cofounder and business development director at Nclose.
