A cybersecurity crisis that shook the world

The CrowdStrike Incident 2024

A faulty update triggered a global IT crisis that became known as «Y2K24». Millions of Windows systems crashed, critical infrastructure was paralysed and companies worldwide were affected. Find out how CrowdStrike reacted, what measures were taken and what lessons were learnt for the future.

On 19 July 2024, a routine update from CrowdStrike, a leading provider of cybersecurity solutions, led to an unprecedented global IT outage. The event, which became known as «Y2K24», had far-reaching consequences and demonstrated the vulnerability of our modern IT infrastructures.

Technical Analysis

Recovering the affected systems was slow. On systems that were additionally protected with BitLocker, the BitLocker recovery key had to be obtained before the recovery process could be completed. This caused further problems in some environments, because the system managing those recovery keys was itself affected by the outage.

The cybersecurity industry reacted quickly to the incident. Several companies began to review their update and distribution processes to avoid similar incidents in the future. CrowdStrike itself announced an extensive internal investigation and a review of its testing and release procedures.
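One safeguard that is often discussed when update and distribution processes are reviewed is a staged rollout, in which an update reaches a small group of machines first and is halted if that group becomes unhealthy. The following is a minimal sketch of that idea only, not a description of any vendor's actual process. The ring names, host counts, the `crash_count` health probe and the 1% threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ring:
    name: str   # hypothetical ring label
    hosts: int  # number of hosts in this ring


def staged_rollout(rings, crash_count, max_crash_rate=0.01):
    """Deploy ring by ring and halt at the first ring that looks unhealthy.

    crash_count is a callable (a hypothetical health probe) that returns how
    many hosts in a ring crashed after receiving the update.
    Returns (names of rings that received the update, rollout completed?).
    """
    deployed = []
    for ring in rings:
        deployed.append(ring.name)
        rate = crash_count(ring) / ring.hosts
        if rate > max_crash_rate:
            return deployed, False  # stop: later rings never get the update
    return deployed, True


rings = [Ring("internal", 50), Ring("canary", 1_000), Ring("broad", 100_000)]

# Simulated faulty update: every host that receives it crashes.
deployed, completed = staged_rollout(rings, crash_count=lambda ring: ring.hosts)
print(deployed, completed)  # ['internal'] False
```

In this simulation the faulty update stops after the 50 hosts of the first ring, so the two larger rings are never touched. The trade-off is that a good update also takes longer to reach every machine.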

Timeline of the 2024 CrowdStrike Incident

18 July 2024

Unrelated Azure outage: A separate outage of the Azure platform, independent of the CrowdStrike update, blocks access to storage and Microsoft 365 applications for some businesses in the Central US region.

19 July 2024

04:09 UTC: CrowdStrike distributes a configuration update for its Falcon sensor software for Windows PCs and servers. The update causes machines to enter a boot loop or recovery mode.
04:09 UTC: Widespread crashes and reboots begin, first in Oceania and Asia, where the working day is already under way.
05:27 UTC: CrowdStrike reverts the content update.
06:48 UTC: Google Compute Engine reports issues with Windows VMs.
07:15 UTC: Google identifies the CrowdStrike update as the cause.
09:45 UTC: CrowdStrike CEO George Kurtz confirms that the fix has been deployed and assures that the issue is not the result of a cyberattack.
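To make the intervals in this timeline easier to read, the following sketch computes how many minutes after the initial distribution each listed event occurred. It uses only the clock times given above; the event labels and the function name `minutes_since_start` are illustrative choices.

```python
from datetime import datetime

# Clock times as listed in the timeline for 19 July 2024.
# Only the differences between them matter here.
EVENTS = [
    ("update distributed", "04:09"),
    ("update reverted", "05:27"),
    ("Google reports VM issues", "06:48"),
    ("Google identifies cause", "07:15"),
    ("CEO confirms fix deployed", "09:45"),
]


def minutes_since_start(events):
    """Return (label, minutes after the first event) for each event."""
    start = datetime.strptime(events[0][1], "%H:%M")
    return [
        (label, int((datetime.strptime(t, "%H:%M") - start).total_seconds() // 60))
        for label, t in events
    ]


for label, minutes in minutes_since_start(EVENTS):
    print(f"{minutes:4d} min  {label}")
```

The faulty update was therefore available for 78 minutes before it was reverted, and the CEO's confirmation followed 336 minutes (5 hours 36 minutes) after the initial distribution.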

Immediate effects

Various times: Reports of disruption across multiple sectors, including airlines, banks, hospitals, government services and more.
Throughout the day: Emergency briefings and response actions by governments in several countries, including the United States, the United Kingdom and Australia.
Ongoing: Financial markets react with significant price losses in CrowdStrike and Microsoft shares.
Swiss International Air Lines cancels over 30% of flights: Continued operational chaos at Swiss, with significant disruptions and cancellations.

Subsequent days

Ongoing remediation: Many affected computers have to be repaired manually, leading to prolonged outages and disruptions in various sectors.
Industry response: Cybersecurity experts call for more redundancy and decentralised systems to prevent such widespread outages in the future.
Company and government responses: Ongoing efforts to restore normality and manage the impact of the outage.

Key points and long-term effects

Estimated financial loss: Around 10 billion US dollars in global financial damage.
Discussions about centralisation: The incident raises questions about the centralisation of IT infrastructure and the need for diversity among cybersecurity vendors.
Global reach: The outage affects multiple countries and sectors and reflects the widespread use of CrowdStrike and Microsoft products worldwide.

Conclusion

The incident shows how vulnerable our modern, permanently connected IT systems are and how dependent we are on suppliers. Most of the affected companies were probably caught at least partly unprepared, because in recent years they have been preparing for ransomware or other malicious scenarios. That a «simple» faulty update could have such massive consequences shows that the basics must not be neglected. In risk management, the hazards «software vulnerabilities or errors» and «failure of devices and systems» cover exactly such scenarios and should prompt organisations to address these risks, whether through appropriate backup and recovery processes, tabletop exercises or established business continuity management.

The crisis has also emphasised the importance of careful software development and robust testing processes to ensure the integrity and availability of IT systems. It is hoped that the lessons learnt from this incident will help to improve the security and reliability of digital infrastructures worldwide.