Focus Friday: Lessons from the CrowdStrike Update Outage on Global IT Resilience
Written By: Ferhat Dikbiyik
Welcome to this week’s Focus Friday, where we examine significant events affecting supply chains and third-party risk management. Today, we highlight the recent CrowdStrike update outage, which disrupted businesses globally. On July 19, 2024, a routine software update from CrowdStrike, a prominent cybersecurity firm, caused the Blue Screen of Death (BSOD) on thousands of Windows machines, leading to widespread operational disruptions. This issue, stemming from a defect within the update rather than a cyberattack, impacted critical sectors such as airlines, financial institutions, and healthcare services. In this blog, we will explore the extensive effects of this incident on global supply chains and the valuable lessons learned in enhancing IT resilience and preparedness.
What Happened: A Timeline of the CrowdStrike Update Outage
On July 19, 2024, businesses and critical infrastructure around the world experienced an unexpected and significant IT disruption. The incident began with a routine software update deployed by CrowdStrike, a leading cybersecurity firm known for its Falcon platform.
In the early hours of July 19, reports started emerging from Australian banks, airlines, and media companies about widespread system failures. Thousands of Windows machines were experiencing the Blue Screen of Death (BSOD), a critical error that forces the computer into a recovery boot loop, rendering it unusable. This issue quickly spread to other regions as businesses in Europe began their workday.
As the morning progressed, the impact of the outage became more apparent. Major airlines such as American Airlines, Delta, and United requested ground stops due to IT failures, causing flight cancellations and delays worldwide. The UK’s National Health Service (NHS) reported significant disruptions, with GP practices and pharmacies unable to access patient records or process electronic prescriptions.
CrowdStrike CEO George Kurtz addressed the situation, confirming that the issue was caused by a defect in a recent content update for Windows hosts. He assured the public that the problem was not a result of a cyberattack, but rather a software bug that affected the kernel-level driver used to secure Windows machines.
By midday, CrowdStrike had identified the defective update and deployed a fix. However, many systems required manual intervention to be restored to normal operation. IT administrators globally were instructed to boot affected machines into safe mode and delete specific system files to resolve the BSOD issue. Despite these efforts, the recovery process was slow and complex, particularly for cloud-based servers and remote devices.
Throughout the day, various sectors continued to struggle with the fallout. Airports reported long lines and delays as check-in systems remained offline. Financial institutions faced operational challenges, with some banks closing branches temporarily. Media companies, including Sky News, experienced broadcast interruptions, highlighting the widespread nature of the disruption.
As the day progresses, the full extent of the outage’s impact is still unfolding. Businesses and critical services worldwide are working tirelessly to restore normal operations. The ongoing challenges highlight the critical dependencies within global IT supply chains and the potential for a single point of failure to cause extensive disruptions. Organizations are facing the immediate task of recovery while simultaneously reassessing their incident response and contingency plans.
The CrowdStrike update outage serves as a powerful reminder of the importance of rigorous software testing and the need for robust resilience strategies to mitigate the impact of such unforeseen events. In the next section, we will explore how this incident specifically affected supply chains and what lessons can be learned to strengthen future preparedness.
Why TPRM Professionals Should Care
The CrowdStrike update outage is not just an isolated IT issue; it has far-reaching implications for businesses and their supply chains. Third-Party Risk Management (TPRM) professionals need to be acutely aware of such incidents, even if their organizations are not directly impacted. The ripple effect of this outage demonstrates the interconnectedness of modern supply chains and the potential for widespread disruption.
The Ripple Effect
While CrowdStrike’s update issue primarily affected Windows systems, its impact was felt across various sectors, including airlines, financial institutions, healthcare, and media. These sectors form the backbone of critical infrastructure and daily operations for many businesses. When key service providers like airlines and banks experience outages, the consequences can quickly spread to other dependent businesses, leading to operational slowdowns, financial losses, and reputational damage.
For TPRM professionals, this incident underscores the importance of understanding the dependencies and interconnections within their supply chains. Even if an organization does not use CrowdStrike’s services directly, its vendors might. Any disruption at the vendor level can therefore cascade down and affect the organization’s ability to deliver products and services efficiently.
Lessons for TPRM Professionals
- Proactive Monitoring: TPRM professionals must continuously monitor their vendors for any signs of potential disruptions. This includes staying informed about any updates or changes in the vendors’ operations that might affect their own business.
- Communication Channels: Establishing clear and effective communication channels with vendors is crucial. In the event of an incident, quick and accurate information flow can help mitigate the impact and facilitate faster recovery.
- Incident Response Plans: It is essential to have robust incident response plans that include scenarios involving vendor disruptions. These plans should outline the steps to take when a key vendor is affected, ensuring business continuity.
- Vendor Assessments: Regularly assessing vendors’ resilience and their own incident response capabilities can provide insights into their ability to handle disruptions. This can include evaluating their software update policies, testing procedures, and recovery strategies.
- Collaboration: Working closely with vendors to develop joint incident response and recovery plans can enhance overall resilience. This collaborative approach ensures that both parties are prepared and can act swiftly in the face of disruptions.
- Awareness of Dependencies: Understanding the dependencies within the supply chain is critical. This includes knowing which vendors rely on other service providers and how disruptions can propagate through the network.
- Phishing Risks: Incidents like this one raise the risk of phishing attacks. Threat actors often exploit the chaos to push malicious updates, send harmful attachments, and even phone IT support to obtain sensitive information. As organizations work to restore their systems, it is crucial to remain alert for phishing attempts. Employees should be trained to recognize suspicious emails and avoid clicking on links or downloading attachments from unknown sources. The urgency and confusion created by the outage provide a perfect environment for bad actors to strike.
As TPRM professionals analyze the CrowdStrike incident, they should consider these lessons to enhance their own risk management strategies. By being proactive and prepared, they can better manage the complexities of supply chain disruptions and ensure their organizations remain resilient.
Questions to Ask Vendors
In this rapid response situation, it is crucial to quickly assess the readiness and response of your vendors. Here are six essential questions TPRM professionals should ask their vendors:
- Have you applied the necessary patches and updates to address the CrowdStrike defect?
- How are you currently handling the recovery process for systems affected by the CrowdStrike update issue?
- What measures are in place to monitor and detect unusual activities related to the recent CrowdStrike update issue?
- Have you implemented additional access controls to prevent unauthorized access during this recovery period?
- What steps are you taking to protect against potential phishing attacks that may exploit the current situation?
- How are you communicating updates and recovery progress with your clients?
These questions will help TPRM professionals quickly gauge their vendors’ response efforts, ensuring any potential risks are identified and managed effectively.
Remediation Recommendations for Vendors
To address the risks associated with the CrowdStrike update outage and ensure swift recovery and future resilience, vendors should take the following actions:
- Immediately update all affected systems with the latest CrowdStrike patches. Ensure regular checks for updates to stay protected against similar issues in the future.
- Boot Windows systems into Safe Mode or the Windows Recovery Environment. Navigate to C:\Windows\System32\drivers\CrowdStrike, delete any files matching C-00000291*.sys, and then boot the host normally.
- Continuously monitor system and security logs for any unusual activities that may indicate residual issues or attempts to exploit the defect.
- Update and review incident response plans to handle similar supply chain incidents more effectively. Ensure quick mitigation strategies are in place and that they are regularly tested.
- Implement strong access controls, such as multi-factor authentication (MFA), to prevent unauthorized access during the recovery period.
- Emphasize vigilance against potential phishing attempts. Train employees to recognize suspicious emails and avoid clicking on links or downloading attachments from unknown sources.
By following these recommendations, vendors can mitigate the impact of the current outage and enhance their preparedness for future incidents, ensuring better protection for their systems and the continuity of their operations.
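The file-removal step above can be sketched in Python. This is a minimal illustration rather than CrowdStrike's official tooling: the function names are hypothetical, the pattern C-00000291*.sys is the one given in CrowdStrike's published remediation guidance, and the script defaults to a dry run that only lists what it would delete. It assumes a Python interpreter is available and that it runs with administrator rights; in Safe Mode or the Windows Recovery Environment the same step is normally done by hand or with a short command-line script.

```python
from pathlib import Path

# Directory from the remediation step above; pattern per CrowdStrike's guidance.
DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
PATTERN = "C-00000291*.sys"


def find_channel_files(directory: Path, pattern: str = PATTERN) -> list[Path]:
    """Return files in `directory` whose names match `pattern`, sorted by name."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def remove_channel_files(directory: Path, dry_run: bool = True) -> list[Path]:
    """Return the matching files; delete them only when dry_run is False."""
    matches = find_channel_files(directory)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run: report the files without touching them.
    for path in remove_channel_files(DRIVER_DIR, dry_run=True):
        print(f"would delete: {path}")
```

Defaulting to a dry run is a deliberate choice here: it lets an administrator confirm exactly which files match before anything in a driver directory is removed.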
Leveraging Black Kite for Effective TPRM
To effectively manage the risks associated with the CrowdStrike update outage, TPRM professionals can leverage Black Kite’s platform to gain critical insights and take proactive measures. Here’s how you can use Black Kite to navigate this incident:
Filtering Third-Party Vendors with the CrowdStrike Client Tag
Black Kite uses a comprehensive approach to determine which vendors are using CrowdStrike by analyzing customer stories, product identification, job postings, and other external sources. This method ensures the accurate identification of CrowdStrike clients. In response to the recent outage, Black Kite published the CrowdStrike Client FocusTag™, enabling TPRM professionals to quickly identify at-risk vendors and assess their exposure.
Customers can filter their monitored entities using the CrowdStrike Client tag, allowing them to concentrate on vendors most likely impacted by the outage. This targeted approach ensures that instead of sending questionnaires to all vendors, Black Kite customers can directly reach out to those at risk, optimizing their efforts and resources.
- Log in to your Black Kite account and navigate to the vendor management section.
- Use the CrowdStrike Client tag to filter your list of third-party vendors. This will display all vendors that are potentially impacted by the CrowdStrike update issue.
- For each vendor identified, review their status and any updates they have provided regarding the issue. This will help you understand their current situation and what actions they are taking to mitigate the risk. Consider the criticality of the vendor and other cyber risk factors attributed to it by the Black Kite platform.
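The filter-then-prioritize workflow above can be sketched in a few lines of Python. This is not the Black Kite API: the Vendor class, the 1-to-5 criticality scale, and the vendor names are all hypothetical, chosen only to show how a tag filter narrows the outreach list and how criticality orders it.

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    name: str
    criticality: int  # hypothetical scale: 1 (low) to 5 (business-critical)
    tags: set[str] = field(default_factory=set)


def outreach_queue(vendors: list[Vendor], tag: str) -> list[Vendor]:
    """Vendors carrying `tag`, most critical first (ties broken by name)."""
    tagged = [v for v in vendors if tag in v.tags]
    return sorted(tagged, key=lambda v: (-v.criticality, v.name))


# Illustrative data only; these vendors do not exist.
vendors = [
    Vendor("Acme Logistics", 3, {"CrowdStrike Client"}),
    Vendor("Northwind Payroll", 5, {"CrowdStrike Client"}),
    Vendor("Globex Print", 2, set()),
]

for v in outreach_queue(vendors, "CrowdStrike Client"):
    print(v.name, v.criticality)
```

With this sample data, the untagged vendor drops out and the payroll provider is contacted before the logistics provider, which mirrors the goal of reaching the at-risk, high-impact vendors first instead of sending questionnaires to everyone.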
Utilizing the Supply Chain Module
Black Kite’s new Supply Chain Module offers an advanced way to uncover risks beyond direct third parties. Customers can search for CrowdStrike within their supply chain and visualize the concentration risk. The tool enables TPRM professionals to filter their supply chain (including third, fourth, and fifth parties) based on CrowdStrike usage, and further narrow down the search by adding additional filters such as cyber grade, risk score, and compliance ratings.
Operationalizing this tool involves several steps:
- Filter by CrowdStrike Client Tag: Log in to your Black Kite account and apply the CrowdStrike Client tag to filter your list of vendors. This will display all vendors potentially affected by the CrowdStrike update issue.
- Visualize Concentration Risk: Use the Supply Chain Module to visualize the concentration risk within your supply chain. Identify clusters of vendors heavily dependent on CrowdStrike or other critical service providers, helping you understand the potential impact of the outage on your operations.
- Assess Exposure and Mitigation: Once identified, assess these vendors for their exposure to the recent outage. This targeted approach allows TPRM professionals to focus their efforts on the most at-risk entities, enhancing the efficiency of their risk management processes.
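To make the idea of exposure beyond direct third parties concrete, here is a minimal sketch that walks a supplier graph breadth-first and reports which suppliers use a flagged product and how far removed they are. The graph, the company names, and the exposed_parties function are hypothetical and do not describe how the Supply Chain Module works internally. Tier 1 corresponds to a third party, tier 2 to a fourth party, and tier 3 to a fifth party.

```python
from collections import deque

# Hypothetical data: each key depends on the suppliers in its list.
SUPPLIERS = {
    "MyCompany": ["PayrollCo", "CloudHost"],
    "PayrollCo": ["DataCenterX"],
    "CloudHost": ["DataCenterX", "CDNCorp"],
    "DataCenterX": [],
    "CDNCorp": [],
}
# Hypothetical set of entities known to run the affected product.
USES_CROWDSTRIKE = {"PayrollCo", "DataCenterX"}


def exposed_parties(root, suppliers, flagged, max_depth=3):
    """Breadth-first search returning {supplier: tier} for flagged suppliers.

    Tier is the shortest distance from `root`; suppliers deeper than
    `max_depth` tiers are not explored.
    """
    seen = {root}
    queue = deque([(root, 0)])
    hits = {}
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for supplier in suppliers.get(node, []):
            if supplier in seen:
                continue
            seen.add(supplier)
            if supplier in flagged:
                hits[supplier] = depth + 1
            queue.append((supplier, depth + 1))
    return hits


print(exposed_parties("MyCompany", SUPPLIERS, USES_CROWDSTRIKE))
```

In this toy graph, DataCenterX is never a direct vendor, yet two direct vendors depend on it. That kind of shared downstream dependency is the concentration risk a questionnaire sent only to third parties would miss.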
Extended Supply Chain Analysis
The Supply Chain Module’s filtering capabilities provide a comprehensive view of an organization’s extended supply chain. By visualizing the interconnections and potential concentration risks, TPRM professionals can better understand the overall impact of the CrowdStrike outage. This holistic approach enables proactive risk mitigation, ensuring that vulnerabilities are addressed promptly and effectively.
Black Kite customers can further leverage this tool by integrating it into their regular risk assessment routines. Regularly updating the filters and parameters based on the latest threat intelligence allows for continuous monitoring and early detection of potential risks. This proactive stance not only helps in managing current vulnerabilities but also prepares organizations for future threats, maintaining the integrity and security of their supply chain.
For more details, visit the Black Kite Supply Chain Module page.
This structured and detailed approach ensures that TPRM professionals can effectively leverage Black Kite’s tools to manage the risks posed by the CrowdStrike update outage, enhancing their overall resilience and preparedness.
Conclusion
The CrowdStrike update outage serves as a critical reminder of the complexities and vulnerabilities inherent in global IT supply chains. TPRM professionals must stay vigilant, proactive, and prepared to manage the ripple effects of such incidents. By leveraging Black Kite’s comprehensive tools and FocusTags™, organizations can effectively identify and mitigate risks, ensuring resilience and continuity.
In addition to the CrowdStrike incident, it’s essential to remain informed about other significant vulnerabilities that can impact your supply chain. For further insights, refer to our other Focus Friday blog post, which provides detailed TPRM insights on recent incidents involving Serv-U FTP, Microsoft SharePoint, Citrix NetScaler, ServiceNow, Exim Mail, and GeoServer. Understanding and managing these risks is crucial for maintaining the security and integrity of your operations.
Stay tuned to Black Kite for continuous updates and insights on managing third-party risks effectively.
About Focus Friday
Every week, we delve into the realms of critical vulnerabilities and their implications from a Third-Party Risk Management (TPRM) perspective. This series is dedicated to shedding light on pressing cybersecurity threats, offering in-depth analyses, and providing actionable insights.
FocusTags™ in the Last 30 Days:
- Serv-U FTP: CVE-2024-28995, Directory Traversal Vulnerability in SolarWinds Serv-U.
- Microsoft SharePoint: CVE-2024-38094, Remote Code Execution Vulnerability in Microsoft SharePoint.
- Citrix NetScaler: CVE-2024-6235, Information Disclosure Vulnerability in Citrix NetScaler.
- ServiceNow: CVE-2024-4879, Input Validation Vulnerability in ServiceNow.
- Exim Mail: CVE-2024-39929, Security Restriction Bypass Vulnerability in Exim Mail Servers.
- GeoServer: CVE-2024-36401, Eval Injection and RCE Vulnerability in GeoServer.
- PHP-CGI: CVE-2024-4577, OS Command Injection Vulnerability in PHP-CGI Module.
- Microsoft MSMQ: CVE-2024-30080, Use After Free, Remote Code Execution Vulnerability in Microsoft Message Queuing (MSMQ).
- Rejetto HFS: CVE-2024-23692, Template Injection Vulnerability, Unauthenticated RCE Vulnerability in Rejetto HTTP File Server.
- Checkpoint SNX: CVE-2024-24919, Information Disclosure Vulnerability in Check Point’s CloudGuard Network, Quantum Maestro, Quantum Scalable Chassis, Quantum Security Gateways, Quantum Spark Appliances.
- DNSBomb: CVE-2023-28450, DNSBomb Attack in Dnsmasq, CoreDNS, Knot DNS, Simple DNS Plus, Technitium DNS, MaraDNS.
- Veeam VBEM: CVE-2024-29849, Authentication Bypass Vulnerability in Veeam Backup Enterprise Manager.
References:
https://www.bbc.com/news/live/cnk4jdwp49et
https://www.theverge.com/2024/7/19/24201717/windows-bsod-crowdstrike-outage-issue
https://socradar.io/crowdstrike-update-causing-blue-screen-of-death-and-microsoft-365-azure-outage
https://www.bleepingcomputer.com/news/security/crowdstrike-update-crashes-windows-systems-causes-outages-worldwide/
https://gist.github.com/whichbuffer/7830c73711589dcf9e7a5217797ca617