The importance of Comprehensive Business Continuity Planning – what this week’s outage has taught us.

Overview

A faulty update to CrowdStrike, a world-renowned security tool, inadvertently caused what has been described as the largest IT outage in history (19 July 2024). For many organisations this may have been their first experience of implementing their Business Continuity Plans (BCP) en masse and, in the majority of instances, it exposed untested gaps that weren’t considered during the planning phases.

Whilst many larger organisations can adapt and ride out the chaos, disruptions of this scale can easily overwhelm smaller organisations and have a significant, lasting financial and reputational impact, perhaps even to the point of bankruptcy.

This blog post offers guidance on current best practices in Business Continuity Planning, so that you are in a stronger position should such an event happen to you.

Key Considerations

The key considerations from HiddenBytes during the planning and creation of a comprehensive BCP include:

Anticipate offers of a “quick fix”: The sad reality is that during any event, there are always people aiming to capitalise on other people’s misfortunes. Staff members should be educated and made aware of these possibilities to ensure the organisation does not suffer any additional, secondary impacts from an event.

It is paramount that you keep a cool head and do not act too hastily. The reality is that the damage will have already occurred, and introducing a new service or product is not going to turn back time and undo it.

 

Example 1 – the cyber-criminal: A cyber-criminal leverages the anxiety and panic caused by the outage and creates a malicious payload. This payload is hosted on a new domain they have purchased and set up to appear as if it is a legitimate vendor site.

Impacted users are encouraged to download this “patch” created by the criminals which promises to fix or prevent the outage.

An unsuspecting victim downloads and runs the malicious software, and the cyber-criminal has now compromised your business, allowing for subsequent activity such as data theft, ransom, extortion and/or blackmail.

 

Example 2 – the ambulance chaser: A security services provider leverages the ongoing chaos to advertise or sell you a product that claims to be the silver bullet that solves all your concerns.

As demonstrated during this outage, if an issue was caused by a legitimate, reputable vendor, they will provide any official remediation actions and support through their pre-established support channels (such as CrowdStrike’s Tech Alert) and not through unsolicited messages, phone calls or websites.
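
One way to make "pre-established channels" concrete is to agree an allow-list of vendor domains in advance and check any "fix" link against it before anyone clicks. The sketch below is a minimal illustration only: the allow-list contents and the function name `is_trusted_link` are assumptions for this example, and passing the check does not prove that a download is safe.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list, agreed in advance as part of the BCP.
TRUSTED_VENDOR_DOMAINS = {"crowdstrike.com", "microsoft.com"}


def is_trusted_link(url, trusted=TRUSTED_VENDOR_DOMAINS):
    """True only for HTTPS links whose host is a trusted domain or a subdomain of one."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower().rstrip(".")
    if parts.scheme != "https" or not host:
        return False
    # Match the whole domain or a dot-separated subdomain, so that
    # look-alikes such as "crowdstrike.com.evil.example" are rejected.
    return any(host == d or host.endswith("." + d) for d in trusted)
```

A look-alike domain registered during the outage would fail this check, which is the point at which staff should stop and confirm the advice through the vendor's official support channel.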

 

Be Prepared

Every business should have a Business Continuity Plan for navigating outages and other business disruptions. Some key considerations include:

Critical Systems Identification: Systems essential for daily operations should be identified, prioritised and documented as part of a Business Impact Analysis (BIA). In the event of an incident, these are the systems that must be recovered first, in order of priority. In addition, the plan should document the data flows and how the systems interact with one another, so that appropriate impact assessments can take place during an event. By performing this analysis, you can identify your minimum viable organisational requirements and aim to continue providing these capabilities post-incident. NB: You will not be able to provide your entire roster of services!
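
Once priorities and dependencies are documented, a recovery order can be derived from them rather than decided under pressure. The sketch below is illustrative only: the system names, priority numbers and dependencies are invented, and `recovery_order` is a hypothetical helper. It uses Python's standard `graphlib` module (Python 3.9+) so that no system is restored before the systems it relies on, and picks the most critical system whenever there is a choice.

```python
from graphlib import TopologicalSorter

# Hypothetical systems: name -> (business priority, systems it depends on).
# A lower priority number means more critical to daily operations.
SYSTEMS = {
    "identity": (1, []),
    "network": (1, []),
    "payments": (1, ["identity", "network", "database"]),
    "database": (2, ["network"]),
    "email": (2, ["identity", "network"]),
    "intranet": (3, ["identity", "database"]),
}


def recovery_order(systems):
    """Return an order in which every system follows its dependencies,
    choosing the most critical system among those ready to be restored."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in systems.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    ready = list(sorter.get_ready())
    while ready:
        # Most critical first; the name breaks ties so the result is stable.
        ready.sort(key=lambda name: (systems[name][0], name))
        current = ready.pop(0)
        order.append(current)
        sorter.done(current)
        ready.extend(sorter.get_ready())
    return order


print(recovery_order(SYSTEMS))
```

Note that in this example the database is restored ahead of the higher-priority payments system, because payments cannot run without it. Surfacing that kind of hidden dependency is exactly what the data flow documentation is for.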

 

Communication Plan Development: Develop an alternative, out-of-band communication plan. If your primary communication system is disrupted, you must have a pre-established alternative method to keep employees and customers informed about the situation, mitigation efforts, and anticipated recovery timelines.

 

Offline Process: Could your operations continue if there were a sustained network outage? Your business continuity plan must account for this, and you might find that an offline process is needed (pen and paper)! Linking back to critical systems identification, critical employees may need additional training to be able to run these services offline.

 

Perform Regular (appropriate) Backups: Implement a robust backup strategy with frequent backups stored securely off-site. This allows critical data to be recovered in the event of a disaster, whether technical or malicious.

An often-overlooked aspect of backups is your passwords and recovery keys. If these are stored on the same infrastructure as the items they protect, any issue with that environment means you will not be able to access them when you need them most.

One of the recommended remediations for the CrowdStrike outage was to boot affected endpoints into Safe Mode so that the problematic file could be removed. In most corporate environments, the BIOS is password protected to prevent unauthorised personnel from making changes to the BIOS and booting into Safe Mode, so that password has to be retrievable during an outage.

On a device with a BitLocker-encrypted volume, this process can also prompt for the BitLocker recovery key. If that key cannot be retrieved, the data will remain inaccessible and, in most cases, unrecoverable.

Current best practice recommends using, at minimum, a 3-2-1 backup strategy:

  • 3 copies of your data at all times (production and two backups).
  • 2 different types of backup media.
  • 1 copy stored off-site, at a different physical location.
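
A backup inventory can be checked against the rule automatically. The sketch below assumes the common reading of 3-2-1 (at least three copies, on at least two types of media, with at least one copy off-site). The `Copy` record, the `check_321` helper and the inventory entries are all hypothetical and would need adapting to however you actually track your backups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str
    medium: str    # e.g. "disk", "tape", "cloud"
    offsite: bool  # stored away from the primary site?


def check_321(copies):
    """Return a list of human-readable gaps; an empty list means the rule is met."""
    gaps = []
    if len(copies) < 3:
        gaps.append(f"only {len(copies)} copies, need at least 3")
    media = {c.medium for c in copies}
    if len(media) < 2:
        gaps.append(f"only {len(media)} media type(s), need at least 2")
    if not any(c.offsite for c in copies):
        gaps.append("no off-site copy")
    return gaps


# Hypothetical inventory: production data plus two backups.
inventory = [
    Copy("production", "disk", offsite=False),
    Copy("nightly-nas", "disk", offsite=False),
    Copy("weekly-cloud", "cloud", offsite=True),
]
print(check_321(inventory) or "3-2-1 satisfied")
```

A check like this only confirms that the copies exist on paper. It says nothing about whether they can be restored, which is why restore tests belong in your regular drills.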

 

Plan for different scenarios: Cybersecurity is only a small aspect of a BCP. A comprehensive plan must account for more than technology disruptions and include considerations such as natural disasters, civil unrest, wars, pandemics, and even accidents or errors. Every scenario that could impact your organisation, however unlikely it may seem, should be considered.

Conduct regular drills: Just like a fire drill at school, processes must be practised regularly to identify any gaps or areas for improvement, and to ensure all staff members are familiar with their roles, responsibilities and expectations during an event. Additionally, the tests should cover multiple scenarios, as each one may reveal missed elements or additional considerations.

Final Thoughts

Finally, at HiddenBytes we would like to reiterate that whilst this outage could have been prevented, or its impact reduced, had the appropriate quality assurance controls been implemented and/or operating effectively, we should not forget that:

  • This event should not overshadow the fantastic work CrowdStrike have been doing and will hopefully continue to do for the cybersecurity community.
  • There is no silver bullet when it comes to protection against these types of incidents. A defence-in-depth approach across your environment, if performed adequately, is much more effective and more likely to reduce any impact seen.
  • Incidents happen! We cannot prevent all errors, accidents, or threat actors, but we can make sure that we prepare our response effectively for when they do occur.

 

If your small organisation requires any straight-talking advice on any of the topics discussed today, please reach out to us at hello[at]hiddenbytes.co.uk