Colonial Pipeline Surprise Attack? Not Really...

Gaurav Banga
May 10, 2021 | 11 min read | Security Posture

The media is abuzz with the news of the main fuel supply line to the U.S. East Coast being shut down after the pipeline’s operator, Colonial, suffered what is believed to be the largest successful cyberattack on oil infrastructure in the country’s history.

Was the infosec industry surprised?

If you find me a CISO who was surprised, I’ll pay you a hundred dollars ($10,000 if you find a surprised CFO or CEO of a Fortune 1000).

First, let’s look at the facts, for those who came in late… 

Some key points being highlighted are:

  • The operator of Colonial Pipeline fell prey to a ransomware attack on Friday, forcing pipeline operations to stop.
  • The Colonial Pipeline carries 2.5 million barrels a day – 45% of the US East Coast’s supply of diesel, gasoline and jet fuel.
  • It’s currently not clear how long the outage will last.
  • A hacker group called DarkSide, which is set up as a “ransomware as a service” business model, is reportedly behind the cyberattack.
  • According to some reports, data was stolen before the attackers began locking systems with ransomware.

The Colonial ransomware attack is a high-profile example of the regular online assaults that companies, schools, hospitals, local governments and other organizations face. National security officials, industry leaders and cyber defenders have known for years about the problems surrounding the nation’s critical infrastructure systems. However, the speed, investment and technological upgrades needed to mitigate these problems have been lacking.

There are two problems for critical infrastructure organizations: securing specialized operational technology (OT) systems and securing IT systems. For years, OT and other industrial control systems (ICS) were considered safe because they weren’t connected to the internet, but hackers have found ways to penetrate them through unsecured remote access and networked systems. These OT and ICS environments often run on older, vulnerable platforms. Patches for vulnerable software are often simply not available, and sometimes the operational application will not run on newer (patched) versions of the platform software.

Securing the IT side is also critical. For example, a successful breach of a pipeline’s billing system will almost always cause the operator to shut down the production operational system, because few operators are comfortable giving away their product for free. Another obvious concern is attack propagation from IT to OT, so shutting things down is simply the safe thing to do. This is what has happened at Colonial.

Basic security hygiene is not a fix

If you are reading this blog, you are probably familiar with the term basic cybersecurity hygiene. If you look at various online articles and discussion forums about this breach, you will see a number of comments stressing the importance of basic security hygiene in preventing such attacks. When people use this term, they are referring to various checklists of measures that cyber-defenders should adopt. Often, we give these checklists fancy acronyms as names. The problem is that everyone has their own notion of what “basic” means. Compliance with some non-quantifiable standard is not good enough. We need to get a lot more quantitative about how we describe enterprise cybersecurity hygiene or posture.

The metrics that matter

Two metrics matter the most: coverage and mean exposure time to risk.

Coverage is the percentage of your assets that are monitored and analyzed continuously for risk. Obviously, you cannot protect what you don’t know about, and it is imperative that you maintain a real-time asset inventory system with accurate state information about your IT, cybersecurity, and business context. Unfortunately, the average enterprise has less than 75% coverage of its assets, and 5% coverage of risk from various attack vectors. Do you know what percentage of your organization’s assets are being monitored for cyber-risk?
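As a minimal sketch of the asset coverage metric, the snippet below computes the percentage of inventoried assets that are under continuous monitoring. The Asset record, the asset names, and the monitored flag are all hypothetical; a real inventory would carry far more state.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    monitored: bool  # True if the asset is continuously assessed for risk


def coverage_percent(assets: list[Asset]) -> float:
    """Return the share of assets under continuous monitoring, as a percentage."""
    if not assets:
        return 0.0
    monitored = sum(1 for asset in assets if asset.monitored)
    return 100.0 * monitored / len(assets)


# Hypothetical inventory: three of four assets are monitored.
inventory = [
    Asset("billing-db", True),
    Asset("hr-laptop-17", True),
    Asset("scada-gateway", False),
    Asset("build-server", True),
]
print(f"Asset coverage: {coverage_percent(inventory):.0f}%")
```

The number is only as good as the inventory behind it: an asset that never made it into the list does not lower the percentage, which is why discovery has to come first.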

New security vulnerabilities become known all the time. From time to time your security tools will generate events that indicate attacks in progress or a compromised asset. Your Mean Exposure Time (MET) is the average time it takes to discover and resolve a cyber-risk issue. It equals the Mean Discovery Time (MDT) plus the Mean Time to Resolve (MTTR).

The industry average for MDT in IT networks is 15 days and for MTTR is 154 days. The Mean Arrival Time for new security issues to emerge in the cyber-battlefield is now in the single-digit days. As attackers use increasing amounts of automation, this time duration will only decrease.
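To make the definition concrete, here is a small sketch that applies MET = MDT + MTTR to the industry averages quoted above, and then to a set of issue records. The records, dates, and function name are made up for illustration.

```python
from datetime import date
from statistics import mean


def mean_exposure_time(mdt_days: float, mttr_days: float) -> float:
    """MET = MDT + MTTR, all expressed in days."""
    return mdt_days + mttr_days


# Industry averages quoted in the text: MDT 15 days, MTTR 154 days.
industry_met = mean_exposure_time(15, 154)

# Hypothetical issue log: (date emerged, date discovered, date resolved).
issues = [
    (date(2021, 1, 1), date(2021, 1, 11), date(2021, 4, 21)),
    (date(2021, 2, 1), date(2021, 2, 21), date(2021, 8, 10)),
]

mdt = mean([(found - emerged).days for emerged, found, _ in issues])
mttr = mean([(fixed - found).days for _, found, fixed in issues])
met = mean_exposure_time(mdt, mttr)

print(f"Industry MET: {industry_met} days")
print(f"Sample MDT: {mdt} days, MTTR: {mttr} days, MET: {met} days")
```

With the industry averages, the exposure window comes to 169 days per issue.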

With this huge gap between these rates, days versus months, it is no wonder that organizations are so vulnerable and exposed to attack. For attacks on industrial control systems and OT, exposure times can be even longer, given the average age and complexity of these technologies.
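One rough way to picture the gap is to estimate how many issues are open at once. The sketch below assumes a steady state in which issues arrive at a constant rate and each stays open for the mean exposure time, so the open count is exposure time divided by arrival interval. The 5-day arrival interval is an assumed value within the "single-digit days" range, not a figure from this post.

```python
def expected_open_issues(arrival_interval_days: float, exposure_days: float) -> float:
    """Steady-state estimate of issues open at once.

    Assumes one new issue every `arrival_interval_days` and that each
    issue remains open for `exposure_days` on average.
    """
    if arrival_interval_days <= 0:
        raise ValueError("arrival interval must be positive")
    return exposure_days / arrival_interval_days


# Assumed: a new issue every 5 days; exposure of 15 + 154 = 169 days.
slow = expected_open_issues(arrival_interval_days=5, exposure_days=169)
# Same arrival rate, but with issues contained in 4 days.
fast = expected_open_issues(arrival_interval_days=5, exposure_days=4)

print(f"Open issues at 169-day exposure: {slow:.1f}")
print(f"Open issues at 4-day exposure: {fast:.1f}")
```

Under these assumptions the slower organization carries dozens of unresolved issues at any moment, while the faster one usually has fewer than one.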

This is a huge challenge because what ultimately determines success in your cybersecurity program is the time it takes you to contain a new risk event.

How to decrease MET and MTTR

Yours truly has conversations daily with senior executives and IT/security operational staff who think that a 90-day MTTR is perfectly fine. “This is how we have always done it.”

A 90-day MTTR means the organization is always vulnerable and can easily be compromised at any time. In fact, attackers have automated the discovery of vulnerable organizations.

The fact is that most security and IT teams are stretched thin and simply cannot go faster. That said, here is what every organization can do to drive down its MDT and MTTR.

1. Discover your attack surface and identify what needs to be protected

You need an automated, up-to-date, and comprehensive asset inventory (devices, applications, services and users) across on-prem, cloud and third parties. This also includes performing asset criticality analysis to spotlight business-critical assets, literally building a digital twin of your enterprise for the purposes of cybersecurity.

Track and drive your asset coverage to 100%.

2. Continuous assessment and monitoring of cyber risk

Have systems in place to identify risk as it emerges by continuously assessing all enterprise assets for vulnerabilities and risk items across all relevant attack vectors.

Track and drive your attack vector coverage to 100%.

3. Evaluate vulnerabilities and risk items

Calculate risk using five factors: vulnerabilities, threats, asset exposure, business criticality, and compensating controls. Quantify it in dollar terms, then prioritize vulnerabilities and risk items based on that risk.
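The sketch below shows one possible way to combine the five factors into a dollar figure and rank items by it. The multiplicative formula, the 0-to-1 scales, and every number in it are assumptions made for illustration; this is not a description of any particular product's risk model.

```python
from dataclasses import dataclass


@dataclass
class RiskItem:
    name: str
    vulnerability: float          # 0..1, severity of the weakness
    threat: float                 # 0..1, how actively it is being exploited
    exposure: float               # 0..1, how reachable the asset is
    control_effectiveness: float  # 0..1, mitigation from compensating controls
    business_impact_usd: float    # estimated loss if the asset is breached

    def risk_usd(self) -> float:
        """Assumed model: breach likelihood times business impact."""
        likelihood = (
            self.vulnerability
            * self.threat
            * self.exposure
            * (1.0 - self.control_effectiveness)
        )
        return likelihood * self.business_impact_usd


# Hypothetical findings.
items = [
    RiskItem("test web server", 0.5, 0.5, 0.5, 0.0, 100_000),
    RiskItem("legacy HMI workstation", 0.8, 0.5, 0.5, 0.5, 10_000_000),
    RiskItem("unpatched VPN gateway", 0.9, 0.8, 1.0, 0.0, 5_000_000),
]

# Highest dollar risk first.
ranked = sorted(items, key=RiskItem.risk_usd, reverse=True)
for item in ranked:
    print(f"{item.name}: ${item.risk_usd():,.0f}")
```

Ranking by dollars rather than by raw severity lets a moderately severe flaw on a critical, exposed asset outrank a severe flaw on a low-value one.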

Define and adjust target SLAs for each risk class and set up mitigation workflows for different types of vulnerabilities as they emerge.
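Per-class SLAs can be kept as a simple lookup table that turns a discovery date into a due date. The class names and day counts below are hypothetical targets, not recommendations from this post.

```python
from datetime import date, timedelta

# Hypothetical target SLAs, in days, for each risk class.
SLA_DAYS = {"critical": 1, "high": 4, "medium": 14, "low": 30}


def due_date(discovered: date, risk_class: str) -> date:
    """Date by which an item of the given class should be mitigated."""
    return discovered + timedelta(days=SLA_DAYS[risk_class])


def is_overdue(discovered: date, risk_class: str, today: date) -> bool:
    """True once the item has passed its SLA due date."""
    return today > due_date(discovered, risk_class)


found = date(2021, 5, 10)
print(due_date(found, "high"))
print(is_overdue(found, "high", today=date(2021, 5, 15)))
```

Keeping the targets in one table makes them easy to adjust as the program matures, which is the "define and adjust" part of the step.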

Track and drive your mean discovery time to minutes/hours.

4. Dispatch risk to various owners

As risk items are discovered, assign risk owners for each open vulnerability/risk item and dispatch required action items such as indicators of risk, indicators of compromise, and indicators of attack to risk owners for mitigation.

This dispatch of risk to owners must be automatic, within 100s of milliseconds of discovery. No human handoffs should be required from assessment teams to people who perform the actual risk mitigations.
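Automatic dispatch can be as simple as a lookup from asset type to owner, applied the moment a finding is created. This is a sketch under assumed names: the owner table, the fallback queue, and the finding fields are all hypothetical, and a real system would deliver the resulting work item through a ticketing or messaging integration.

```python
# Hypothetical ownership table: asset type -> responsible team.
OWNERS = {
    "network": "netops@example.com",
    "database": "dba@example.com",
    "endpoint": "desktop-eng@example.com",
}
DEFAULT_OWNER = "security-triage@example.com"  # fallback when no owner is mapped


def dispatch(finding: dict) -> dict:
    """Build a work item addressed to the owner of the affected asset."""
    owner = OWNERS.get(finding["asset_type"], DEFAULT_OWNER)
    return {
        "owner": owner,
        "asset": finding["asset"],
        "action": finding["summary"],
    }


finding = {
    "asset": "core-router-1",
    "asset_type": "network",
    "summary": "Apply vendor firmware update",
}
print(dispatch(finding))
```

Because routing is a pure lookup, no person has to read the finding before it reaches the team that can act on it.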

5. Incentivize and enable owners to act quickly to mitigate risk  

Provide a set of tools and processes that risk owners use to contain risk quickly. These are maximally automated playbooks for testing updates, patching, incident response and recovery. Enable multiple options for risk mitigation instead of just relying on patching. Gamify risk mitigation by setting up goals and rewards, while periodically publishing leaderboards and sending nags.

Track and drive down mean-time-to-resolve from weeks to single-digit days.
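A leaderboard for this step can be built from nothing more than the resolution times recorded per owner. The sketch below ranks owners by their mean time to resolve, fastest first; the team names and day counts are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical resolution log: (owner, days taken to resolve one item).
resolved = [
    ("netops", 3),
    ("dba", 12),
    ("netops", 5),
    ("desktop", 30),
    ("dba", 8),
]


def mttr_leaderboard(records):
    """Return (owner, mean days to resolve) pairs, fastest owner first."""
    by_owner = defaultdict(list)
    for owner, days in records:
        by_owner[owner].append(days)
    rows = [(owner, mean(days)) for owner, days in by_owner.items()]
    return sorted(rows, key=lambda row: row[1])


for rank, (owner, days) in enumerate(mttr_leaderboard(resolved), start=1):
    print(f"{rank}. {owner}: {days} days")
```

The same per-owner averages show which teams are still measuring resolution in weeks rather than days.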

The bottom line

What would be the impact of cutting your MTTR from the 154-day industry average to 4 days? A simple, back-of-the-envelope calculation shows that this would give you a reduction of more than 95% in exposure and risk.
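The arithmetic can be checked directly. This sketch assumes exposure scales linearly with the number of days an issue stays open and uses the 154-day and 4-day figures from the text; the function name is illustrative.

```python
def exposure_reduction_percent(before_days: float, after_days: float) -> float:
    """Percentage reduction in exposure, assuming it is proportional to days open."""
    if before_days <= 0:
        raise ValueError("before_days must be positive")
    return 100.0 * (before_days - after_days) / before_days


reduction = exposure_reduction_percent(before_days=154, after_days=4)
print(f"Reduction in exposure: {reduction:.1f}%")
```

Under that linear assumption the reduction works out to about 97%, slightly better than the 95% mark.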

Interested?

Here is the first test of your speed.

  • How quickly can you discover your organization’s Coverage and MET/MTTR? Maybe you already know it…
  • How quickly can you organize a project in your team with a goal to reduce MTTR to 4 days?

Please contact Balbix and we’ll get you started!