Is It Time to Rethink That Cybersecurity Data Lake?

Gaurav Banga
August 3, 2021 | 11 min read | Cybersecurity Strategy, Security Posture

You have probably heard the story of the kid who had everything but was still sad. Here is a tale of a CISO who has everything but is still very unhappy because she can’t quantify her organization’s cyber risk…

Our CISO’s organization has invested in dozens of cybersecurity tools in the last few years. Her team is trying to manage vulnerabilities aggressively, with continuous scanning and patching. They have subscribed to multiple threat feeds from open source, proprietary and government sources. They have deployed MFA, EDR and next-gen IDS/IPS, invested in a SIEM and a SOC, and have implemented detection and containment playbooks. This year they are deploying a cloud security tool and a BAS tool, and thinking about zero trust.

Our CISO’s organization also has a long-running cybersecurity data lake project, streaming the output of all tools into a central place. This is where they do analytics, trying to come up with a unified picture of overall cyber risk that will help stakeholders make better cybersecurity decisions.

Familiar?

Here are 3 brutal truths that will surprise no one…

1. Data ≠ visibility

Your dozens of tools may be generating lots of data. However, mountains of data do not result in better cybersecurity visibility. It is difficult to sift through terabytes or petabytes of data to find what you are looking for, even with your data lake code. You may also have experienced that your data lake code is ad-hoc, slow, brittle, and takes quite a bit of continuous effort to develop and maintain. Some of you are resigned to hearing excuses and apologies from your data lake team, quarter after quarter, in response to your feature requests.

2. Can’t measure risk

Your tools have different formats and semantics for the same attributes of assets, apps and users. Cybersecurity context tends to reside in specialized cybersecurity tools. IT context is embedded in IT tools such as AD, CMDB and ticketing systems, while business context tends to be spread across a third set of databases and spreadsheets. These tools often surface contradictory information. Unifying this data into a common schema is very difficult. Different stakeholders also speak different languages, and it is nearly impossible to reconcile their views into a commonly understood risk metric. In desperation, you may have invented your own proprietary (read: opaque) risk scoring system, which you are trying to evangelize and push to the various stakeholders in your organization.

3. Partial remediation

New vulnerabilities and security issues emerge at a very rapid rate. It is very hard to keep up. Because risk cannot be accurately quantified, security issues and risk items cannot be quickly identified, prioritized and remediated. Critical items are missed. In most organizations, the mean time to mitigate security issues is weeks or months, and during this time your organization is open to compromise by attackers. You are constantly wondering whether your existing security controls compensate for these emerging risk items, or whether you need to invest in additional tools. And how urgent is this? What’s your risk?

This inability to measure cyber risk accurately makes it very difficult to make the right cybersecurity decisions. For example, your CFO and board may not appreciate that a security maturity score of 7/10 corresponds to an expected loss of $25M from data breach events this year, which may be completely unacceptable. It is also hard for a CISO to demonstrate “where we are” on cyber risk or showcase the ROI of a cybersecurity initiative. This is not a new story. The infosec industry has struggled for a long time to quantify the cybersecurity posture of organizations in clear cyber risk terms denominated in dollars (or euros, pounds, yen, etc.), while at the same time ensuring that the risk calculations accurately reflect the on-network conditions.

Today we are hoping to change all this!

Today, Balbix is announcing the launch of our Automated Cyber Risk Quantification (CRQ) solution. This new offering allows organizations to produce a single, comprehensive view of their cyber risk by ingesting, unifying and analyzing data from a broad set of IT, cybersecurity and business tools. Balbix uses specialized machine learning and automation to quantify both the likelihood and the impact of a potential breach, remove complex and error-prone tasks, and express your enterprise’s cyber risk in dollars (or other currencies).

The picture below shows a “brain scan” of Balbix.

On the left edge of the picture, you will see a sample of typical inputs to Balbix: various IT, cybersecurity and business data sources. On the right side of the picture, we have the outputs: risk metrics, mitigation plan and actions, alerts/notifications, benchmarks, scorecards and trends. This output is available to users via online dashboards, customizable based on role. The Balbix output is also available via APIs to other tools and systems.

Each interior node represents an ensemble of specialized ML models that has been purpose-built to solve a specific problem in the overall risk calculation. For example, the host enumeration (Host Enum) node performs deduplication of assets across all data stream signals that are fed into the Balbix brain and provides this information to all nodes in the system. Anyone who has ever tried to correlate events from different IT and cybersecurity tools appreciates that deduplication is hard and messy. Balbix’s Host Enum is a powerful self-learning ML system that would be impossible for humans to replicate. There are nearly a hundred ML models like Host Enum in the Balbix brain.
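To make the deduplication problem concrete, here is a deliberately simple, rule-based sketch in Python. It is not how Host Enum works (that is an ensemble of ML models); it only illustrates the basic task of merging records from different tools that describe the same machine. The field names (`source`, `hostname`, `mac`) and the matching rule (same normalized MAC address or hostname) are assumptions made for this example.

```python
from collections import defaultdict


def normalize(value):
    """Lower-case and trim an identifier; missing values stay None."""
    return value.strip().lower() if value else None


def dedupe_assets(records):
    """Group records that share a normalized MAC address or hostname.

    Uses a small union-find so that matches chain together:
    if A matches B on hostname and B matches C on MAC, all three merge.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, normalized value) -> index of first record
    for idx, rec in enumerate(records):
        for field in ("mac", "hostname"):  # hypothetical field names
            key = normalize(rec.get(field))
            if key is None:
                continue
            if (field, key) in first_seen:
                union(idx, first_seen[(field, key)])
            else:
                first_seen[(field, key)] = idx

    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[find(idx)].append(rec)
    return list(groups.values())


# Made-up records from three hypothetical tools.
records = [
    {"source": "edr", "hostname": "WEB-01", "mac": "AA:BB:CC:00:11:22"},
    {"source": "cmdb", "hostname": "web-01", "mac": None},
    {"source": "scanner", "hostname": None, "mac": "aa:bb:cc:00:11:22"},
    {"source": "cmdb", "hostname": "db-07", "mac": "aa:bb:cc:00:11:33"},
]
assets = dedupe_assets(records)
```

Real environments break rules like these quickly (reused hostnames, randomized MACs, stale CMDB entries), which is part of why this problem is hard and messy at scale.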

Here are key aspects of our new solution.

  • Data sources include vulnerability assessment tools, CMDB, EDR, firewalls, SIEM, MDM systems, AppSec systems, OT/IoT management systems, Active Directory, DNS/DHCP and cloud infrastructure APIs. Balbix can also quickly ingest data from proprietary tools.
  • Unified asset inventory is a foundational capability of the system. It includes the enumeration (de-dup) and categorization of assets.
  • The breach likelihood of an asset is calculated as a weighted sum of Breach Likelihood from individual attack vectors. For each attack vector, we consider vulnerability severity, exposure, threat level and security controls.
  • Breach impact is estimated by first determining the relative criticality of assets automatically, then refining it with input from the CISO and various risk owners.
  • Hard risk metrics denominated in monetary terms, with clear action items, enable everyone involved to make better decisions faster, so vulnerabilities and security issues are mitigated sooner.
  • Risk dashboards and reports enable the gamification of cyber risk reduction and the demonstration of the value of your security program to senior leadership and the board.
  • The system has been designed for extreme scale. Balbix typically ingests several hundred terabytes per day from each customer whose environment contains 250,000 assets. For our largest customer, we analyze more than a petabyte every day.
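
For readers who want to see the arithmetic, the breach likelihood and breach impact items in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Balbix models: the way the four per-vector factors are multiplied together, the 0 to 1 scales, the weights, and the "likelihood times impact" expected-loss formula are all choices made for this example, and every name and number here is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VectorSignal:
    """Per-attack-vector inputs, each assumed to be scaled to [0, 1]."""
    severity: float               # vulnerability severity
    exposure: float               # how reachable the weakness is
    threat: float                 # current threat level for this vector
    control_effectiveness: float  # how much existing controls compensate


def vector_likelihood(signal):
    # Assumed combination rule: controls discount the raw likelihood.
    return (signal.severity * signal.exposure * signal.threat
            * (1.0 - signal.control_effectiveness))


def asset_breach_likelihood(signals, weights):
    """Weighted sum of per-vector likelihoods, normalized by total weight."""
    total_weight = sum(weights[name] for name in signals)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(weights[name] * vector_likelihood(sig)
                   for name, sig in signals.items())
    return weighted / total_weight


def expected_loss(likelihood, impact_dollars):
    # Assumed expected-loss model: likelihood times monetary impact.
    return likelihood * impact_dollars


# Made-up inputs for a single asset.
signals = {
    "phishing": VectorSignal(0.8, 0.5, 1.0, 0.5),
    "unpatched_software": VectorSignal(1.0, 0.5, 0.5, 0.0),
}
weights = {"phishing": 0.5, "unpatched_software": 0.5}

likelihood = asset_breach_likelihood(signals, weights)
risk_dollars = expected_loss(likelihood, impact_dollars=2_000_000)
```

The point of the sketch is the shape of the calculation: each factor is visible, so a stakeholder can trace a dollar figure back to the severity, exposure, threat and control inputs that produced it.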

You can read more details about our new solution here.

Our early customers have seen fantastic outcomes from their use of this new capability: an accurate and unified asset inventory, better prioritization and more automation, 98% reduction in mean-time-to-mitigate risk issues, and 10x improvement in efficiency of everyday infosec team activities. Everyone is on the same page because all dashboards and widgets speak to risk in money terms. Our customer CFOs actually understand and appreciate what is happening in the infosec program!

It is easy to get started with Balbix. After deciding to proceed, our customers typically take a week to identify the data sources and organize data snapshots for the initial deployment. Activation and initial data ingestion take less than an hour. It takes the Balbix brain a few days to understand and baseline your organization. Typically, in 7 days or less after we start, we can provide the first readout to operational and executive stakeholders, surfacing numerous previously unknown insights and risk issues and showcasing capabilities for the multiple use cases you were hoping to solve with your data lake project.

Please contact Balbix and we’ll get you started!