Introduction
  • 28 Feb 2024

As the digital landscape and its threat surface have grown, organizations have deployed more and more security tools to address the threats. Each tool has created another data silo, increasing data complexity and leaving security organizations unable to see the bigger picture or develop mitigation strategies quickly. AI/ML investments are made in the hope of solving these problems, but they are only as good as the data being analyzed.

This has left security operations teams with a mess. Analysts spend hours pivoting between toolsets to piece together information, only to find that they still lack the historical context or real-time focus they need for timely response, answers to management, and cyber risk reporting to the board. For companies like Comcast, operating at a vast scale, all this available but unactionable data was frustrating.

The data explosion from hybrid sources has produced a volume of data that is prohibitively expensive to store, process, and analyze. Data silos, created by best-of-breed tool adoption, combine with storage costs to drive incomplete analysis and hunting efforts. Data complexity keeps increasing, and cybersecurity experts spend large amounts of time structuring disparate data sources. A Security Operations Center (SOC) or Governance, Risk, and Compliance (GRC) department would need an army of analysts to piece together information while pivoting between too many security and risk tools and vendor-native dashboards, and it would take days to sift through thousands of indicators of compromise (IOCs).

There is an urgent need to establish a global enterprise security data strategy, provide consistent executive KPIs, and automatically integrate security and enterprise data. This led us to build a security, risk, and compliance data fabric called DataBee.

What is DataBee™, from Comcast Technology Solutions?

DataBee™, from Comcast Technology Solutions, is a cloud-native security and compliance data fabric that ingests data from multiple disparate feeds, then aggregates, compresses, standardizes, enriches, correlates, and normalizes it before transferring a full time-series dataset to your data lake of choice. The data fabric acts as the glue between disparate tools, enabling end users to extract more value from their data. DataBee brings together and enriches security and other enterprise data for a complete picture. It supports continuous compliance, SIEM decoupling, simple and advanced threat hunting, and behavioral baselines with anomaly detection.

Comcast DataBee is a powerful data integration platform that enables users to extract data from various sources, transform it into the desired format, load it into a target system, and visualize the data using business intelligence (BI) tools. With the help of DataBee, you can efficiently manage and manipulate large amounts of data to gain insights and make informed decisions.

DataBee Architecture


The Comcast DataBee architecture is designed to decouple the SIEM and other security and compliance analytics layers from the data ingest, cleaning, transformation, and storage layers. We focus on the beginning of the security data pipeline, making sure you have all your data in a clean, usable format before you apply analytics. We also provide UI access to enable custom search, compliance, analysis, and ML models. Because the architecture is open, you can supplement the DataBee UI, analytics, or ML from a common, unified data source.

DataBee connects to multiple data sources and loads your data from them. Once the data is loaded, DataBee transforms it through aggregation, compression, standardization, enrichment, correlation, and normalization, producing an organized, structured format. You can then store this normalized and enriched data in your data lake of choice, such as Snowflake or Databricks, and feed it into your existing BI tools to generate improved reporting and metrics. The DataBee UI lets users configure new data source feeds, search for relevant information, and monitor the health and status of data quality.
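To make the transformation stages more concrete, the following is a minimal Python sketch of three of them (standardization, enrichment, and aggregation) applied to a handful of events. It is an illustration only, not DataBee's implementation: the event fields, the `ASSET_OWNERS` lookup table, and all function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical asset inventory used for enrichment (not a DataBee data set).
ASSET_OWNERS = {"10.0.0.5": "finance-laptop-17", "10.0.0.9": "build-server-02"}


def standardize(event: dict) -> dict:
    """Convert the epoch timestamp to ISO 8601 UTC and lowercase the action."""
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return {**event, "ts": ts.isoformat(), "action": event["action"].lower()}


def enrich(event: dict) -> dict:
    """Attach the asset name, or 'unknown' if the IP is not in the inventory."""
    return {**event, "asset": ASSET_OWNERS.get(event["src_ip"], "unknown")}


def aggregate(events: list) -> Counter:
    """Count events per (asset, action) pair."""
    return Counter((e["asset"], e["action"]) for e in events)


# Raw events as they might arrive from two differently formatted feeds.
raw_events = [
    {"ts": 0, "src_ip": "10.0.0.5", "action": "LOGIN"},
    {"ts": 60, "src_ip": "10.0.0.5", "action": "Login"},
    {"ts": 120, "src_ip": "192.168.1.1", "action": "LOGOUT"},
]

clean_events = [enrich(standardize(e)) for e in raw_events]
summary = aggregate(clean_events)
```

Keeping each stage as a small function that returns a new record makes the stages easy to reorder or test in isolation, which is one plausible way to structure a pipeline like the one described here.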

DataBee high-level architecture

[Figure: DataBee simplified architecture diagram]

Users access the DataBee UI, which serves as the gateway to data management. User configuration changes flow from the interface to the configuration store and ultimately to the pipeline orchestrator. Both the configuration store and the pipeline orchestrator receive data source mappings from the feed content store. Data from each source passes through the data pipeline executor and undergoes Open Cybersecurity Schema Framework (OCSF) normalization. The processed data then lands in Snowflake through Snowpipe or in Databricks through Auto Loader, where it can be stored and searched. Data marts are specialized views over the OCSF tables; their definitions are provided by the DataBee team through the DM management service and executed in the data lake. Data can also be accessed from popular BI tools such as Tableau and Power BI, using dashboards and content built by DataBee or customized by the user.
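As a rough illustration of mapping-driven normalization, the sketch below renames a vendor's fields to OCSF-style dotted paths using a lookup table, similar in spirit to the data source mappings held in the feed content store. The vendor field names, the `VENDOR_TO_OCSF` table, and both helper functions are hypothetical. The target attribute names (`time`, `user.name`, `src_endpoint.ip`, `unmapped`) and the `class_uid`/`activity_id` values reflect one reading of the OCSF Authentication class and should be checked against the published schema.

```python
# Hypothetical mapping from one vendor's field names to OCSF-style dotted paths.
VENDOR_TO_OCSF = {
    "eventTime": "time",
    "userName": "user.name",
    "sourceIp": "src_endpoint.ip",
}


def set_path(record: dict, dotted_path: str, value) -> None:
    """Write value into record at a dotted path, creating nested dicts as needed."""
    *parents, leaf = dotted_path.split(".")
    node = record
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def normalize(raw: dict, mapping: dict) -> dict:
    """Build a normalized record; unmapped vendor fields go under 'unmapped'."""
    # Assumed values: 3002 = Authentication class, activity 1 = Logon.
    record = {"class_uid": 3002, "activity_id": 1}
    for field, value in raw.items():
        if field in mapping:
            set_path(record, mapping[field], value)
        else:
            set_path(record, f"unmapped.{field}", value)
    return record


raw_login = {
    "eventTime": 1709107200000,
    "userName": "jdoe",
    "sourceIp": "10.0.0.5",
    "vendorCode": "A17",
}
normalized = normalize(raw_login, VENDOR_TO_OCSF)
```

Holding the mapping in data rather than in code means a new feed can be supported by adding a mapping table, without changing the normalization logic.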

External services

Metrics and monitoring- Repository for data quality, system health, and status metrics, with an alerting service

Comcast DataBee’s data pipeline executor seamlessly ingests customer data from multiple AWS sources. The data pipeline executor extracts the data, performs transformations, and loads the enriched data to various services. It generates essential data quality metrics used to continuously monitor the health of each individual data source. Real-time status and health updates are made available in the DataBee UI.
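The kind of per-source data quality metric mentioned above can be sketched as follows. This is an assumption about what such metrics might look like, not a description of the metrics DataBee actually computes: the required fields, the metric names, and the `quality_metrics` function are all hypothetical.

```python
REQUIRED_FIELDS = ("time", "user", "src_ip")  # hypothetical required fields


def quality_metrics(records: list, now: int) -> dict:
    """Return record count, completeness ratio, and seconds since the newest record."""
    total = len(records)
    # A record is complete when every required field is present and non-empty.
    complete = sum(
        all(r.get(f) not in (None, "") for f in REQUIRED_FIELDS) for r in records
    )
    newest = max(
        (r["time"] for r in records if r.get("time") is not None), default=None
    )
    return {
        "records": total,
        "completeness": complete / total if total else 0.0,
        "lag_seconds": None if newest is None else now - newest,
    }


# Four records from one feed; two are missing a required value.
feed = [
    {"time": 1000, "user": "jdoe", "src_ip": "10.0.0.5"},
    {"time": 1030, "user": "", "src_ip": "10.0.0.9"},
    {"time": 1060, "user": "asmith", "src_ip": None},
    {"time": 1090, "user": "bwong", "src_ip": "10.0.0.7"},
]
metrics = quality_metrics(feed, now=1100)
```

A falling completeness ratio or a growing lag for a single source is the sort of signal a health view could surface per feed.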

External customer

User- Customers or end users who interact with the DataBee user interface to access its features, functionalities, and services

DataBee environment

User interface- DataBee’s web-based user interface for configuring DataBee, data ingest, search and other features

Configuration store- Configuration storage service

Pipeline orchestrator- Synchronizes feed content with the data pipelines running in the data pipeline executor, based on user configuration and preferences stored in the configuration store

Feed content- Stores data source mappings

Data pipeline executor- Data flow pipeline tool for transforming, normalizing, correlating and outputting data in a streaming fashion

OCSF normalization- OCSF (Open Cybersecurity Schema Framework) is an open, vendor-agnostic core security schema for transmitting and storing information relevant to cybersecurity. OCSF normalization structures and organizes cybersecurity data so that it conforms to consistent, standardized formats

DM management- Synchronizes data mart content files with tables and views in Snowflake or Databricks

DM definitions- Collection of data mart views
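Because data marts are described here as views over the OCSF tables, the idea can be illustrated with a small SQL view. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the data lake; Snowflake and Databricks SQL differ in dialect and tooling. The `authentication` table, its columns, and the `dm_failed_logons` view are hypothetical and are not actual DataBee definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the data lake

# Hypothetical, heavily simplified normalized table.
conn.execute(
    "CREATE TABLE authentication (time INTEGER, user_name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO authentication VALUES (?, ?, ?)",
    [
        (1, "jdoe", "Failure"),
        (2, "jdoe", "Failure"),
        (3, "jdoe", "Success"),
        (4, "asmith", "Failure"),
    ],
)

# Hypothetical data mart definition: failed logons per user.
conn.execute(
    """
    CREATE VIEW dm_failed_logons AS
    SELECT user_name, COUNT(*) AS failures
    FROM authentication
    WHERE status = 'Failure'
    GROUP BY user_name
    """
)

# A BI tool would query the view rather than the underlying table.
rows = conn.execute(
    "SELECT user_name, failures FROM dm_failed_logons "
    "ORDER BY failures DESC, user_name"
).fetchall()
```

Since a view stores only its definition, updating a data mart amounts to replacing that definition; the underlying normalized rows are not copied or rewritten.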

Customer environment

Customer AWS- User owned and operated AWS account that hosts the raw data sources

Snowpipe- Snowflake service that continuously loads data into Snowflake tables in micro-batches as new files arrive

Snowflake- A data lake for storing security logs

Auto Loader- A Databricks feature that facilitates automated data loading to designated tables in Databricks

Databricks- Analytics platform for scalable data analysis and processing

Tableau- A product for creating business intelligence dashboards

Power BI- A product for creating business intelligence dashboards

Use cases solved by DataBee:

DataBee can be used to help address many use cases across security, risk, and compliance including but not limited to:

  1. Continuous control monitoring & PCI-DSS 4.0 Preparedness

  2. High-volume security data event analysis and detection

  3. Insider and incident threat hunting

  4. SIEM optimization and aggregation

  5. Improving security hygiene

