As the digital landscape and its threat surfaces have grown, organizations have deployed more and more security tools to address the threats. Each new tool creates another data silo, increasing data complexity and making it harder for security organizations to see the bigger picture and rapidly develop mitigation strategies. Organizations are investing in AI/ML in the hope of solving these issues, but these models are only as good as the data they analyze. This has left security operations teams with an utter mess. Analysts spend hours pivoting between toolsets to piece together information, only to find that they still lack the historical context or real-time focus they need for timely response, answers to management, and sharing cyber risk with the board. For companies like Comcast, operating at vast scale, all this available but unactionable data was frustrating.
Data from hybrid sources has exploded, and the cost to store this data and compute over it at that volume is prohibitive. Data silos, created by adopting best-of-breed tools, combine with storage costs to leave analysis and hunting efforts incomplete. Data complexity grows as cybersecurity experts spend massive amounts of time structuring disparate data sources. A Security Operations Center (SOC) or Governance, Risk, and Compliance (GRC) department would need an army of analysts to piece information together, pivoting between too many security and risk tools and vendor-native dashboards, and it could take days to sift through thousands of indicators of compromise (IOCs).
There is an urgent need to establish a global enterprise security data strategy, provide consistent executive KPIs, and automatically integrate security and enterprise data. This led us to build DataBee, a security, risk, and compliance data fabric.
What is DataBee™ from Comcast Technology Solutions?
DataBee™ from Comcast Technology Solutions is a cloud-native security and compliance data fabric. It ingests data from multiple disparate feeds, then aggregates, compresses, standardizes, enriches, correlates, and normalizes it before transferring a full time-series dataset to your data lake of choice. DataBee's data fabric acts as the glue between disparate tools, enabling end users to extract more value from their data by bringing together and enriching security and other enterprise data for a complete picture. DataBee supports continuous compliance, SIEM decoupling, simple and advanced threat hunting, and behavioral baselines with anomaly detection.
Comcast DataBee is a data integration platform that enables users to extract data from various sources, transform it into the desired format, load it into a target system, and visualize it using business intelligence (BI) tools. With DataBee, you can efficiently manage and manipulate large amounts of data to gain insights and make informed decisions.
DataBee Architecture
The Comcast DataBee architecture is designed to decouple the SIEM and other security and compliance analytics layers from the data ingest, cleaning, transformation, and storage layers. We focus on the beginning of the security data pipeline, making sure you have ALL your data in a clean, usable format before you apply analytics. We also provide UI access for custom search, compliance, analysis, and ML models. Because the architecture is open, you can supplement the DataBee UI, analytics, or ML with other tools that read from the same common, unified data source. DataBee connects to multiple data sources and loads your data from them. Once the data is loaded, DataBee transforms it through aggregation, compression, standardization, enrichment, correlation, and normalization, producing an organized, structured format. You can then store this highly normalized and enriched data in your data lake of choice, such as Snowflake or Databricks, and connect your existing BI tools to it to generate improved reporting and metrics. The DataBee UI lets users easily configure new data source feeds, search for relevant information, and monitor the health and status of data quality.
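To make the transformation stage more concrete, the sketch below chains three of the listed operations (standardization, enrichment, and aggregation) as composable functions over batches of log records. It is a minimal illustration under stated assumptions, not DataBee's implementation: the record fields (`host`, `user`, `timestamp`), the asset-owner lookup table, and the helper names `standardize`, `make_enricher`, `aggregate`, and `run_pipeline` are all hypothetical.

```python
from datetime import datetime, timezone
from typing import Callable

# Hypothetical stage type: each stage takes a batch of records (dicts)
# and returns a new batch, so stages can be chained in any order.
Stage = Callable[[list], list]


def standardize(records: list) -> list:
    """Lower-case field names and replace the ISO-8601 'timestamp' field
    with 'time' in epoch milliseconds (naive timestamps are assumed UTC)."""
    out = []
    for r in records:
        r = {k.lower(): v for k, v in r.items()}  # new dict; input is not mutated
        ts = datetime.fromisoformat(r.pop("timestamp"))
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)
        r["time"] = int(ts.timestamp() * 1000)
        out.append(r)
    return out


def make_enricher(asset_owners: dict) -> Stage:
    """Build a stage that adds an 'owner' field from a hypothetical
    host-to-owner lookup table."""
    def enrich(records: list) -> list:
        return [{**r, "owner": asset_owners.get(r.get("host"), "unknown")}
                for r in records]
    return enrich


def aggregate(records: list) -> list:
    """Collapse identical records into one, keeping an occurrence count.
    Assumes all field values are hashable (strings, numbers)."""
    counts: dict = {}
    for r in records:
        key = tuple(sorted(r.items()))
        counts[key] = counts.get(key, 0) + 1
    return [{**dict(key), "count": n} for key, n in counts.items()]


def run_pipeline(records: list, stages: list) -> list:
    """Apply each stage, in order, to the whole batch."""
    for stage in stages:
        records = stage(records)
    return records


raw = [
    {"Host": "web-01", "User": "alice", "Timestamp": "2024-01-01T00:00:00+00:00"},
    {"Host": "web-01", "User": "alice", "Timestamp": "2024-01-01T00:00:00+00:00"},
    {"Host": "db-01", "User": "bob", "Timestamp": "2024-01-01T00:00:01+00:00"},
]
result = run_pipeline(raw, [standardize, make_enricher({"web-01": "web-team"}), aggregate])
```

Keeping each stage a pure function over a batch makes it straightforward to reorder stages or add new ones (for example, correlation or compression) without touching the others.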
DataBee high-level architecture
Users access DataBee through the DataBee UI, which serves as the gateway for managing their data. User configuration changes flow from the UI to the configuration store and then to the pipeline orchestrator. Both the configuration store and the pipeline orchestrator receive data source mappings from the feed content store. Data from each source passes through the data pipeline executor, where it undergoes Open Cybersecurity Schema Framework (OCSF) normalization. The processed data is then loaded into Snowflake through Snowpipe or into Databricks through Auto Loader, which provide robust storage and search. Datamarts are specialized views over the OCSF tables; the DataBee team provides their definitions through the DM management service, and the views are executed in the data lake. Data can also be accessed with popular BI tools such as Tableau and Power BI, using dashboards and content built by DataBee or customized by the user for visualization and analysis.
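As an illustration of what OCSF normalization means in practice, the sketch below maps a hypothetical raw login record onto a small subset of the OCSF Authentication event class (`class_uid` 3002, in the Identity & Access Management category). The raw field names (`ts`, `event`, `username`, `source_ip`), the function name `to_ocsf_authentication`, and the default schema version string are assumptions for illustration; a real mapping should be checked against the published OCSF schema and would populate many more attributes.

```python
# Subset of OCSF constants for the Authentication class.
AUTH_CLASS_UID = 3002          # Authentication
IAM_CATEGORY_UID = 3           # Identity & Access Management
LOGON_ACTIVITY_ID = 1          # Logon
INFORMATIONAL_SEVERITY_ID = 1  # Informational

# Hypothetical raw event names mapped to OCSF status_id values
# (0 = Unknown, 1 = Success, 2 = Failure).
STATUS_BY_EVENT = {"login_success": 1, "login_failure": 2}
STATUS_NAMES = {0: "Unknown", 1: "Success", 2: "Failure"}

# Raw fields this hypothetical mapping knows how to place.
MAPPED_FIELDS = {"ts", "event", "username", "source_ip"}


def to_ocsf_authentication(raw: dict, product_name: str,
                           schema_version: str = "1.1.0") -> dict:
    """Normalize one hypothetical raw login record into an OCSF-style
    Authentication event. Raw fields with no mapping go under 'unmapped'."""
    status_id = STATUS_BY_EVENT.get(raw.get("event"), 0)
    return {
        "class_uid": AUTH_CLASS_UID,
        "category_uid": IAM_CATEGORY_UID,
        "activity_id": LOGON_ACTIVITY_ID,
        # OCSF defines type_uid as class_uid * 100 + activity_id.
        "type_uid": AUTH_CLASS_UID * 100 + LOGON_ACTIVITY_ID,
        "severity_id": INFORMATIONAL_SEVERITY_ID,
        "status_id": status_id,
        "status": STATUS_NAMES[status_id],
        # OCSF timestamps are epoch milliseconds; raw 'ts' is assumed to be seconds.
        "time": int(raw["ts"] * 1000),
        "user": {"name": raw["username"]},
        "src_endpoint": {"ip": raw["source_ip"]},
        "metadata": {
            "product": {"name": product_name},
            "version": schema_version,  # assumed OCSF schema version
        },
        "unmapped": {k: v for k, v in raw.items() if k not in MAPPED_FIELDS},
    }


event = to_ocsf_authentication(
    {"ts": 1704067200, "event": "login_failure", "username": "alice",
     "source_ip": "10.0.0.5", "port": 22},
    product_name="sshd",
)
```

Because every source is mapped onto the same class and attribute names, a single query against Authentication events covers logins from every product that feeds it.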
External services
Metrics and monitoring- Repository and alerting service for data quality, system health, and status metrics
Comcast DataBee's data pipeline executor ingests customer data from multiple AWS sources. It extracts the data, performs transformations, and loads the enriched data into various services. It also generates data quality metrics that are used to continuously monitor the health of each individual data source, and real-time status and health updates are shown in the DataBee UI.
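The per-source health monitoring described above can be sketched as a small function that turns one batch of ingest results into a few data quality metrics and a health status. The metric names, the thresholds (1% parse failures, 15 minutes of lag), the `(parsed_ok, event_time)` input shape, and the names `SourceHealth` and `compute_health` are hypothetical choices for illustration, not DataBee's actual metrics.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SourceHealth:
    source: str
    records: int
    parse_failures: int
    failure_rate: float
    lag_seconds: Optional[float]  # time since newest parsed event; None if nothing parsed
    status: str                   # "healthy" or "degraded"


def compute_health(source: str,
                   results: List[Tuple[bool, float]],
                   now: float,
                   max_failure_rate: float = 0.01,
                   max_lag_seconds: float = 900.0) -> SourceHealth:
    """Summarize one ingest batch for a source.

    Each result is (parsed_ok, event_time_epoch_seconds); the event time of a
    failed parse is ignored. Threshold defaults are hypothetical."""
    records = len(results)
    failures = sum(1 for ok, _ in results if not ok)
    failure_rate = failures / records if records else 0.0
    parsed_times = [t for ok, t in results if ok]
    lag = now - max(parsed_times) if parsed_times else None
    healthy = (
        records > 0
        and failure_rate <= max_failure_rate
        and lag is not None
        and lag <= max_lag_seconds
    )
    return SourceHealth(source, records, failures, failure_rate, lag,
                        "healthy" if healthy else "degraded")


# One of three records failed to parse, so this batch is reported as degraded.
health = compute_health("auth_logs", [(True, 1000.0), (True, 1060.0), (False, 0.0)], now=1100.0)
```

Emitting these values per batch and per source is what allows a UI to show a source turning from healthy to degraded in near real time, and allows an alerting service to fire on the same thresholds.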
External customer
User- Customers or end users who interact with the DataBee user interface to access its features, functionalities, and services
DataBee environment
User interface- DataBee’s web-based user interface for configuring DataBee, data ingest, search and other features
Configuration store- Configuration storage service
Pipeline orchestrator- Synchronizes feed content with the data pipelines running in the data pipeline executor, based on the user configuration and preferences stored in the configuration store
Feed content- stores data source mappings
Data pipeline executor- Data flow pipeline tool for transforming, normalizing, correlating and outputting data in a streaming fashion
OCSF normalization- OCSF (Open Cybersecurity Schema Framework) is an industry-standard, vendor-agnostic core security schema for transmitting and storing information relevant to cybersecurity. OCSF normalization structures and organizes cybersecurity data so that it conforms to consistent, standardized formats
DM management- Synchronizes data mart content files with tables and views in Snowflake or Databricks
DM definitions- Collection of data mart views
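The pipeline orchestrator's job of keeping running pipelines in sync with user configuration and feed content can be pictured as a reconciliation loop: compute the desired set of pipelines, compare it with what is running, and emit the actions needed to close the gap. The sketch below is a hypothetical illustration of that general pattern; the feed IDs, the mapping-version strings, the action names, and the functions `desired_state` and `reconcile` are assumptions, not DataBee's internal interfaces.

```python
from typing import Dict, List, Tuple


def desired_state(config: Dict[str, bool],
                  feed_content: Dict[str, str]) -> Dict[str, str]:
    """Enabled feeds (from a hypothetical configuration store) paired with their
    current mapping version (from a hypothetical feed content store).
    Enabled feeds with no mapping are skipped."""
    return {feed: feed_content[feed]
            for feed, enabled in config.items()
            if enabled and feed in feed_content}


def reconcile(desired: Dict[str, str],
              running: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (action, feed_id) pairs that bring running pipelines to the desired state."""
    actions = []
    for feed, version in sorted(desired.items()):
        if feed not in running:
            actions.append(("start", feed))
        elif running[feed] != version:
            actions.append(("update", feed))  # mapping changed: redeploy the pipeline
    for feed in sorted(running.keys() - desired.keys()):
        actions.append(("stop", feed))        # feed disabled or removed
    return actions


config = {"auth_logs": True, "cloud_audit": True, "proxy_logs": False}
feed_content = {"auth_logs": "v2", "cloud_audit": "v5", "proxy_logs": "v1"}
running = {"auth_logs": "v1", "proxy_logs": "v1", "legacy_fw": "v3"}
actions = reconcile(desired_state(config, feed_content), running)
```

Because reconciling again after the actions are applied yields no further actions, a loop like this can run on every configuration change or on a timer without double-starting pipelines.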
Customer environment
Customer AWS- User-owned and user-operated AWS account that hosts the raw data sources
Snowpipe- Snowflake service for continuously loading data from staged files into tables in Snowflake in micro-batches
Snowflake- A cloud data platform, used here as the data lake for storing security logs
Auto Loader- A Databricks feature that incrementally loads new data files as they arrive in cloud storage into designated tables in Databricks
Databricks- Analytics platform for scalable data analysis and processing
Tableau- A product for creating business intelligence dashboards
Power BI- A product for creating business intelligence dashboards
Use cases solved by DataBee:
DataBee can be used to help address many use cases across security, risk, and compliance including but not limited to:
Continuous control monitoring and PCI DSS 4.0 preparedness
High-volume security data event analysis and detection
Insider and incident threat hunting
SIEM optimization and aggregation
Improving security hygiene