The Data Quality features are designed to help data engineering teams and administrators assess and maintain data quality per feed. They monitor data flows, surface alerts for quality changes, and provide insights into data performance. Let's explore the key features.
Key Features
Data Ingest Metrics - Displays a wide range of data ingest metrics, such as the amount of data ingested, the amount of data processed, feed mapping percentage, and more.
Real-Time Monitoring - Continuously monitors data flows and processing pipelines, ensuring consistent ETL pipeline transparency.
Alerting Mechanism - Notifies the data engineering team and other stakeholders of critical data quality changes. The alerting system offers explanations or diagnostics to expedite issue resolution.
User-Friendly Visualizations - Designed to be intuitive for both data engineers and regular users, presenting data quality metrics and insights in an engaging visual format.
Customizable Views - Users can customize their dashboard views, focusing on specific metrics and data quality assets relevant to their roles and responsibilities.
Historical Data Analysis - Provides access to historical data quality trends and metrics to identify patterns and make informed decisions about data quality improvements.
Data Sources Page
Click the Data button to open the "Your current data sources" page, which lists all the data sources currently configured within the tool. The page displays a clean, organized view of each data source, including its name, state, raw data size, and data quality score with a graphical representation.
Click any data source to open a details panel on the right side of the page. It shows the data quality score alongside essentials such as the source's state, raw data size, owner details, and vendor information, plus the configuration timestamp, last ingest date, and key configuration specific to the selected ingest method. Click View Data Quality Summary to explore additional details for the selected data source.
Data Quality Page
You are now directed to the "Data Quality" page. It presents a detailed overview of records, items mapped, and data quality score changes per day, supplemented by a Sankey diagram for a more granular understanding of how data flows through the DataBee pipelines.
The metrics used in our data quality analysis are listed below, along with their descriptions. All metrics are bound by the time interval selected using the dropdown menu in the top right corner.
Metric | Description
---|---
Bytes Ingested | The amount of data ingested by DataBee, measured in MB or GB
Feed Bandwidth | The rate of log processing, measured in logs/sec (or MB/sec)
Owner | The owner specified when the data source was configured
Records Ingested | The number of individual records ingested by DataBee
Feed Mapping Efficiency | The percentage of ingested records that were successfully mapped
Records Mapped | The number of records that were mapped to an OCSF event or object table
Data Last Ingested | The most recent date on which the feed ran successfully
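For instance, Feed Mapping Efficiency is simply Records Mapped as a share of Records Ingested. Below is a minimal sketch in Python; the function name and the record counts are illustrative assumptions, not part of DataBee:

```python
def feed_mapping_efficiency(records_mapped: int, records_ingested: int) -> float:
    """Percentage of ingested records that were successfully mapped."""
    if records_ingested == 0:
        return 0.0  # avoid division by zero for an idle feed
    return 100.0 * records_mapped / records_ingested

# A feed that ingested 10,000 records and mapped 9,200 of them:
print(feed_mapping_efficiency(9_200, 10_000))  # 92.0
```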
Sankey Diagram
The Sankey diagram in DataBee is a powerful data visualization tool that helps you understand how data flows through different stages. Each flow is depicted as a stream, where the width is proportional to the amount of data it represents. The diagram provides a comprehensive view of data distribution across various categories, allowing for easy identification of successes, failures, and unmapped data.
Data Flow Categories
Success: The OCSF event tables that are powered by the feed.
Failed:
Regex Errors: Failures due to issues in regular expression matching.
Parsing Errors: Failures that occurred during data parsing.
Mapping Errors: Failures related to data mapping inconsistencies.
Unmapped: Data that has not been assigned to any specific category.
Select a time range (last hour, last day, last 7 days, this month, this year, all history) to suit your analysis. When you hover over any box in the diagram, a tooltip shows the percentage of data that the box represents.
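Since each stream's width is proportional to the records it carries, the tooltip percentage for a box is simply that category's share of the total. Here is a minimal sketch in Python, using hypothetical record counts rather than any DataBee API:

```python
# Hypothetical per-category record counts for one feed over the selected
# time range; in DataBee these come from the pipeline, not from user code.
flows = {
    "Process Activity": 6_500,   # Success
    "User Inventory": 2_700,     # Success
    "Regex Errors": 150,         # Failed
    "Parsing Errors": 250,       # Failed
    "Mapping Errors": 100,       # Failed
    "Unmapped": 300,
}

total = sum(flows.values())
for category, count in flows.items():
    # Each box's tooltip percentage is its share of all records.
    print(f"{category}: {100 * count / total:.1f}%")
```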
Clicking a Success box (Process Activity, User Inventory) directs you to a Search page preloaded with a query, where you can view detailed tables of the corresponding data.
Clicking a Failed box (Parsing, Mapping, Regex) takes you to the Unprocessed page. The filters there are preloaded according to the selected time range, allowing you to analyze the specific reasons for failure.
Clicking Unmapped directs you to a Search page preloaded with a query so you can investigate the unmapped data further.
Unprocessed Page
The Unprocessed page provides a detailed table that lists the feed names alongside their corresponding issue type, error message, and the date the issue occurred. This page is designed to help you quickly identify and analyze unprocessed data.
You can access the Unprocessed page in two ways:
From the Sankey Diagram: Click on any of the error boxes (Failed – Mapping, Parsing, Regex) within a data feed's Sankey diagram.
From the Data dropdown on the top navbar.
To streamline your analysis, you can apply various filters (see the sketch after this list):
Date Range: Select from predefined options—Last 24 Hours, Last 7 Days, Last Month, or All Time—to focus on a specific timeframe.
Error Type: Filter by the type of issue (Parsing, Regex, Mapping) to narrow down the results to specific errors.
Feed Selection: Choose specific feeds of interest to view only the relevant unprocessed data.
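Conceptually, these filters act as simple predicates combined with AND over the rows of the table. The sketch below mirrors that behavior in Python; the record fields and the function are hypothetical, not a DataBee API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class UnprocessedRecord:
    """Hypothetical shape of one row on the Unprocessed page."""
    feed_name: str
    error_type: str          # "Parsing", "Regex", or "Mapping"
    error_message: str
    occurred_at: datetime

def filter_unprocessed(records, since=None, error_types=None, feeds=None):
    """Keep rows matching every supplied filter: date range, error type, feed."""
    return [
        r for r in records
        if (since is None or r.occurred_at >= since)
        and (error_types is None or r.error_type in error_types)
        and (feeds is None or r.feed_name in feeds)
    ]

# Example: Parsing errors from one feed over the last 7 days.
cutoff = datetime.now() - timedelta(days=7)
rows = filter_unprocessed([], since=cutoff, error_types={"Parsing"}, feeds={"firewall_logs"})
```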
To explore the raw message and analyze where the failure occurred, click the magnifying glass to expand the row and compare the raw message with how DataBee tried to process it.