Data Quality
  • 11 Sep 2024
The Data Quality features help data engineering teams and administrators assess and maintain data quality per feed. They monitor data flows, surface alerts when quality changes, and provide insights into data performance. Let's explore the key features.

Key Features

Data Ingest Metrics - Displays a wide range of data ingest metrics, such as the amount of data ingested, the amount of data processed, feed mapping percentage, and more.

Real-Time Monitoring - Continuously monitors data flows and processing pipelines, ensuring ETL pipeline transparency is consistently maintained.

Alerting Mechanism - Notifies the data engineering team and other stakeholders of critical data quality changes. Alerts include explanations or diagnostics to expedite issue resolution.

User-Friendly Visualizations - Designed to be intuitive for both data engineers and regular users, presenting data quality metrics and insights in an engaging visual format.

Customizable Views - Users can customize their dashboard views to focus on the metrics and data quality assets relevant to their roles and responsibilities.

Historical Data Analysis - Provides access to historical data quality trends and metrics to help identify patterns and make informed decisions about data quality improvements.

Data Sources Page

Click the Data button to open the "Your current data sources" page, which lists all data sources currently configured in the tool. The page displays a clean, organized view of each data source, including its name, state, raw data size, and data quality score with a graphical representation.



Click any data source to open a details panel on the right side of the page. It shows the data quality score along with essentials such as the source's state, raw data size, owner details, and vendor information. It also shows the configuration timestamp, the last ingest date, and key configuration details specific to the selected ingest method, for a thorough understanding of your data source. Click View Data Quality Summary to explore additional details for the selected data source.


Data Quality Page

You are now directed to the "Data Quality" page, which shows a detailed overview of records, items mapped, and data quality score changes per day. It is supplemented by a Sankey diagram for a more granular view of how data flows through the DataBee pipelines.

The metrics used in our data quality analysis are listed below, along with their descriptions. All metrics are bound by the time interval selected using the dropdown menu in the top right corner.

  • Bytes Ingested - The amount of data ingested by DataBee, measured in MB or GB

  • Feed Bandwidth - The rate of log processing, measured in logs/sec (or MB/sec)

  • Owner - The owner specified when the data source was configured

  • Records Ingested - The number of individual records ingested by DataBee

  • Feed Mapping Efficiency - The percentage of ingested records that were successfully mapped

  • Records Mapped - The number of records that were mapped to an OCSF event or object table

  • Data Last Ingested - The most recent date on which the feed ran successfully
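As a quick illustration, Feed Mapping Efficiency is simply Records Mapped expressed as a share of Records Ingested. The sketch below shows that relationship; the function name and sample numbers are illustrative and not part of any DataBee API.

```python
# Illustrative only: how Feed Mapping Efficiency relates to the
# Records Mapped and Records Ingested metrics described above.
# The function name and the sample counts are assumptions.

def feed_mapping_efficiency(records_mapped: int, records_ingested: int) -> float:
    """Percentage of ingested records successfully mapped to OCSF tables."""
    if records_ingested == 0:
        return 0.0
    return 100.0 * records_mapped / records_ingested

# Example: 9,500 of 10,000 ingested records mapped -> 95.0%
print(feed_mapping_efficiency(9_500, 10_000))
```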

Sankey Diagram

The Sankey diagram in DataBee is a powerful data visualization tool that helps you understand how data flows through different stages. Each flow is depicted as a stream, where the width is proportional to the amount of data it represents. The diagram provides a comprehensive view of data distribution across various categories, allowing for easy identification of successes, failures, and unmapped data.

Data Flow Categories

Success:

  • The OCSF event tables that are powered by the feed.

Failed:

  • Regex Errors: Failures due to issues in regular expression matching.

  • Parsing Errors: Failures that occurred during data parsing.

  • Mapping Errors: Failures related to data mapping inconsistencies.

Unmapped: Data that has not been assigned to any specific category.

Select the time range (last hour, last day, last 7 days, this month, this year, all history) as per your preference. When you hover over any of the boxes in the diagram, a tooltip will appear showing the percentage of data that the box represents.
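The tooltip percentage is each box's record count relative to the feed's total, as in this minimal sketch. The category names mirror this article, but the record counts are invented sample data, not DataBee output.

```python
# Hypothetical sketch of the percentage shown in each Sankey tooltip:
# a box's share is its record count divided by the feed's total.
# Category names follow the article; the counts are made up.

flows = {
    "Success": 8_200,        # records mapped to OCSF event tables
    "Failed: Regex": 300,    # regular-expression matching failures
    "Failed: Parsing": 500,  # failures during data parsing
    "Failed: Mapping": 400,  # data mapping inconsistencies
    "Unmapped": 600,         # records not assigned to any category
}

total = sum(flows.values())
percentages = {name: round(100 * count / total, 1) for name, count in flows.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```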

Clicking a Success node (for example, Process Activity or User Inventory) opens the Search page preloaded with a query, where you can view detailed tables of the corresponding data.

Clicking a Failed node (Parsing, Mapping, Regex) takes you to the Unprocessed page, with filters preloaded according to the selected time range, so you can analyze the specific reasons for failure.

Clicking Unmapped opens the Search page preloaded with a query so you can investigate the unmapped data further.

Unprocessed Page

The Unprocessed page provides a detailed table that lists the feed names alongside their corresponding issue type, error message, and the date the issue occurred. This page is designed to help you quickly identify and analyze unprocessed data.

You can access the Unprocessed page in two ways:

  • From the Sankey Diagram: Click on any of the error boxes (Failed – Mapping, Parsing, Regex) within a data feed's Sankey diagram.

  • From the Data drop-down on the top navbar

To streamline your analysis, you can apply various filters:

  • Date Range: Select from predefined options—Last 24 Hours, Last 7 Days, Last Month, or All Time—to focus on a specific timeframe.

  • Error Type: Filter by the type of issue (Parsing, Regex, Mapping) to narrow down the results to specific errors.

  • Feed Selection: Choose specific feeds of interest to view only the relevant unprocessed data.

To explore the raw message and see where the failure occurred, click the magnifying glass to expand the row; you can then compare the raw message with how DataBee attempted to process it.

