The Data Quality features help data engineering teams and administrators assess and maintain data quality per feed. They monitor data flows, surface alerts for quality changes, and provide insights into data performance. Let's explore the key features.
Key Features
Data Ingest Metrics - Displays a wide range of data ingest metrics, such as the amount of data ingested, the amount of data processed, the feed mapping percentage, and more.
Real-Time Monitoring - Continuously monitors data flows and processing pipelines, ensuring ETL pipeline transparency is consistently maintained.
Alerting Mechanism - Notifies the data engineering team and other stakeholders of critical data quality changes. The alerting system offers explanations and diagnostics to expedite issue resolution.
User-Friendly Visualizations - Designed to be intuitive for both data engineers and regular users, presenting data quality metrics and insights in an engaging visual format.
Customizable Views - Users can customize their dashboard views, focusing on the specific metrics and data quality assets relevant to their roles and responsibilities.
Historical Data Analysis - Provides access to historical data quality trends and metrics to identify patterns and make informed decisions about data quality improvements.
Data Sources Page
Click the Data button to open the "Your current data sources" page, which lists all the data sources currently configured within the tool. The page displays a clean, organized view of each data source, including its name, state, raw data size, and data quality score with a graphical representation.
Click on any data source and a detailed panel opens on the right side of the page. From the data quality state to essentials such as bytes ingested, owner details, and vendor information, the panel provides a comprehensive view. It also shows the configuration timestamp, ingest type, AWS region, active alerts, and more, for a thorough understanding of your data source. Click View Data Quality Summary to explore additional details for the selected data source.
Data Quality Page
You are now directed to the "Data Quality" page. It presents a detailed overview of records, items mapped, and day-over-day changes in the data quality score, supplemented by a Sankey diagram for a more granular understanding of how data flows through the DataBee pipelines. The ‘Active Alerts’ section displays all active warnings and errors in the data source.
The metrics used in our data quality analysis are listed below, along with their descriptions. All metrics are bound by the time interval selected using the dropdown menu in the top right corner.
Metric | Description |
---|---|
Bytes Ingested | The amount of data ingested by DataBee, measured in MB/GB |
Feed Bandwidth | The rate of log processing measured in logs/sec (or MB/sec) |
Owner | The owner specified when the data source was configured |
Records Ingested | The number of individual records ingested by DataBee |
Feed Mapping Efficiency | The percentage of ingested records that were successfully mapped |
Records Mapped | The number of records that were mapped to an OCSF event or object table |
Data Last Ingested | The last date that the feed was successfully run |
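To make the relationship between these metrics concrete, Feed Mapping Efficiency can be derived from Records Ingested and Records Mapped. The sketch below is illustrative only; the function and field names are hypothetical, not part of any DataBee API:

```python
# Illustrative sketch: deriving Feed Mapping Efficiency from the ingest
# metrics above. Function and field names are hypothetical, not a DataBee API.
def feed_mapping_efficiency(records_ingested: int, records_mapped: int) -> float:
    """Percentage of ingested records that were mapped to an OCSF table."""
    if records_ingested == 0:
        return 0.0
    return 100.0 * records_mapped / records_ingested

# Example: 9,500 of 10,000 ingested records mapped -> 95.0
print(feed_mapping_efficiency(10_000, 9_500))
```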
Data Quality Alerts
You will receive alert notifications when critical issues arise within your data sources. Real-time alerts let you respond quickly to problems, which is essential for maintaining operational efficiency and ensuring security.

Errors and warnings can be viewed by clicking on any data source on the “Your current data sources” page. This opens a side panel where active alerts are displayed. Warnings can be dismissed manually, but they will reappear if the underlying issue has not been resolved. Errors cannot be dismissed and remain visible until the issue is fixed.

Click the gear icon to access the “Data Quality - Alert Settings” page. You can disable the data staleness check by switching the toggle off, or adjust its duration by setting the desired number of hours. Once you’re done, click Save.
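Conceptually, the staleness check fires when a feed has not ingested data within the configured window. A minimal sketch of that comparison, assuming a simple timestamp check (the function and parameter names below are illustrative, not DataBee internals):

```python
from datetime import datetime, timedelta, timezone

# Minimal sketch of a data staleness check: the alert fires when no data
# has been ingested within the configured window. Names are illustrative,
# not DataBee internals.
def is_feed_stale(last_ingested_at: datetime, threshold_hours: int = 24) -> bool:
    """Return True if the feed (given a timezone-aware timestamp) has not
    ingested data within the threshold."""
    return datetime.now(timezone.utc) - last_ingested_at > timedelta(hours=threshold_hours)
```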
Refer to the tables below for a detailed understanding of the errors.
Ingestion Type: S3
Actual error | DataBee status code | DataBee error code | Error Explanation | Resolution Tip |
---|---|---|---|---|
ExpiredToken: The security token included in the request is expired | 1400 | S3_ExpiredToken | Your authentication token has expired and needs to be refreshed. | Request a new authentication token from your AWS service. Check token expiration handling in your code. |
AccessDenied | 1403 | S3_AccessDenied | You don't have permission to perform this operation on the S3 resource. | Verify IAM roles and permissions. Check bucket policies and ACLs. Ensure your credentials have the required permissions. |
BucketNotEmpty | 1409 | S3_BucketNotEmpty | The bucket must be empty before it can be deleted. | Remove all objects from the bucket first, or use the force delete option if available in your SDK. |
InvalidBucketName | 1401 | S3_InvalidBucketName | The specified bucket name is not valid or follows incorrect naming conventions. | Ensure bucket name follows S3 naming rules: lowercase letters, numbers, dots, and hyphens only. Must be 3-63 characters long. |
InvalidObjectState | 1402 | S3_InvalidObjectState | The requested operation cannot be performed on the object in its current state. | Check if object is in Glacier storage. Verify object lock settings. Ensure object is not being modified by another operation. |
NoSuchBucket | 1404 | S3_NoSuchBucket | The specified bucket does not exist. | Verify bucket name and region. Check if bucket was deleted or never created. Ensure you're using the correct AWS account. |
NoSuchKey | 1405 | S3_NoSuchKey | The specified file or object could not be found in the bucket. | Verify object key path. Check if file was deleted. Ensure correct bucket and folder structure. |
PreconditionFailed | 1412 | S3_PreconditionFailed | One or more preconditions you specified for the operation did not hold. | Check ETag matches and conditional headers. Verify if object was modified since last retrieval. |
SlowDown | 1503 | S3_SlowDown | Please reduce your request rate as you are sending too many requests. | Implement exponential backoff. Add request rate limiting. Consider using S3 Transfer Acceleration for better performance. |
UnknownError | 1500 | S3_UnknownError | An unexpected error occurred while processing your S3 request. | Check AWS service health dashboard. Review CloudWatch logs. Contact AWS support if persistent. |
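As an example of applying the tips above, the SlowDown and ExpiredToken rows both suggest client-side fixes. Here is a minimal boto3 sketch of the recommended exponential backoff, assuming a plain get_object call (bucket and key names are placeholders):

```python
import time

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

# Minimal sketch of the exponential backoff suggested for SlowDown.
# Bucket and key are placeholders.
def get_object_with_backoff(bucket: str, key: str, max_retries: int = 5):
    for attempt in range(max_retries):
        try:
            return s3.get_object(Bucket=bucket, Key=key)
        except ClientError as e:
            code = e.response["Error"]["Code"]
            if code == "SlowDown":
                time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
            elif code == "ExpiredToken":
                raise  # refresh credentials upstream, then retry
            else:
                raise
    raise RuntimeError("S3 request kept throttling after retries")
```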
Ingestion Type: SQS
Actual error | DataBee status code | DataBee error code | Error Explanation | Resolution Tips |
---|---|---|---|---|
ExpiredToken: The security token included in the request is expired | 2400 | SQS_ExpiredToken | Your authentication token for accessing SQS has expired and needs to be renewed. | Request a new authentication token from AWS and update your application's credentials. |
AccessDenied | 2403 | SQS_AccessDenied | You don't have the necessary permissions to perform this operation on the SQS queue. | Check IAM roles and policies, ensure your credentials have the required SQS permissions. |
InvalidParameterValue | 2401 | SQS_InvalidParameterValue | One or more parameters provided in your SQS request have invalid values. | Review API documentation for correct parameter formats and validate all input values. |
MissingParameter | 2402 | SQS_MissingParameter | A required parameter is missing from your SQS request. | Check API documentation for required parameters and ensure all are included in your request. |
MessageNotInflight | 2404 | SQS_MessageNotInflight | The message you're trying to process is not currently in flight or being processed. | Verify message receipt handle is valid and message hasn't exceeded visibility timeout. |
OverLimit | 2405 | SQS_OverLimit | You have exceeded the maximum allowed limit for this SQS operation. | Implement request throttling or contact AWS support to increase your quota limits. |
QueueDeletedRecently | 2406 | SQS_QueueDeletedRecently | You cannot create a queue with this name because it was recently deleted. | Wait 60 seconds before recreating a queue with the same name, or use a different queue name. |
NonExistentQueue | 2407 | SQS_NonExistentQueue | The specified SQS queue does not exist. | Verify queue URL/name and region, ensure queue hasn't been deleted. |
InvalidMessageContents | 2408 | SQS_InvalidMessageContents | The message content contains invalid characters or exceeds size limits. | Check message format and size, ensure it meets SQS message requirements. |
UnknownError | 2500 | SQS_UnknownError | An unexpected error occurred while processing your SQS request. | Check AWS service health, review CloudWatch logs, and contact AWS support if persistent. |
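Similarly, the QueueDeletedRecently row calls for a 60-second wait before recreating a queue with the same name. A minimal boto3 sketch of that handling (the queue name is a placeholder):

```python
import time

import boto3
from botocore.exceptions import ClientError

sqs = boto3.client("sqs")

# Minimal sketch for the QueueDeletedRecently tip: wait 60 seconds before
# recreating a queue with the same name. Queue name is a placeholder.
def recreate_queue(name: str) -> str:
    try:
        return sqs.create_queue(QueueName=name)["QueueUrl"]
    except ClientError as e:
        # Substring check covers both the short and fully-qualified forms
        # of the error code returned by different SQS protocol versions.
        if "QueueDeletedRecently" in e.response["Error"]["Code"]:
            time.sleep(60)  # SQS enforces a 60s cooldown after deletion
            return sqs.create_queue(QueueName=name)["QueueUrl"]
        raise
```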
Ingestion Type: Azure Blob
Actual error | DataBee status code | DataBee error code | Error Explanation | Resolution Tips |
---|---|---|---|---|
InvalidAuthenticationInfo | 3400 | BLOB_InvalidAuthenticationInfo | The authentication information provided for accessing the Azure Blob storage is invalid or malformed. | Verify your connection string, access keys, or SAS token are correct and not expired. |
InvalidBlobOrBlock | 3401 | BLOB_InvalidBlobOrBlock | The blob or block data you're trying to access or modify is invalid or corrupted. | Check the blob name, size limits, and ensure data integrity during upload/download operations. |
InsufficientAccountPermissions | 3402 | BLOB_InsufficientAccountPermissions | Your account lacks the necessary permissions to perform this operation on the blob storage. | Review and update your Azure role assignments and access policies for the storage account. |
AuthorizationFailure | 3403 | BLOB_AuthorizationFailure | The request was not authorized to perform this operation on the blob resource. | Check your shared access signature (SAS) permissions and storage account access policies. |
BlobNotFound | 3404 | BLOB_BlobNotFound | The requested blob could not be found in the specified container. | Verify the blob name and path, ensure the blob hasn't been deleted or moved. |
ContainerNotFound | 3405 | BLOB_ContainerNotFound | The specified container does not exist in the storage account. | Check the container name and ensure it exists in the correct storage account. |
ResourceNotFound | 3406 | BLOB_ResourceNotFound | The requested Azure Blob storage resource could not be found. | Verify the resource path, name, and ensure the storage account is correctly configured. |
BlobAlreadyExists | 3407 | BLOB_BlobAlreadyExists | A blob with this name already exists in the container. | Use a different blob name or implement logic to handle existing blobs (overwrite/skip). |
ContainerAlreadyExists | 3408 | BLOB_ContainerAlreadyExists | A container with this name already exists in the storage account. | Choose a different container name or handle existing container scenarios appropriately. |
InvalidQueryParameterValue | 4409 | BLOB_InvalidQueryParameterValue | One or more query parameters in your blob storage request are invalid. | Review the API documentation and validate all query parameters meet the required format. |
QueueNotFound | 4410 | BLOB_QueueNotFound | The specified Azure Storage queue could not be found. | Verify the queue name and ensure it exists in the correct storage account. |
QueueDisabled | 4411 | BLOB_QueueDisabled | The queue service is currently disabled for this storage account. | Enable the queue service in your storage account settings or use an alternative storage account. |
Unknown | 4500 | BLOB_UnknownError | An unexpected error occurred while accessing Azure Blob storage. | Check Azure service health, review application logs, and contact Azure support if the issue persists. |
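The BlobNotFound, ContainerNotFound, and authentication rows above correspond to typed exceptions in the Azure SDK for Python. A minimal sketch, assuming connection-string authentication (the connection string, container, and blob names are placeholders):

```python
from azure.core.exceptions import ClientAuthenticationError, ResourceNotFoundError
from azure.storage.blob import BlobServiceClient

# Minimal sketch of handling the not-found and authentication rows above
# with the Azure SDK for Python. All names are placeholders.
def read_blob(conn_str: str, container: str, blob: str) -> bytes:
    client = BlobServiceClient.from_connection_string(conn_str)
    blob_client = client.get_blob_client(container=container, blob=blob)
    try:
        return blob_client.download_blob().readall()
    except ResourceNotFoundError:
        raise  # blob or container missing: verify name, path, and account
    except ClientAuthenticationError:
        raise  # invalid or expired credentials: check keys or SAS token
```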
Ingestion Type: API
HTTP error codes | Error string | DataBee status code | DataBee error code | Error Explanation | Resolution Tip |
---|---|---|---|---|---|
Bad Request | Invalid redirection uri | 5400 | API_InvalidRedirectUrl | The redirect URL provided in your request is not valid or properly formatted. | Check the redirect URL format, ensure it matches the allowed URLs in your API settings, and verify it's properly encoded. |
Bad Request | Redirection URI is required | 5400 | API_NoRedirectUri | The request is missing a required redirect URL parameter. | Add a valid redirect URI to your request parameters as specified in the API documentation. |
Bad Request | Invalid Authorization Code | 5400 | API_InvalidAuthCode | The authorization code provided has expired or is not valid. | Request a new authorization code and ensure you're using it promptly before it expires. |
Bad Request | Invalid_refresh_token | 5400 | API_InvalidRefreshToken | The refresh token provided is not valid or has been revoked. | Initiate a new authentication flow to obtain a fresh refresh token. |
Bad Request | Refresh Token expired | 5400 | API_RefreshTokenExpired | The refresh token has exceeded its lifetime and is no longer valid. | Perform a new authentication flow to obtain new access and refresh tokens. |
Unauthorized | unauthorized_client | 5400 | API_UnauthorizedClient | The client is not authorized to request an authorization code. | Verify your client credentials and ensure your application has the necessary permissions. |
Invalid response type | Response type must be | 5400 | API_InvalidResponseType | The response type specified in the authorization request is not supported. | Use one of the supported response types (usually 'code' or 'token') as specified in the API documentation. |
Invalid grant type | invalid grant type | 5400 | API_UnsupportedGrantType | The grant type specified in the token request is not supported. | Use one of the supported grant types (e.g., 'authorization_code', 'refresh_token') as specified in the API documentation. |
Invalid request | Invalid request | 5400 | API_InvalidRequest | The request is missing a required parameter or contains an invalid parameter value. | Review the API documentation and ensure all required parameters are included with valid values. |
UnauthorizedError | Invalid access token | 5401 | API_InvalidResource | The access token provided is not valid or has been revoked. | Obtain a new access token using your refresh token or perform a new authentication flow. |
UnauthorizedError | Access token expired | 5401 | API_ExpiredAccessToken | The access token has exceeded its lifetime and is no longer valid. | Use your refresh token to obtain a new access token, or perform a new authentication flow if the refresh token is also expired. |
UnauthorizedError | Access token not approved | 5401 | API_AccessTokenNotApproved | The access token has not been approved or was rejected by the authorization server. | Check if the user has granted all required permissions and initiate a new authentication flow if necessary. |
ForbiddenError | InsufficientScope | 5403 | API_InsufficientScope | The access token does not have the required permissions to perform this operation. | Request additional scopes during the authentication process or use a token with the necessary permissions. |
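Several of the 5401 rows above resolve by refreshing the access token. A minimal OAuth refresh sketch using requests, assuming a standard refresh_token grant (the token endpoint and credential names are placeholders; consult your provider's OAuth documentation for the exact parameters):

```python
import requests

# Minimal sketch of the refresh-token flow suggested for the 5401 rows
# above. Endpoint URL and credentials are placeholders.
def refresh_access_token(token_url: str, client_id: str,
                         client_secret: str, refresh_token: str) -> str:
    resp = requests.post(token_url, data={
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }, timeout=30)
    resp.raise_for_status()  # a 400 here may mean the refresh token itself expired
    return resp.json()["access_token"]
```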
Sankey Diagram
The Sankey diagram in DataBee is a powerful data visualization tool that helps you understand how data flows through different stages. Each flow is depicted as a stream, where the width is proportional to the amount of data it represents. The diagram provides a comprehensive view of data distribution across various categories, allowing for easy identification of successes, failures, and unmapped data.
Data Flow Categories
Success: The OCSF event tables that are powered by the feed.
Failed:
- Regex Errors: Failures due to issues in regular expression matching.
- Parsing Errors: Failures that occurred during data parsing.
- Mapping Errors: Failures related to data mapping inconsistencies.
Unmapped: Data that has not been assigned to any specific category.
Select the time range (last hour, last day, last 7 days, this month, this year, all history) as per your preference. When you hover over any of the boxes in the diagram, a tooltip will appear showing the percentage of data that the box represents.
Clicking a Success node (Process Activity, User Inventory) directs you to the Search page, preloaded with a query so you can view detailed tables of the corresponding data.
Clicking a Failed node (Parsing, Mapping, Regex) takes you to the Unprocessed page, where the filters are preloaded according to the selected time range, allowing you to analyze the specific reasons for failure.
Clicking Unmapped directs you to the Search page, preloaded with a query, where you can further investigate the unmapped data.
Unprocessed Page
The Unprocessed page provides a detailed table that lists the feed names alongside their corresponding issue type, error message, and the date the issue occurred. This page is designed to help you quickly identify and analyze unprocessed data.
You can access the Unprocessed page in two ways:
From the Sankey Diagram: Click on any of the error boxes (Failed – Mapping, Parsing, Regex) within a data feed's Sankey diagram.
From the top navbar: Use the Data drop-down.
To streamline your analysis, you can apply various filters:
Date Range: Select from predefined options—Last 24 Hours, Last 7 Days, Last Month, or All Time—to focus on a specific timeframe.
Error Type: Filter by the type of issue (Parsing, Regex, Mapping) to narrow down the results to specific errors.
Feed Selection: Choose specific feeds of interest to view only the relevant unprocessed data.
To analyze where a failure occurred, click the magnifying glass to expand the row and compare the raw message with how DataBee tried to process it.