Data Quality
  • 06 Jan 2025
  • 11 Minutes to read

The Data Quality features help data engineering teams and administrators assess and maintain data quality per feed. They monitor data flows, surface alerts for quality changes, and provide insights into data performance. Let's explore the key features.

Key Features

Data Ingest Metrics - Displays a wide range of data ingest metrics, such as the amount of data ingested, the amount of data processed, feed mapping percentage, and more.

Real-Time Monitoring - Continuously monitors data flows and processing pipelines, ensuring ETL pipeline transparency is consistently maintained.

Alerting Mechanism - Notifies the data engineering team and other stakeholders of critical data quality changes. The alerting system offers explanations and diagnostics to expedite issue resolution.

User-Friendly Visualizations - Designed to be intuitive for both data engineers and regular users, presenting data quality metrics and insights in an engaging visual format.

Customizable Views - Users can customize their dashboard views, focusing on the metrics and data quality assets relevant to their roles and responsibilities.

Historical Data Analysis - Provides access to historical data quality trends and metrics to identify patterns and make informed decisions about data quality improvements.

Data Sources Page

Click the Data button to open the "Your current data sources" page, which lists all the data sources currently configured in the tool. The page displays a clean, organized view of each data source, including its name, state, the size of raw data, and the data quality score with a graphical representation.


Click on any data source to open a detail panel on the right side of the page. From the data quality state to essentials such as bytes ingested, owner details, and vendor information, the panel provides a comprehensive view. It also shows the configuration timestamp, ingest type, AWS region, active alerts, and more for a thorough understanding of your data source. Click View Data Quality Summary to explore additional details for the selected data source.

Data Quality Page

You are now directed to the "Data Quality" page. This display provides a detailed overview of records, items mapped, and data quality score changes per day, supplemented by a Sankey diagram for a more granular view of how data flows through the DataBee pipelines. The 'Active Alerts' section displays all active warnings and errors in the data source.

The metrics used in our data quality analysis are listed below, along with their descriptions. All metrics are bound by the time interval selected using the dropdown menu in the top right corner.

| Metric | Description |
| --- | --- |
| Bytes Ingested | The amount of data ingested by DataBee, measured in MB/GB |
| Feed Bandwidth | The rate of log processing, measured in logs/sec (or MB/sec) |
| Owner | The owner specified when the data source was configured |
| Records Ingested | The number of individual records ingested by DataBee |
| Feed Mapping Efficiency | The percentage of ingested records that were successfully mapped |
| Records Mapped | The number of records that were mapped to an OCSF event or object table |
| Data Last Ingested | The last date that the feed was successfully run |
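Feed Mapping Efficiency can be derived from two of the other metrics. A minimal sketch of the calculation (the function name is illustrative, not part of DataBee):

```python
def feed_mapping_efficiency(records_mapped: int, records_ingested: int) -> float:
    """Percentage of ingested records that were mapped to an OCSF table."""
    if records_ingested == 0:
        return 0.0  # avoid division by zero when a feed has no data yet
    return 100.0 * records_mapped / records_ingested

# e.g. 9,500 of 10,000 records mapped -> 95.0
```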

Data Quality Alerts

You will receive alert notifications when critical issues arise within your data sources. Real-time alerts let you respond quickly to problems, which is essential for maintaining operational efficiency and ensuring security. Errors and warnings can be viewed by clicking on any data source on the "Your current data sources" page, which opens a side panel where active alerts are displayed. Warnings can be dismissed manually, but they will reappear if the underlying issue has not been resolved. Errors cannot be dismissed and remain visible until the issue is fixed.

Click the gear icon to access the "Data Quality - Alert Settings" page. You can disable the data staleness check by switching the toggle off, or adjust the duration by setting the desired number of hours. When you're done, click Save.
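Conceptually, the staleness check compares the time of the last successful ingest against the configured threshold. A hypothetical sketch (DataBee's actual implementation is not documented here; the function and parameter names are illustrative):

```python
from datetime import datetime, timedelta

def is_feed_stale(last_ingested: datetime, threshold_hours: int, now: datetime) -> bool:
    """A feed is considered stale when no data has been ingested
    within the configured number of hours."""
    return now - last_ingested > timedelta(hours=threshold_hours)
```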

Refer to the table below to get a detailed understanding of the errors.

Ingestion Type: S3

| Actual error | DataBee status code | DataBee error code | Error explanation | Resolution tip |
| --- | --- | --- | --- | --- |
| ExpiredToken: The security token included in the request is expired | 1400 | S3_ExpiredToken | Your authentication token has expired and needs to be refreshed. | Request a new authentication token from your AWS service. Check token expiration handling in your code. |
| AccessDenied | 1403 | S3_AccessDenied | You don't have permission to perform this operation on the S3 resource. | Verify IAM roles and permissions. Check bucket policies and ACLs. Ensure your credentials have the required permissions. |
| BucketNotEmpty | 1409 | S3_BucketNotEmpty | The bucket must be empty before it can be deleted. | Remove all objects from the bucket first, or use the force delete option if available in your SDK. |
| InvalidBucketName | 1401 | S3_InvalidBucketName | The specified bucket name is not valid or follows incorrect naming conventions. | Ensure the bucket name follows S3 naming rules: lowercase letters, numbers, dots, and hyphens only. Must be 3-63 characters long. |
| InvalidObjectState | 1402 | S3_InvalidObjectState | The requested operation cannot be performed on the object in its current state. | Check if the object is in Glacier storage. Verify object lock settings. Ensure the object is not being modified by another operation. |
| NoSuchBucket | 1404 | S3_NoSuchBucket | The specified bucket does not exist. | Verify the bucket name and region. Check if the bucket was deleted or never created. Ensure you're using the correct AWS account. |
| NoSuchKey | 1405 | S3_NoSuchKey | The specified file or object could not be found in the bucket. | Verify the object key path. Check if the file was deleted. Ensure the correct bucket and folder structure. |
| PreconditionFailed | 1412 | S3_PreconditionFailed | One or more preconditions you specified for the operation did not hold. | Check ETag matches and conditional headers. Verify if the object was modified since last retrieval. |
| SlowDown | 1503 | S3_SlowDown | You are sending too many requests and must reduce your request rate. | Implement exponential backoff. Add request rate limiting. Consider using S3 Transfer Acceleration for better performance. |
| UnknownError | 1500 | S3_UnknownError | An unexpected error occurred while processing your S3 request. | Check the AWS service health dashboard. Review CloudWatch logs. Contact AWS support if persistent. |
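The S3 naming rules quoted for InvalidBucketName (lowercase letters, numbers, dots, hyphens; 3-63 characters) can be pre-checked client-side before ingest configuration. A simplified sketch; it covers only the rules stated in the table and omits further AWS restrictions (such as forbidding IP-address-style names):

```python
import re

# Start/end with a letter or digit; dots and hyphens allowed in between; 3-63 chars total.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")

def looks_like_valid_bucket_name(name: str) -> bool:
    """Pre-check the basic S3 bucket naming rules from the table above."""
    return bool(_BUCKET_RE.match(name))
```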

Ingestion Type: SQS

| Actual error | DataBee status code | DataBee error code | Error explanation | Resolution tips |
| --- | --- | --- | --- | --- |
| ExpiredToken: The security token included in the request is expired | 2400 | SQS_ExpiredToken | Your authentication token for accessing SQS has expired and needs to be renewed. | Request a new authentication token from AWS and update your application's credentials. |
| AccessDenied | 2403 | SQS_AccessDenied | You don't have the necessary permissions to perform this operation on the SQS queue. | Check IAM roles and policies, and ensure your credentials have the required SQS permissions. |
| InvalidParameterValue | 2401 | SQS_InvalidParameterValue | One or more parameters provided in your SQS request have invalid values. | Review the API documentation for correct parameter formats and validate all input values. |
| MissingParameter | 2402 | SQS_MissingParameter | A required parameter is missing from your SQS request. | Check the API documentation for required parameters and ensure all are included in your request. |
| MessageNotInflight | 2404 | SQS_MessageNotInflight | The message you're trying to process is not currently in flight or being processed. | Verify the message receipt handle is valid and the message hasn't exceeded its visibility timeout. |
| OverLimit | 2405 | SQS_OverLimit | You have exceeded the maximum allowed limit for this SQS operation. | Implement request throttling or contact AWS support to increase your quota limits. |
| QueueDeletedRecently | 2406 | SQS_QueueDeletedRecently | You cannot create a queue with this name because it was recently deleted. | Wait 60 seconds before recreating a queue with the same name, or use a different queue name. |
| NonExistentQueue | 2407 | SQS_NonExistentQueue | The specified SQS queue does not exist. | Verify the queue URL/name and region, and ensure the queue hasn't been deleted. |
| InvalidMessageContents | 2408 | SQS_InvalidMessageContents | The message content contains invalid characters or exceeds size limits. | Check the message format and size, and ensure it meets SQS message requirements. |
| UnknownError | 2500 | SQS_UnknownError | An unexpected error occurred while processing your SQS request. | Check AWS service health, review CloudWatch logs, and contact AWS support if persistent. |
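Several of the resolution tips above (S3 SlowDown, SQS OverLimit) recommend exponential backoff. A generic retry sketch, with a hypothetical `ThrottledError` standing in for the provider-specific throttling exception:

```python
import time

class ThrottledError(Exception):
    """Stand-in for a provider throttling error (e.g. SlowDown, OverLimit)."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `operation` with exponentially growing delays on throttling errors."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, 4s, ...
```

The `sleep` parameter is injected so the delay strategy can be swapped or silenced in tests; jitter is usually added in production to avoid synchronized retries.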

Ingestion Type: Azure Blob

| Actual error | DataBee status code | DataBee error code | Error explanation | Resolution tips |
| --- | --- | --- | --- | --- |
| InvalidAuthenticationInfo | 3400 | BLOB_InvalidAuthenticationInfo | The authentication information provided for accessing the Azure Blob storage is invalid or malformed. | Verify your connection string, access keys, or SAS token are correct and not expired. |
| InvalidBlobOrBlock | 3401 | BLOB_InvalidBlobOrBlock | The blob or block data you're trying to access or modify is invalid or corrupted. | Check the blob name and size limits, and ensure data integrity during upload/download operations. |
| InsufficientAccountPermissions | 3402 | BLOB_InsufficientAccountPermissions | Your account lacks the necessary permissions to perform this operation on the blob storage. | Review and update your Azure role assignments and access policies for the storage account. |
| AuthorizationFailure | 3403 | BLOB_AuthorizationFailure | The request was not authorized to perform this operation on the blob resource. | Check your shared access signature (SAS) permissions and storage account access policies. |
| BlobNotFound | 3404 | BLOB_BlobNotFound | The requested blob could not be found in the specified container. | Verify the blob name and path, and ensure the blob hasn't been deleted or moved. |
| ContainerNotFound | 3405 | BLOB_ContainerNotFound | The specified container does not exist in the storage account. | Check the container name and ensure it exists in the correct storage account. |
| ResourceNotFound | 3406 | BLOB_ResourceNotFound | The requested Azure Blob storage resource could not be found. | Verify the resource path and name, and ensure the storage account is correctly configured. |
| BlobAlreadyExists | 3407 | BLOB_BlobAlreadyExists | A blob with this name already exists in the container. | Use a different blob name or implement logic to handle existing blobs (overwrite/skip). |
| ContainerAlreadyExists | 3408 | BLOB_ContainerAlreadyExists | A container with this name already exists in the storage account. | Choose a different container name or handle existing container scenarios appropriately. |
| InvalidQueryParameterValue | 4409 | BLOB_InvalidQueryParameterValue | One or more query parameters in your blob storage request are invalid. | Review the API documentation and validate that all query parameters meet the required format. |
| QueueNotFound | 4410 | BLOB_QueueNotFound | The specified Azure Storage queue could not be found. | Verify the queue name and ensure it exists in the correct storage account. |
| QueueDisabled | 4411 | BLOB_QueueDisabled | The queue service is currently disabled for this storage account. | Enable the queue service in your storage account settings or use an alternative storage account. |
| Unknown | 4500 | BLOB_UnknownError | An unexpected error occurred while accessing Azure Blob storage. | Check Azure service health, review application logs, and contact Azure support if the issue persists. |
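The DataBee status codes in these tables appear to follow a prefix convention: 1xxx for S3, 2xxx for SQS, 3xxx/4xxx for Azure Blob, and 5xxx for API. Assuming that convention holds (it is inferred from the tables, not stated by DataBee), a small lookup sketch:

```python
def ingestion_type_for_status(code: int) -> str:
    """Map a DataBee status code to its ingestion type, assuming the
    thousands digit identifies the source (as seen in the tables above)."""
    prefixes = {1: "S3", 2: "SQS", 3: "Azure Blob", 4: "Azure Blob", 5: "API"}
    return prefixes.get(code // 1000, "Unknown")
```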

Ingestion Type: API

| HTTP error code | Error string | DataBee status code | DataBee error code | Error explanation | Resolution tip |
| --- | --- | --- | --- | --- | --- |
| Bad Request | Invalid redirection uri | 5400 | API_InvalidRedirectUrl | The redirect URL provided in your request is not valid or properly formatted. | Check the redirect URL format, ensure it matches the allowed URLs in your API settings, and verify it's properly encoded. |
| Bad Request | Redirection URI is required | 5400 | API_NoRedirectUri | The request is missing a required redirect URL parameter. | Add a valid redirect URI to your request parameters as specified in the API documentation. |
| Bad Request | Invalid Authorization Code | 5400 | API_InvalidAuthCode | The authorization code provided has expired or is not valid. | Request a new authorization code and ensure you're using it promptly before it expires. |
| Bad Request | Invalid_refresh_token | 5400 | API_InvalidRefreshToken | The refresh token provided is not valid or has been revoked. | Initiate a new authentication flow to obtain a fresh refresh token. |
| Bad Request | Refresh Token expired | 5400 | API_RefreshTokenExpired | The refresh token has exceeded its lifetime and is no longer valid. | Perform a new authentication flow to obtain new access and refresh tokens. |
| Unauthorized | unauthorized_client | 5400 | API_UnauthorizedClient | The client is not authorized to request an authorization code. | Verify your client credentials and ensure your application has the necessary permissions. |
| Invalid response type | Response type must be | 5400 | API_InvalidResponseType | The response type specified in the authorization request is not supported. | Use one of the supported response types (usually 'code' or 'token') as specified in the API documentation. |
| Invalid grant type | invalid grant type | 5400 | API_UnsupportedGrantType | The grant type specified in the token request is not supported. | Use one of the supported grant types (e.g., 'authorization_code', 'refresh_token') as specified in the API documentation. |
| Invalid request | Invalid request | 5400 | API_InvalidRequest | The request is missing a required parameter or contains an invalid parameter value. | Review the API documentation and ensure all required parameters are included with valid values. |
| UnauthorizedError | Invalid access token | 5401 | API_InvalidResource | The access token provided is not valid or has been revoked. | Obtain a new access token using your refresh token or perform a new authentication flow. |
| UnauthorizedError | Access token expired | 5401 | API_ExpiredAccessToken | The access token has exceeded its lifetime and is no longer valid. | Use your refresh token to obtain a new access token, or perform a new authentication flow if the refresh token is also expired. |
| UnauthorizedError | Access token not approved | 5401 | API_AccessTokenNotApproved | The access token has not been approved or was rejected by the authorization server. | Check if the user has granted all required permissions and initiate a new authentication flow if necessary. |
| ForbiddenError | InsufficientScope | 5403 | API_InsufficientScope | The access token does not have the required permissions to perform this operation. | Request additional scopes during the authentication process or use a token with the necessary permissions. |
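The API error codes above distinguish expired access tokens (recoverable via a refresh token) from expired or invalid refresh tokens (requiring a full re-authentication flow). A hypothetical dispatch sketch; the action strings are illustrative, not DataBee output:

```python
def next_auth_action(databee_error_code: str) -> str:
    """Suggest a recovery step for the API auth errors listed above."""
    if databee_error_code in ("API_ExpiredAccessToken", "API_InvalidResource"):
        return "refresh_access_token"  # the refresh token may still be valid
    if databee_error_code in ("API_RefreshTokenExpired", "API_InvalidRefreshToken"):
        return "reauthenticate"        # full authentication flow required
    return "check_configuration"       # e.g. redirect URI, scopes, grant type
```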

Sankey Diagram

The Sankey diagram in DataBee is a powerful data visualization tool that helps you understand how data flows through different stages. Each flow is depicted as a stream, where the width is proportional to the amount of data it represents. The diagram provides a comprehensive view of data distribution across various categories, allowing for easy identification of successes, failures, and unmapped data.

Data Flow Categories

Success:

  • The OCSF event tables that are powered by the feed.

Failed:

  • Regex Errors: Failures due to issues in regular expression matching.

  • Parsing Errors: Failures that occurred during data parsing.

  • Mapping Errors: Failures related to data mapping inconsistencies.

Unmapped: Data that has not been assigned to any specific category.

Select the time range (last hour, last day, last 7 days, this month, this year, all history) as per your preference. When you hover over any of the boxes in the diagram, a tooltip will appear showing the percentage of data that the box represents.

Clicking on Success (Process Activity, User Inventory) directs you to a query preloaded Search page, where you can view detailed tables of the corresponding data.

Clicking on Failed (Parsing, Mapping, Regex) takes you to the Unprocessed page. Here, the filters will be preloaded according to the selected time range, allowing you to analyze the specific reasons for failure.

Clicking on Unmapped directs you to a query preloaded search page where you can further investigate the unmapped data.
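The tooltip percentages can be reproduced from the flow volumes: each box's share is its record count divided by the feed total. A minimal sketch (the category names are the ones shown in the diagram):

```python
def flow_percentages(flows: dict) -> dict:
    """Given record counts per Sankey category, return each category's
    share of the total as a percentage (what the hover tooltip shows)."""
    total = sum(flows.values())
    if total == 0:
        return {name: 0.0 for name in flows}
    return {name: round(100.0 * count / total, 1) for name, count in flows.items()}
```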

Unprocessed page

The Unprocessed page provides a detailed table that lists the feed names alongside their corresponding issue type, error message, and the date the issue occurred. This page is designed to help you quickly identify and analyze unprocessed data.

You can access the Unprocessed page in two ways:

  • From the Sankey Diagram: Click on any of the error boxes (Failed – Mapping, Parsing, Regex) within a data feed's Sankey diagram.

  • From the Data drop-down on the top navbar.

To streamline your analysis, you can apply various filters:

  • Date Range: Select from predefined options—Last 24 Hours, Last 7 Days, Last Month, or All Time—to focus on a specific timeframe.

  • Error Type: Filter by the type of issue (Parsing, Regex, Mapping) to narrow down the results to specific errors.

  • Feed Selection: Choose specific feeds of interest to view only the relevant unprocessed data.

To explore the raw message and analyze where the failure occurred, click on the magnifying glass to expand the row and view the raw message compared to how DataBee tried to process it.
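Conceptually, the three filters combine as a logical AND over the table rows. A sketch with illustrative record dicts (the field names are assumptions, not DataBee's schema):

```python
from datetime import datetime

def filter_unprocessed(rows, since=None, error_types=None, feeds=None):
    """Keep rows matching every supplied filter; None means 'any'."""
    result = []
    for row in rows:
        if since and row["date"] < since:
            continue  # outside the selected date range
        if error_types and row["error_type"] not in error_types:
            continue  # e.g. only Parsing / Regex / Mapping
        if feeds and row["feed"] not in feeds:
            continue  # only the feeds of interest
        result.append(row)
    return result
```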

