The Data Collector collects data from diverse sources, applies filters, and adds metadata before securely forwarding it to the DataBee Receiver. It is easily installable on your On-Prem machines. DataBee is a robust centralized platform that tracks multiple Data Collectors. It processes, enriches, and securely stores the incoming data, allowing remote Data Collector configuration and updates.
Syslog stands as a widely accepted protocol employed for message logging within computing systems. TCP logs refer to logs generated by applications or systems that communicate using the Transmission Control Protocol (TCP). Windows Event Logs are a built-in logging mechanism in Windows operating systems. The critical event messages generated by various devices and applications are transmitted to a centralized server. These messages often contain pivotal information about system events, errors, warnings, and operational status, crucial for system analysis and troubleshooting.
An on-premises forwarder plays a central role in efficient data management. Acting as an intermediary, the forwarder collects log messages from diverse sources spread across a network infrastructure and forwards them securely and accurately to the central DataBee platform. This streamlines the transmission of error-free log messages and preserves the integrity and efficiency of data ingestion and analysis. The DataBee receiver services filter and tag the data, then forward it to the platform based on the tenant and data source identifier.
High-Level Features
Efficient Data Collection: Users can configure collectors on the DataBee Platform and install them on their on-premises machines. Each collector gathers diverse data, applies filters, and adds metadata before sending it to the DataBee Receiver.
Remote Configuration: Users can remotely manage and configure collector(s) on the DataBee Platform.
Centralized Receiver: DataBee Receiver receives and securely stores data received from multiple Collectors.
Compliance and Security: Ensures adherence to compliance standards and robust security measures.
Reliability and Monitoring: Handles intermittent network issues without losing data and supports monitoring through Datadog dashboards.
This guide will walk you through the step-by-step procedure to set up and establish a robust link between your on-premises system and DataBee.
Understanding Terminologies
Fluent Bit
High-performance On-Prem Collector for logs, metrics, and traces, emphasizing lightweight operation and minimal memory usage.
Configuration Adapter
This service acts as a bridge for Fluent Bit configuration management: it retrieves the latest configurations from the platform, modifies them for Fluent Bit compatibility, and sends an acknowledgment back to the platform.
System Monitor
This service periodically checks the collector's health, gathering metrics like CPU usage, records processed per source, storage details, and uptime. The collected data is then transmitted to the DataBee platform's monitoring endpoint.
Encryption and Security Best Practices
All the network communication occurs over a secure channel. This ensures that any communication between the services and external systems or services is done using secure protocols, such as HTTPS. For example, the data sent by the collector and collector services communicating with the DataBee platform is encrypted by TLS.
No sensitive data is stored in the logs.
Getting Started
Prerequisites
The host must be able to reach the DataBee platform over the network.
Root / Administrator privileges on the system where the data collector is to be installed.
On Windows systems, PowerShell 7.3 or later is required.
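The reachability prerequisite can be checked before installation with a short script. The following is a minimal sketch, not part of the DataBee tooling: the function names are hypothetical, and the check opens a direct TCP connection, so it does not reflect connectivity through a proxy.

```python
import socket
from urllib.parse import urlsplit


def receiver_endpoint(receiver_url: str) -> tuple[str, int]:
    """Return (host, port) for an HTTPS receiver URL, or raise ValueError."""
    parts = urlsplit(receiver_url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError(f"expected an https:// URL, got {receiver_url!r}")
    # Default to 443 when the URL carries no explicit port.
    return parts.hostname, parts.port or 443


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try a direct TCP connection; this does not go through any proxy."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach(*receiver_endpoint("https://testhost.com"))` returns `True` only if the placeholder host accepts a TCP connection on port 443 from this machine.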
System Requirements
Recommended System Resources
Memory | CPU | Disk |
---|---|---|
4GB | 4 | Available space 10 GB |
Note
The storage buffer size for the data collector is configured to a limit of 4GB. In case of network disruptions, the data collector will accumulate the latest data up to 4GB in the configured file storage. Upon resolution of the network disruptions, it will resume transmitting the buffered data from the file storage.
Supported Platforms
OS | Version | Architecture |
---|---|---|
Ubuntu | 22.04 LTS (Jammy Jellyfish) | amd64 (x86_64), arm64 |
RHEL | 8.8 | amd64 (x86_64), arm64 |
Windows Server | WS 2022 LTSC (Standard Edition) | x86_64 (64 bit) |
Note
The collector can work on other Ubuntu/RHEL versions as well, but this is not recommended. On those versions the installer shows a warning such as “OS <current os> is not officially supported. Hence, this might impact installation and cause issues”.
For RHEL operating system, make sure to subscribe using:
subscription-manager register --username <username> --password <password> --auto-attach
With the recommended system resources, the data collector supports up to the following events per second (EPS) for the listed average message sizes.
Log Source | EPS | Average Message Size |
---|---|---|
Syslog | ~16K | 1KB |
TCP | ~38K | 250B |
Windows | ~1.6K | 600B (1 Windows Channel) |
Flat File | ~18K | ~1KB (with 10 Files, total 5GB of static data) |
Note
Flat File data source supports up to 5 GB of static data.
SSD is recommended for optimal performance.
CPU usage will increase in correlation with the number of files to be monitored, and the total data size. Please plan your system resources accordingly.
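To plan for network disruptions, you can estimate how long the 4GB storage buffer lasts at a given event rate. The sketch below is only an approximation under stated assumptions: "4GB" is taken as 4 GiB, "1KB" as 1024 bytes, and the buffered size of an event is assumed to equal its message size (added metadata is ignored). The function names are hypothetical.

```python
BUFFER_LIMIT_BYTES = 4 * 1024**3  # assumption: "4GB" means 4 GiB


def ingest_rate_bytes(eps: float, avg_event_bytes: float) -> float:
    """Approximate inbound data rate in bytes per second."""
    return eps * avg_event_bytes


def buffer_fill_seconds(eps: float, avg_event_bytes: float,
                        buffer_bytes: int = BUFFER_LIMIT_BYTES) -> float:
    """Seconds of disruption the buffer can absorb before it is full."""
    rate = ingest_rate_bytes(eps, avg_event_bytes)
    if rate <= 0:
        raise ValueError("eps and avg_event_bytes must be positive")
    return buffer_bytes / rate


# Syslog row of the table: ~16K EPS at ~1KB per message.
syslog_seconds = buffer_fill_seconds(16_000, 1024)
print(f"Syslog at peak rate fills the buffer in about {syslog_seconds / 60:.1f} minutes")
```

At peak rates the buffer covers minutes rather than hours, so longer outages at high volume will exceed what the buffer can hold.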
Configure Data Collector in DataBee
To configure your data collector in DataBee, follow these steps.
Click on the settings icon at the top right corner of the UI. From the dropdown menu, select System.
From the left sidebar, select Data Collectors. The page displays all the data collectors configured so far. To create a new data collector, scroll to the bottom of the page and click on Add Data Collector.
To set up your data collector, follow the flowchart displayed on the right, for a visual guide. It outlines the step-by-step configuration process.
Step 1: Basic Information
Enter the data collector details in the fields provided.
Collector Name: a name for your data collector
OS: the operating system used, such as Linux or Windows
If you wish to enable the proxy functionality, check the Enable Proxy checkbox.
Proxy URL: HTTP URL or IP used while connecting to DataBee platform
Proxy Username: the proxy username to be used for authentication
Password: the password corresponding to the proxy username
Note
When configuring a proxy, make sure the details are accurate. After the proxy is added, the data collector picks up the change automatically and routes all subsequent calls through the specified proxy. If the proxy malfunctions, the data collector may not function correctly and cannot fetch updated settings from the platform; in that case, the proxy can only be changed by manually updating the on-premises collector configuration. When you change the proxy details, the previous values must still be valid so that the collector can fetch the new changes.
Click Next to proceed to the next step.
Step 2: Installation Steps
Copy the installation command using Copy to clipboard and execute it in a terminal on your host machine. You will be prompted to enter details such as Tenant ID, Collector ID, and Receiver URL; copy each value by clicking its Copy to clipboard button. To view the generated API key, click Show API key, and then copy it using Copy to clipboard.
Tenant ID: Unique ID of the tenant
Receiver URL: DataBee endpoint to forward the collected data to (Only HTTPS URL is supported)
Collector ID: Unique ID of the collector
API Key: API key to authenticate to DataBee Platform
If you have completed copying the information, click Close.
To manage your configured data collectors, follow these steps:
Navigate to the “Data Collectors” page. Locate the specific data collector you want to modify, and click on it. You can edit the basic information of the selected data collector.
To disable the data collector connection, simply click on Disable. If you wish to remove the data collector, click on Delete. Make any necessary changes and then click Update to save your modifications.
Click on Installation Steps to view the installation command, Tenant ID, Receiver URL, Collector ID, and API key.
Click on Data Sources to view all the data sources relying on the selected data collector to ingest data.
Installing Data Collector in your system
Log Type | Supported Collector Versions |
---|---|
Syslog | 0.2-20 and later |
TCP | 0.3-x and later |
Windows Event | 0.4-x and later |
Flat File | 0.5-x |
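The table above can be turned into a quick compatibility check. This is an illustrative sketch only: it assumes version strings follow the `major.minor-build-hash` shape seen in `0.2-20-8601dc8`, and it treats every row (including Flat File) as a minimum version. The function names are hypothetical.

```python
import re

# Minimum (major, minor, build) per log type, taken from the table above.
MIN_VERSION = {
    "Syslog": (0, 2, 20),
    "TCP": (0, 3, 0),
    "Windows Event": (0, 4, 0),
    "Flat File": (0, 5, 0),
}

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)-(\d+)")


def parse_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, build) from a string such as '0.2-20-8601dc8'."""
    match = _VERSION_RE.match(version.strip())
    if not match:
        raise ValueError(f"unrecognized collector version: {version!r}")
    return tuple(int(group) for group in match.groups())


def supports(version: str, log_type: str) -> bool:
    """True if the collector version meets the minimum for the log type."""
    return parse_version(version) >= MIN_VERSION[log_type]
```

For instance, `supports("0.2-20-8601dc8", "TCP")` evaluates to `False` because TCP requires 0.3-x or later.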
To install the data collector along with the required dependencies and packages, follow the below steps:
Linux
Run the copied installation command from DataBee platform on the terminal. On successful installation, you will see the following message on the terminal: Installation completed successfully.
Windows
Run the copied installation command from DataBee platform on your PowerShell terminal as the Administrator.
The image below shows sample collector configuration details provided during the installation (Windows):
On successful installation, you will see the following message on the terminal: Installation completed successfully.
TLS/SSL Support
You will be prompted to choose the default Distinguished Name (DN) parameters (displayed on the console) or manually provide the Distinguished Name (DN).
Linux
Generating certificates required for data sources using TLS support...
NOTE: Self-signed certificates will be generated with following default fields under /opt/comcast-databee-collector/certs directory.
Country Name: US
State or Province Name: Colorado
Locality Name: Centennial
Organization Name: Comcast
Organizational Unit Name: IT
Common Name: ub22-50-2-121
Email Address: test@gmail.com
Would you like to generate certificates with above default fields? If no, enter 'n' to provide custom values for certificate fields. (y/N):
Windows
Generating certificates required for data sources using TLS support...
NOTE: Self-signed certificates will be generated with following default fields under C:\Program Files\Comcast Databee Collector\certs directory.
Country Name : US
State or Province Name : Colorado
Locality Name : Centennial
Organization Name : Comcast
Organizational Unit Name : IT
Common Name : WIN-62M37L27NDE
Email Address : test@gmail.com
Would you like to generate certificates with above default fields? If no, enter 'n' to provide custom values for certificate fields. (y/n):
If you want to continue with the default parameters, press y. It will generate self-signed certificates with the above-mentioned Distinguished Name (DN) parameters. Upon successful generation of the certificate, the console will show the status mentioned below.
If you want to manually enter the Distinguished Name (DN) parameters, then you can give relevant values for all parameters.
Note:
When providing the DN parameters, use a different Common Name (CN) for the CA certificate and the server certificate. For example, if the CN of the CA certificate is comcast.com, the CN of the server certificate can be test.comcast.com.
After the installation is complete, the default self-signed certificates will be generated at the location mentioned below.
Windows: C:\Program Files\Comcast Databee Collector\certs\
Linux: /opt/comcast-databee-collector/certs/
Note:
If you encounter any issues during the installation process, the script might exit with an error. In such scenarios, when you attempt to install again, you will be given a choice to resume the installation from the previously failed attempt with the following message:
Do you want to resume installation from the previously failed attempt? If not, any previous installation progress will be wiped out and installation will be restarted?
If you provide ‘y’, the installation will resume from the previously failed attempt.
If you provide ‘n’, data from the previously failed attempt will be wiped off and a fresh installation will begin.
Once the installation is completed successfully, the collector.yaml file is updated as per the user-provided details.
Users can configure other parameters in the collector.yaml (under /opt/comcast-databee-collector/conf in case of Linux and C:\Program Files\Comcast Databee Collector\conf in case of Windows) such as polling interval, logging related parameters, etc. as mentioned below:
Parameter Name | Type | Description | Sample value |
---|---|---|---|
configadapter.conf-polling-interval | Integer | Polling interval in seconds for configuration updates. | 60 |
monitor.metric-push-interval | Integer | Interval in seconds for pushing metrics. | 60 |
fluentbit.flush | Integer | Time in seconds for Fluent Bit to flush records. | 5 |
fluentbit.log-level | String | Logging level for Fluent Bit. Options: off, error, warn, info, debug, trace | info |
fluentbit.port | Integer | Port number for Fluent Bit. | 2020 |
global.api-key | String | API key for authentication. | f02a2228-ed5f-40db-b4d1-e71bfa2aa542 |
global.collector-id | String | Identifier for the data collector. | 5d2af5e7-d9cd-4f59-bfce-47f08b6d340c |
global.tenant-id | String | Identifier for the tenant. | testtenant |
global.receiver-url | String | URL of the receiver endpoint (HTTPS only). | https://testhost.com |
log.encoding | String | Encoding format for log messages (e.g., console). | console |
log.level | String | Log level for application logging. Options: INFO, WARN, ERROR, DEBUG | INFO |
log.rotator.maxSize | Integer | Max size in MB before the log is rotated | 100 |
log.rotator.maxBackups | Integer | Max number of old log files to keep | 10 |
log.rotator.maxAge | Integer | Max age in days to retain log files | 10 |
log.rotator.compress | Boolean | Compress/zip old log files | true |
Sample Collector YAML
configadapter:
  conf-polling-interval: 60
monitor:
  metric-push-interval: 60
fluentbit:
  flush: 5
  log-level: info
  port: 2020
global:
  api-key: f02a2228-ed5f-40db-b4d1-e71bfa2aa542
  collector-id: 5d2af5e7-d9cd-4f59-bfce-47f08b6d340c
  tenant-id: testtenant
  receiver-url: https://testhost.com
log:
  encoding: console
  level: INFO
  rotator:
    maxSize: 100
    maxBackups: 10
    maxAge: 10
    compress: true
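After editing collector.yaml by hand, it can help to check the values against the parameter table before restarting the services. The sketch below is not part of the collector: it assumes the file has already been loaded into a nested dictionary (for example with a YAML parser such as PyYAML's `yaml.safe_load`), assumes every documented key is expected to be present, and uses hypothetical function names.

```python
_MISSING = object()

FLUENTBIT_LOG_LEVELS = {"off", "error", "warn", "info", "debug", "trace"}

# Dotted parameter name -> expected type, following the parameter table.
EXPECTED_TYPES = {
    "configadapter.conf-polling-interval": int,
    "monitor.metric-push-interval": int,
    "fluentbit.flush": int,
    "fluentbit.log-level": str,
    "fluentbit.port": int,
    "global.api-key": str,
    "global.collector-id": str,
    "global.tenant-id": str,
    "global.receiver-url": str,
    "log.encoding": str,
    "log.level": str,
    "log.rotator.maxSize": int,
    "log.rotator.maxBackups": int,
    "log.rotator.maxAge": int,
    "log.rotator.compress": bool,
}


def lookup(cfg, dotted_path):
    """Walk a nested dict along 'a.b.c'; return _MISSING if any part is absent."""
    node = cfg
    for part in dotted_path.split("."):
        if not isinstance(node, dict) or part not in node:
            return _MISSING
        node = node[part]
    return node


def validate_collector_config(cfg):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    for path, expected in EXPECTED_TYPES.items():
        value = lookup(cfg, path)
        if value is _MISSING:
            problems.append(f"{path}: missing")
        elif type(value) is not expected:  # strict: bool is not accepted as int
            problems.append(
                f"{path}: expected {expected.__name__}, got {type(value).__name__}"
            )
    level = lookup(cfg, "fluentbit.log-level")
    if isinstance(level, str) and level not in FLUENTBIT_LOG_LEVELS:
        problems.append(f"fluentbit.log-level: unknown level {level!r}")
    url = lookup(cfg, "global.receiver-url")
    if isinstance(url, str) and not url.startswith("https://"):
        problems.append("global.receiver-url: only HTTPS URLs are supported")
    return problems
```

The check is deliberately strict about types so that a quoted number such as `"5"` is reported rather than silently accepted.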
Configure Data Feed with Data Collector
To configure your data feed with your data collector, follow the steps below.
Click on the Data button and select +Add New Data Source in DataBee UI. Choose your preferred data source from the list of available options. You will now be directed to choose your ingest method. To fetch data from your on-prem data collector, click on Data Collector.
You will now be redirected to the "Configure data source" page. Follow the flowchart displayed on the right, for a visual guide, outlining the step-by-step configuration process.
Step 1: Configure Data Source
Enter the data source details in the fields provided and choose a pre-configured data collector of your choice.
Data Source Name: a user-friendly name for the data source
Owner Name: the name of the point of contact for the data source
Owner E-mail: email address of the owner
Collector: list of active data collectors available
Once you have entered the required information, click Next to proceed to the next step.
Step 2: Configure Inputs
Please enter the required data in the input fields provided below.
Syslog
Log Source: the type of log source. Select Syslog while configuring the syslog input
Format: the incoming Syslog data format, e.g., [syslog-rfc5424/syslog-rfc3164]
Mode: the server's communication protocol, UDP or TCP
Port: the listening TCP/UDP port used for receiving syslog data
Tags: the tag value(s) to be appended to the log to help identify the source log. It follows a key-value pair, and you can add multiple tags
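Once a syslog input is configured, a test message can confirm that the port is receiving data. The following sketch builds an RFC 5424 formatted line and sends it over UDP; the function names, the `dc-test` application name, and the host and port you pass in are all placeholders, not values defined by DataBee.

```python
import socket
from datetime import datetime, timezone


def rfc5424_message(msg, app="dc-test", hostname=None,
                    facility=1, severity=6, timestamp=None):
    """Format one RFC 5424 syslog line (facility 1 = user, severity 6 = info)."""
    pri = facility * 8 + severity
    ts = (timestamp or datetime.now(timezone.utc)).isoformat(timespec="milliseconds")
    host = hostname or socket.gethostname()
    # Layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA MSG
    return f"<{pri}>1 {ts} {host} {app} - - - {msg}"


def send_udp(message: str, host: str, port: int) -> None:
    """Send one syslog line to the collector's UDP listening port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))
```

A call such as `send_udp(rfc5424_message("collector connectivity test"), "<collector-host>", <port>)` uses the Port value configured above; set Format to syslog-rfc5424 and Mode to UDP for this test.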
TCP
Log source: the type of log source. Select TCP while configuring the TCP input
Format: the incoming data format, e.g., cef, leef, json, other
Port: the listening TCP port used for receiving data. This port must be opened up on the collector VM manually by the user
Enable TLS: enable the toggle for secure TCP communication (optional)
Tags: the tag value(s) to be appended to the log to help identify the source log. It follows a key-value pair, and you can add multiple tags
When the Enable TLS toggle is on, the fields for the server certificate, server key, and CA certificate are displayed. These fields are auto-populated with the default certificate and key paths for the Data Collector OS. You can replace these paths if you want to provide your own TLS certificates.
Server Certificate Path
The default server certificate path will be auto-populated in the UI. If you want to configure your own certificate, you can give a server certificate path in this field. The default path will be as mentioned below.
Windows: C:\Program Files\Comcast Databee Collector\certs\server-cert.pem
Linux: /opt/comcast-databee-collector/certs/server-cert.pem
Server Private Key Path
The default server private key path will be auto-populated in the UI. If you want to configure your own certificate, you can give a server private key path in this field. The default path will be as mentioned below.
Windows: C:\Program Files\Comcast Databee Collector\certs\server-key.pem
Linux: /opt/comcast-databee-collector/certs/server-key.pem
CA Certificate Path
The default CA certificate path will be auto-populated in the UI. If you want to configure your own certificate, you can give a CA certificate path in this field. The default path will be as mentioned below.
Windows: C:\Program Files\Comcast Databee Collector\certs\ca-cert.pem
Linux: /opt/comcast-databee-collector/certs/ca-cert.pem
For TLS communication, the following algorithms are supported for the CA and server certificates:
EC with pkeyopt = ec_paramgen_curve:prime256v1
EC with pkeyopt = ec_paramgen_curve:secp521r1
RSA
Ed25519
SHA 256/384/512
Certificates with the *.pem and *.crt extensions are supported with the algorithms listed above.
After entering the details, click Next to proceed to the next step in the configuration process.
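To verify a TLS-enabled TCP input, a sender can connect using the CA certificate configured above. This is a minimal sketch under several assumptions: events are sent as one JSON object per line, the sender has a copy of the CA certificate, and the host name used to connect matches a name in the server certificate (otherwise the handshake fails hostname verification). The function names are hypothetical.

```python
import json
import socket
import ssl


def frame_event(event: dict) -> bytes:
    """Serialize one event as a single compact JSON line ending in a newline."""
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def send_over_tls(host: str, port: int, ca_cert_path: str,
                  payload: bytes, timeout: float = 5.0) -> None:
    """Open a TLS connection to the collector's TCP port and send the payload."""
    # Trust only the CA certificate supplied by the caller.
    context = ssl.create_default_context(cafile=ca_cert_path)
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname is checked against the names in the server certificate.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(payload)
```

For example, `send_over_tls("<collector-host>", <port>, "ca-cert.pem", frame_event({"message": "tls test"}))` sends one event to the configured Port.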
Windows Event
For Windows Events Collection, please refer to the following guide before installing the data collector: Deployment Guide for Windows Events Collection
Log Source: the type of log source. Select Windows Event while configuring the Windows Event input
Refresh Interval (seconds): the polling interval for each specified channel (in seconds). The default value is 1 second to achieve optimum performance in terms of EPS. The available options are 1, 5, 10, and 20
Channels: names of channels from which the data collector will be fetching events. (Only administrative and operational types of channels are supported.)
Steps to fetch Channel name from Event Viewer:
Login to your Windows machine and open Event Viewer.
Right-click on the channel and click on Properties.
Copy the value from the field 'Full Name' and paste it on the channels dropdown on DataBee UI.
Read Historical Event: Enable the Read Historical Events checkbox in case all the existing events are required to be collected. By default, this is disabled.
Enabling this option might result in duplicate data ingestion on the platform.
Note: Historical event collection from forwarder machines to the central collector machines (domain controller) is not configurable from the DataBee UI.
Query: Optionally, you can provide the Query (in XPath or XML format) to filter events based on Event ID, time range, etc.
You can directly copy the XML query from the Event Viewer using the following steps on the Windows machine:
Open the Channel in the Event Viewer.
On the right-hand side, under the Actions pane, click on Filter Current Log….
You can choose the relevant filters and then click on the XML tab.
Copy the query and paste it into the ‘Query’ field on the DataBee UI.
Tags: the tag value(s) to be appended to the log to help identify the source log. It follows a key-value pair, and you can add multiple tags
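As an alternative to copying the XML from Event Viewer, the same query structure can be generated. The sketch below builds a `QueryList` that selects events by Event ID for one channel; the function name is hypothetical, and the channel and IDs in the usage note are examples only.

```python
import xml.etree.ElementTree as ET


def build_event_query(channel: str, event_ids: list[int]) -> str:
    """Build an Event Viewer style XML query filtering a channel by Event ID."""
    if not event_ids:
        raise ValueError("at least one Event ID is required")
    condition = " or ".join(f"EventID={event_id}" for event_id in event_ids)
    query_list = ET.Element("QueryList")
    query = ET.SubElement(query_list, "Query", {"Id": "0", "Path": channel})
    select = ET.SubElement(query, "Select", {"Path": channel})
    # XPath filter on the System section of each event.
    select.text = f"*[System[({condition})]]"
    return ET.tostring(query_list, encoding="unicode")
```

Calling `build_event_query("Security", [4624, 4625])` produces a query for successful and failed logon events on the Security channel, which can be pasted into the Query field.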
Flat File
Log Source: it defines the type of log source. Select Flat File
Refresh Interval (minutes): the interval (in minutes) of refreshing the list of watched files. The default value is 1. The available options are 1, 5, and 10.
Source Files: a list of source files to be monitored. Accepts wildcard patterns.
Examples:
/dc/logs/t?/*.log monitors all .log files inside directories under /dc/logs whose names are ‘t’ followed by exactly one character (for example, t1).
/var/log/*/*.log monitors all files with the .log extension in the immediate subdirectories of /var/log (one nested level).
The data collector keeps track of monitored files and offsets.
Note
The data collector does not support Multiline reading from file(s). The data collector reads every matched file in the Source Files pattern and for every new line found, i.e. separated by a newline character (\n), it ingests an event. Hence, JSON text must be contained in a single row for proper ingestion. The entire JSON body format is not supported.
File rotation is handled properly. Note that the patterns provided in the Source Files field must not match the rotated files; otherwise, a rotated file would be read again and produce duplicate records. Configure the Exclusion Files accordingly to avoid this.
If a single line exceeds 512k, the file is dropped from the monitoring list and its data is not ingested.
Exclusion Files: a list of files to be excluded. Accepts wildcard patterns. For example /*.gz or /*.zip.
Tags: the tag value(s) to be appended to the log to help identify the source log. It follows a key-value pair, and you can add multiple tags
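Two points from the Flat File settings can be tried out locally before configuring a source: which paths a wildcard pattern matches, and how to put a multi-line JSON document on a single line. The sketch below assumes standard glob semantics, where `*` and `?` do not cross a `/` boundary, and takes "512k" to mean 512 KiB; the collector's own matching may differ in edge cases. The function names are hypothetical.

```python
import json
from pathlib import PurePosixPath

MAX_LINE_BYTES = 512 * 1024  # assumption: "512k" means 512 KiB


def matches(path: str, pattern: str) -> bool:
    """Glob-style match in which '*' and '?' stay within one path component."""
    return PurePosixPath(path).match(pattern)


def to_single_line(json_text: str) -> str:
    """Re-serialize a (possibly pretty-printed) JSON document on one line."""
    line = json.dumps(json.loads(json_text), separators=(",", ":"))
    if len(line.encode("utf-8")) > MAX_LINE_BYTES:
        raise ValueError("line exceeds the documented 512k limit")
    return line


# Check candidate paths against the example patterns from the list above.
print(matches("/dc/logs/t1/app.log", "/dc/logs/t?/*.log"))
print(matches("/var/log/nginx/access.log", "/var/log/*/*.log"))
```

Writing each record with `to_single_line` before it reaches a monitored file keeps every JSON event on its own line, which is what the collector ingests.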
Remote File Log Collection
The data collector does not natively support this. To facilitate this process, you should transfer log files from remote systems to the data collector's host machine.
Please follow these preliminary steps before setting up your data source:
Refer to the Mounting Guide: Remote Log Files Collection on Data Collector for detailed instructions on attaching an external drive to the data collector.
Ensure that the chosen disk has adequate capacity for your log volume needs. For example, select a 1TB drive if you anticipate storing logs from remote machines of that volume.
You can send the logs from remote machines to the newly mounted storage drive on the data collector.
Step 3: Configure Filters
You have the option to filter data based on specific keywords, either through inclusion or exclusion. Here's how you can set it up:
Inclusion Filter: from the Filters dropdown list, choose Inclusion. Input the filter value. The collector will include only those records whose message key contains the specified keyword you have entered.
Exclusion Filter: from the Filters dropdown list, opt for the filter type Exclusion. Input the filter value. The collector will exclude records with message keys containing the specified keyword you have entered.
Multiple Filters: When two or more filters of the same type are present, the AND condition applies between Inclusion filters and the OR condition applies between Exclusion filters.
Note:
Filters will only be applied to the 'message' key in the syslog messages.
To delete the inclusion/exclusion filters, click on the trash icon.
Click Submit to finalize and complete the configuration process.
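The filter rules in this step can be expressed as a small predicate. This is an illustrative sketch, not the collector's implementation: it assumes case-sensitive substring matching on the message value, and it assumes a record matching any Exclusion filter is dropped even if it also satisfies the Inclusion filters. The function name is hypothetical.

```python
def keep_record(message: str, inclusions=(), exclusions=()) -> bool:
    """Decide whether a record's message passes the configured keyword filters."""
    # Exclusion filters are OR-ed: any matching keyword drops the record.
    if any(keyword in message for keyword in exclusions):
        return False
    # Inclusion filters are AND-ed: every keyword must be present.
    # With no inclusion filters, all() is True and the record is kept.
    return all(keyword in message for keyword in inclusions)
```

With Inclusion filters `sshd` and `Failed`, a message must contain both keywords to be forwarded; with Exclusion filters `cron` and `debug`, a message containing either keyword is dropped.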
Management of services
The management script helps you manage all the collector-related services, i.e., start, stop, and view the services' status. Use the following commands to manage the services:
Linux
Start the services:
/opt/comcast-databee-collector/collector.sh start
Stop the services:
/opt/comcast-databee-collector/collector.sh stop
Check the status of the services:
/opt/comcast-databee-collector/collector.sh status
Print the collector version:
/opt/comcast-databee-collector/collector.sh version
Generate the self-signed certificates to enable TLS support:
This is only supported by 0.3-x and later collector versions.
/opt/comcast-databee-collector/collector.sh generate_certs
When you run this command and the default certificates are not expired, it will prompt whether you still want to generate self-signed certificates or not. If you press ‘N’, it will not generate new certificates.
If you provide ‘y’, it will ask whether to use default Distinguished Name (DN) parameters. You have to follow the same steps mentioned above in the installation section.
Windows
Change the current directory:
cd "C:\Program Files\Comcast Databee Collector"
Start the services:
.\collector.ps1 start
Stop the services:
.\collector.ps1 stop
Check the status of the services:
.\collector.ps1 status
Print the collector version:
.\collector.ps1 version
Generate the self-signed certificates to enable TLS support:
This is only supported by collector versions > 0.2-20-8601dc8.
.\collector.ps1 generatecerts
When you run this command and the default certificates are not expired, it will prompt whether you still want to generate self-signed certificates or not. If you press ‘N’, it will not generate new certificates.
If you press y, it will ask whether to use default Distinguished Name (DN) parameters. You have to follow the same steps mentioned above in the installation section.
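When the management commands are driven from automation, it helps to build the command line per platform in one place. The sketch below is a hypothetical wrapper around the commands listed above; it assumes PowerShell 7 is available as `pwsh` on Windows, and it keeps the certificate action spelled as it appears for each platform.

```python
import subprocess

LINUX_SCRIPT = "/opt/comcast-databee-collector/collector.sh"
WINDOWS_SCRIPT = r"C:\Program Files\Comcast Databee Collector\collector.ps1"

# Sub-commands as listed above; the certificate action differs per platform.
ACTIONS = {
    "linux": {"start", "stop", "status", "version", "generate_certs"},
    "windows": {"start", "stop", "status", "version", "generatecerts"},
}


def management_command(os_name: str, action: str) -> list[str]:
    """Return the argument list for a management action on the given OS."""
    os_key = os_name.lower()
    if action not in ACTIONS.get(os_key, ()):
        raise ValueError(f"unsupported action {action!r} for OS {os_name!r}")
    if os_key == "linux":
        return [LINUX_SCRIPT, action]
    # Assumption: PowerShell 7 is installed and reachable as "pwsh".
    return ["pwsh", "-File", WINDOWS_SCRIPT, action]


def run_management(os_name: str, action: str) -> subprocess.CompletedProcess:
    """Run the action with root/Administrator rights and capture its output."""
    return subprocess.run(
        management_command(os_name, action),
        capture_output=True, text=True, check=False,
    )
```

For example, `run_management("linux", "status").stdout` holds whatever the status command printed; the sketch does not parse that output, since its format is not documented here.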
Upgrade
Linux
To upgrade your data collector, open the terminal and run the command that you copied from the DataBee platform. Refer to the sample command below.
Sample command:
bash -c "$(curl -L https://artifacts.us-east-1.databee.buzz/data-collector/HEAD/upgrade.sh)"
After the upgrade, verify the latest version using the command below.
/opt/comcast-databee-collector/collector.sh version
When the data collector is upgraded from Data Collector version 0.2-20-8601dc8, the script will prompt you for the certificate generation. The user has to follow the steps mentioned under TLS/SSL Support section.
Windows
Open your PowerShell terminal as Administrator. To upgrade your data collector, run the command that you copied from the DataBee platform. Refer to the sample command below.
Sample command:
Invoke-WebRequest -Uri "https://artifacts.us-east-1.databee.buzz/data-collector/HEAD/upgrade.ps1" -OutFile "upgrade.ps1" && .\upgrade.ps1
After the upgrade, verify the latest version using the command below.
. "C:\Program Files\Comcast Databee Collector\collector.ps1" version
When the data collector is upgraded from Data Collector version 0.2-20-8601dc8, the script will prompt you for the certificate generation. The user has to follow the steps mentioned under TLS/SSL Support section.
Uninstallation
Follow the steps below to stop and uninstall all the collector services and to clean up the installation directory and logs.
Linux
Make sure you are the root user.
Open the terminal.
Grant executable permissions to the uninstaller, if required.
chmod +x /opt/comcast-databee-collector/uninstall.sh
Run the command below to uninstall the collector:
/opt/comcast-databee-collector/uninstall.sh
Windows
To uninstall the collector, run the command below on PowerShell as Administrator.
. "C:\Program Files\Comcast Databee Collector\uninstall.ps1"
Note:
Make sure your current working directory is not C:\Program Files\Comcast Databee Collector when you run this command. Otherwise, PowerShell treats the installation directory as in use and does not remove it.