Azure Blob Storage is a versatile, scalable, and highly available cloud-based object storage service offered by Microsoft Azure. Your data must be stored in an Azure Blob container within your Azure account. Data is then ingested, transformed, and normalized in DataBee before being pushed to your data lake of choice.
Note
You need to have an Azure subscription, as you will be the owner of the storage.
Authentication: For secure authentication, Azure AD integration is required. Start by creating an Azure AD application registration to obtain a client ID and client secret. Configure permissions and the authentication flow for your application.
Authorization: Azure Blob storage supports Role-Based Access Control (RBAC), enabling you to assign roles to users, groups, or service principals at the storage account or container level.
Follow these steps to configure your Azure account.
Create an Azure Storage Account: Set up an Azure storage account where your blobs will be stored, and note the account name and access key for Azure AD integration.
Azure AD App Registration: Register an Azure AD app to acquire the necessary credentials for authentication. This can be done through the Azure portal by navigating to Azure Active Directory > App registrations > New registration.
Assign Azure AD App to a Role:
Storage Blob Data Contributor: Select this role if the application needs permission to delete objects after reading them.
Storage Blob Data Reader: Choose this role for read-only access when deletion is not required.
You can add RBAC for DataBee in Azure AD by clicking Register an Application.
Obtain Authentication Credentials: Retrieve essential credentials like the Client ID, Tenant ID, and Client secret from the Azure portal.
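As an illustration of how these three values are typically used together, the following minimal sketch builds (but does not send) a client-credentials token request against the Microsoft identity platform. How DataBee authenticates internally is not described here; the function name and the placeholder values are hypothetical, and the storage scope shown is an assumption based on the standard Azure Storage resource identifier.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build an OAuth 2.0 client-credentials token request (not sent here)."""
    # The tenant ID selects which Azure AD directory issues the token.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # Assumed scope: request a token for Azure Storage.
            "scope": "https://storage.azure.com/.default",
        }
    ).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values for illustration only; never hard-code a real secret.
token_request = build_token_request(
    "example-tenant-id", "example-client-id", "example-client-secret"
)
print(token_request.full_url)
```

Passing the request to `urllib.request.urlopen` would return a JSON document containing an access token if the credentials are valid. In practice the client secret should come from a secret store rather than source code.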
Create Event Subscription
Follow the steps below to create an event subscription in your Azure portal.
Log in to your Azure account and select your storage account.
Navigate to the Events button in the left sidebar and click + Event Subscription to create a new subscription.
You will be directed to the “Create Event Subscription” page. Under the ‘Basics’ section, fill in the details as follows:
Name: enter a meaningful name
Event Schema: select Event Grid Schema from the dropdown menu
Topic Details: the Topic Type and Source Resource are automatically generated. Enter a System Topic Name if one has not been created before
Event Types: select Blob Created
Endpoint Details: select Storage Queue from the dropdown menu
Endpoint: click on Configure an endpoint
Clicking it displays the “Queues” page.
Subscription: select your blob subscription name from the dropdown menu
Storage Account: select your blob account name from the dropdown menu
Queue: to use an existing queue, choose Select Existing Queue, pick one from the list of queues displayed, and click Select. To create a new queue, choose Create New Queue, enter a queue name, and click Create.
Managed Identity for Delivery: Managed Identity type is set to None
Click Create to complete the setup.
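Once the subscription is active, each new blob produces a Blob Created event message in the queue. The sketch below shows roughly how such a message could be decoded, assuming the Event Grid schema selected above and assuming the queue message body is Base64-encoded JSON (the function falls back to plain JSON otherwise). The sample event, account name, and function name are made up for illustration.

```python
import base64
import json


def parse_blob_created(message_text: str) -> dict:
    """Extract the container, blob path, and URL from a Blob Created event."""
    try:
        # Assumption: the queue message body is Base64-encoded JSON.
        raw = base64.b64decode(message_text, validate=True)
    except ValueError:
        # Fall back to treating the message as plain JSON text.
        raw = message_text.encode("utf-8")
    event = json.loads(raw)
    if event.get("eventType") != "Microsoft.Storage.BlobCreated":
        raise ValueError(f"unexpected event type: {event.get('eventType')!r}")
    # Subject format: /blobServices/default/containers/<container>/blobs/<path>
    _, _, rest = event["subject"].partition("/containers/")
    container, _, blob_path = rest.partition("/blobs/")
    return {"container": container, "blob": blob_path, "url": event["data"]["url"]}


# Hypothetical event, trimmed to the fields used above.
sample_event = {
    "id": "00000000-0000-0000-0000-000000000000",
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/logs/blobs/2024/01/events.json.gz",
    "data": {
        "url": "https://exampleaccount.blob.core.windows.net/logs/2024/01/events.json.gz"
    },
}
encoded = base64.b64encode(json.dumps(sample_event).encode("utf-8")).decode("ascii")
parsed = parse_blob_created(encoded)
print(parsed["container"], parsed["blob"])
```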
Adding Azure Blob Data Source in DataBee
To configure Azure Blob as a data source in DataBee, follow these steps:
Click the Data button and select +Add New Data Source in the DataBee UI. Choose your preferred data source from the list of available options. You will then be asked to choose your storage account. To fetch data from your Azure Blob storage account, click Azure Blob.
You will be redirected to the "Configure data source" page. The flowchart displayed on the right is a visual guide that outlines the step-by-step configuration process.
Step 1: Connecting a new data source
Enter the data source details in the fields provided.
Data Source Name: a user-friendly name for the data source
Owner Name: the name of the point of contact for the data source
Owner E-mail: email address of the owner
Once you have entered the required information, click Next to proceed to the next step.
Step 2: Grant access for application
Enter Azure authentication details in the fields provided.
Client ID: the Client ID created for your Azure Enterprise application
Client Secret: the generated Enterprise application client secret
Tenant ID: the tenant ID created for your Azure Enterprise application
For a detailed guide on creating an enterprise application in your Azure account, click the link at the bottom of the page.
After entering the details, click Next to proceed to the next step in the configuration process.
Step 3: Complete Azure Blob Storage details
Input the Azure Blob Storage details in the fields provided.
Blob Account Name: the name of your Azure Blob Storage account.
Blob Container Name: the Azure Blob container where the blob file is located.
Prefix: full or partial blob path to match the files you want to ingest. The root container path is selected by default.
Delete object in Azure blob ingest on read: enable the checkbox if you want to automatically remove objects from the Azure blob storage after they have been read.
Compression: indicates what type of decompression (if any) should be applied to the objects before reading.
Content Type: indicates how to parse the uncompressed content.
Azure Queue Name: the name of the Azure Storage queue that receives the Blob Created event messages from the Event Grid subscription.
For a detailed guide on creating an Azure Blob container in your Azure account, click the link at the bottom of the page.
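To make the Prefix, Compression, and Content Type fields more concrete, here is a minimal sketch of how such settings could be applied to a blob. It is not DataBee's implementation: the option values "gzip" and "none", the choice of newline-delimited JSON as the content type, and both function names are assumptions made for illustration.

```python
import gzip
import json


def matches_prefix(blob_path: str, prefix: str = "") -> bool:
    """An empty prefix matches every blob, mirroring the root-path default."""
    return blob_path.startswith(prefix)


def read_records(payload: bytes, compression: str = "none") -> list:
    """Decompress the blob if needed, then parse it as newline-delimited JSON."""
    if compression == "gzip":
        payload = gzip.decompress(payload)
    elif compression != "none":
        raise ValueError(f"unsupported compression: {compression!r}")
    text = payload.decode("utf-8")
    # Skip blank lines so a trailing newline does not produce a parse error.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical blob content: two JSON records, gzip-compressed.
blob_bytes = gzip.compress(b'{"user": "a"}\n{"user": "b"}\n')
if matches_prefix("logs/2024/01/events.json.gz", "logs/2024/"):
    records = read_records(blob_bytes, compression="gzip")
    print(len(records))
```

The point of the sketch is that decompression happens before parsing, so the Compression setting has to match how the blobs were written, and the Content Type setting has to match the format of the data after decompression.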
Assign the following roles to the Azure Enterprise application configured in Step 2:
Azure Blob container: ‘Storage Blob Data Contributor’ if objects must be deleted after reading; otherwise ‘Storage Blob Data Reader’ for read-only access.
Queue: ‘Storage Queue Data Contributor’.
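The role choice above depends only on whether delete-on-read is enabled. As a small sketch, the helper below (a hypothetical name, not part of DataBee or any Azure SDK) returns the role names to assign for a given setting.

```python
def required_roles(delete_on_read: bool) -> dict:
    """Return the role to assign on the container and on the queue."""
    # Deleting blobs after reading requires write access to the container.
    if delete_on_read:
        container_role = "Storage Blob Data Contributor"
    else:
        container_role = "Storage Blob Data Reader"
    return {
        "container": container_role,
        # The queue role is the same in both cases.
        "queue": "Storage Queue Data Contributor",
    }


roles = required_roles(delete_on_read=False)
print(roles["container"])
```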
Click Submit to finalize and complete the configuration process.