To begin using DataBee, follow the steps in the Preliminary Deployment section below.
Preliminary Deployment
Overview
This section will walk you through the preliminary deployment process for setting up your cloud infrastructure. It covers the information we need from you, including an AWS region, a subdomain name, and an email address for your local admin account, as well as what to expect once deployment is complete.
Timeline
The expected timeline for this deployment process is approximately one week.
What we need
Pick AWS region- select one of the following AWS regions for your cloud infrastructure:
| Region | Location |
| --- | --- |
| us-east-1 | North Virginia, US |
| us-east-2 | Ohio, US |
| us-west-1 | Northern California, US |
| us-west-2 | Oregon, US |
| ap-southeast-2 | Sydney, Australia |
| eu-central-1 | Frankfurt, Germany |
| eu-west-1 | Ireland, Europe |
Subdomain name- provide a subdomain name of your choice, such as acme.databee.buzz
Email- provide an email address for the initial local admin account. This is the primary account to which we will send important emails regarding your cloud infrastructure.
Customer Name- provide the name of your organization.
What you should expect
After the preliminary deployment process is complete, you can expect to receive an email with login credentials and a link to access your instance. This email will be sent to the email address provided for the initial local admin account.
Expected next steps
Set up SSO- you will need the SAML metadata file to set up single sign-on for your users.
Snowflake setup- the setup script provided below will help you configure Snowflake for your cloud infrastructure.
Snowflake Direct Connect
Note: Network policies differ between customers depending on how their architecture is set up. You may need to apply a network policy allowing DataBee to talk to your Snowflake instance. For a list of IP addresses that need to be whitelisted, contact DataBee Support or reach out to your Technical Account Manager.
To learn more about configuring the Snowflake Network Policy, see https://docs.snowflake.com/en/sql-reference/sql/create-network-policy
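If a policy is required, a minimal sketch of what your Snowflake administrator might run is shown below, assuming the snowsql CLI is installed; the IP addresses are placeholders for the list provided by DataBee Support, and SVC_CTSCYBER is the service user created in the Snowflake Setup script further down:
snowsql -a <account_identifier> -u <admin_user> -q "
CREATE NETWORK POLICY DATABEE_ALLOW ALLOWED_IP_LIST = ('203.0.113.10', '203.0.113.11'); -- placeholder IPs
ALTER USER SVC_CTSCYBER SET NETWORK_POLICY = DATABEE_ALLOW;"
Applying the policy at the user level keeps the restriction scoped to the DataBee service user rather than your whole Snowflake account.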
Key Pair Setup
DataBee connects to Snowflake using a secure key-pair authentication mechanism. Before configuring your Snowflake environment, you will need to generate a key pair for the connection. This two-step process generates an encrypted private key and a matching public key. To generate the private key, run the following openssl command:
openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8
Private Key Password
This command will ask you to set up a password to protect the private key. DO NOT LOSE this password or you will be unable to complete the Snowflake connection.
To generate the public key that matches the private key created above, use this openssl command:
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
Password Input
This command requests the same password you used to create the private key.
You will use this public key when setting up the user in Snowflake, and the private key and password when connecting in the DataBee UI. More details on key-pair authentication with Snowflake can be found at https://docs.snowflake.com/en/user-guide/key-pair-auth
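As an optional sanity check, you can compute the SHA-256 fingerprint of the public key locally; after the user is created in Snowflake, the same value should appear in the RSA_PUBLIC_KEY_FP property returned by DESCRIBE USER. The command below is a sketch based on standard openssl usage:
openssl rsa -pubin -in rsa_key.pub -outform DER | openssl dgst -sha256 -binary | openssl enc -base64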
Snowflake Setup
Prior to configuring your Snowflake connection in DataBee, your Snowflake administrator must perform the steps in the script below:
-- Creating Role
USE ROLE ACCOUNTADMIN;
CREATE ROLE CTSCYBER_RL COMMENT = 'Role created for Comcast to manage the connected app product';
GRANT OWNERSHIP ON ROLE CTSCYBER_RL TO ROLE SYSADMIN;
USE ROLE SYSADMIN;
GRANT ROLE CTSCYBER_RL to ROLE SYSADMIN;
-- Creating Warehouse
USE ROLE SYSADMIN;
CREATE OR REPLACE WAREHOUSE CTSCYBER_WH
WITH WAREHOUSE_SIZE = SMALL -- default as XSMALL [| SMALL | MEDIUM | LARGE | XLARGE | XXLARGE | XXXLARGE | X4LARGE | X5LARGE | X6LARGE]
MAX_CLUSTER_COUNT = 2 -- default to 2
MIN_CLUSTER_COUNT = 1 -- default to 1
SCALING_POLICY = STANDARD -- always default to STANDARD [| ECONOMY]
AUTO_SUSPEND = 60 -- warehouses automatically bill for the first minute, so we default to 60 second suspension
AUTO_RESUME = TRUE -- always default to TRUE [| FALSE]
INITIALLY_SUSPENDED = TRUE -- always default to TRUE [| FALSE]
COMMENT = 'This warehouse is utilized by the Comcast DataBee team to load and monitor data in your SF account'
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 1800 -- default to 30 min
STATEMENT_TIMEOUT_IN_SECONDS = 3600 -- default to 60 min
;
USE ROLE ACCOUNTADMIN;
GRANT USAGE ON WAREHOUSE CTSCYBER_WH TO ROLE CTSCYBER_RL;
-- Creating necessary database
USE ROLE SYSADMIN;
CREATE DATABASE CTSCYBER_DB
COMMENT = 'Database used for/by Comcast DataBee';
USE ROLE SYSADMIN;
GRANT OWNERSHIP ON DATABASE CTSCYBER_DB TO ROLE CTSCYBER_RL COPY CURRENT GRANTS;
GRANT USAGE ON DATABASE CTSCYBER_DB TO ROLE CTSCYBER_RL;
-- Add Users
USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE USER SVC_CTSCYBER
LOGIN_NAME = 'SVC_CTSCYBER'
RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...' -- replace this with the public key you created for this user
DISPLAY_NAME = 'SVC USER CTSCYBER'
FIRST_NAME = ''
LAST_NAME = ''
DEFAULT_WAREHOUSE = 'CTSCYBER_WH' -- default warehouse
DEFAULT_NAMESPACE = 'CTSCYBER_DB' -- default database
DEFAULT_ROLE = 'CTSCYBER_RL' -- default role
MUST_CHANGE_PASSWORD=FALSE;
-- Grant default role
USE ROLE ACCOUNTADMIN;
GRANT ROLE CTSCYBER_RL TO USER SVC_CTSCYBER;
ALTER USER SVC_CTSCYBER SET DEFAULT_ROLE = CTSCYBER_RL;
-- Grant task permissions
USE ROLE ACCOUNTADMIN;
GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE CTSCYBER_RL;
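After the script completes, you can optionally verify the user, role, and key pair end to end before entering anything in the DataBee UI. A minimal sketch, assuming the snowsql CLI is installed (it prompts for the private key password, or reads it from the SNOWSQL_PRIVATE_KEY_PASSPHRASE environment variable):
snowsql -a <account_identifier> -u SVC_CTSCYBER \
  --private-key-path rsa_key.p8 \
  -q "SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_WAREHOUSE();"
A successful run returns SVC_CTSCYBER, CTSCYBER_RL, and CTSCYBER_WH, confirming that the grants and key-pair authentication are in place.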
Databricks Initial Setup
Prerequisites
Ensure you have a Databricks account set up.
Make sure Unity Catalog is enabled on your Databricks workspace.
Step 1: Azure AD/Entra Permissions
To ensure you have the necessary permissions to interact with the Azure-specific resources, you must have at least the following roles for your Azure subscription:
Contributor
NetworkManager
Ideally, having the Account Manager and FirewallManager permissions as well will give you additional flexibility around the resources you deploy for Databricks via Azure.
Adding the Entra ID Service Principal
Follow the steps below to add a service principal to the account using the account console.
Log in to the account console as an account admin
Click on User management
Navigate to the Service principals tab and click Add New
Select “Microsoft Entra ID managed” under Management
Enter a name for the new service principal
Click Add
Assigning the Admin role to the Entra ID service principal
Navigate to “Identity & Access”, select Groups, and select manage groups
Select the “Admin” group
Click on “Add user”, and add the Service Principal that was created above
Click on the 3 dots (located to the right of the service principal name) and add the following role: Service principal: User
Note: The Entra ID MUST be equivalent to the associated Application ID in Azure for the Databricks resources that are created. It can be found in Entra ID or the Azure Portal, as shown in the screenshots below:
Entra ID Display:
Azure Portal Display:
Assigning the Entra service principal to identity federated workspaces
Follow the steps below to assign your service principal to identity federated workspaces.
Navigate to the account console sidebar and click Workspaces
Click on your workspace name
Navigate to the Permissions tab and click Add permissions
Search for and select the service principal, then assign the permission level (Workspace User)
Click Save
Step 2: Creating a Databricks Service Principal
Adding Service Principal
Follow the steps below to add a service principal to the account using the account console.
Log in to the account console as an account admin
Click on User management
Navigate to the Service principals tab and click Add service principal
Enter a name for the new service principal
Click Add
Optionally, for access to account-level APIs, navigate to the Roles tab and turn on Account Admin
Assigning service principal to identity federated workspaces
Follow the steps below to assign your service principal to identity federated workspaces.
Navigate to the account console sidebar and click Workspaces
Click on your workspace name
Navigate to the Permissions tab and click Add permissions
Search for and select the service principal, then assign the permission level (Workspace User)
Click Save
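If you prefer to script this step, service principals can also be created through the Databricks account SCIM API instead of the console. The sketch below assumes an Azure Databricks account (hence the accounts.azuredatabricks.net host), a valid account admin bearer token in ACCOUNT_TOKEN, and placeholder values for the account ID and display name:
curl -s -X POST \
  -H "Authorization: Bearer $ACCOUNT_TOKEN" \
  -H "Content-Type: application/json" \
  https://accounts.azuredatabricks.net/api/2.0/accounts/<account-id>/scim/v2/ServicePrincipals \
  -d '{"schemas":["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],"displayName":"databee-service-principal"}'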
Step 3: Create an OAuth secret for a service principal
To enable OAuth authentication for Databricks REST APIs, start by creating an OAuth secret. This secret is used to generate OAuth access tokens. Remember, each service principal can have a maximum of five OAuth secrets.
Follow the steps below to create an OAuth secret for a service principal by using the account console
Log in to the account console as an account admin
Click on User management
Navigate to the Service principals tab and select your service principal
Click Generate secret under OAuth secrets
Copy the displayed Secret and Client ID
Click Done
Note:
The secret will only be revealed once during creation.
The client ID is the same as the service principal’s application ID.
Refer to Authentication using OAuth for service principals for more detailed information.
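To confirm that the new secret works, you can optionally request a token directly. The sketch below uses the standard Databricks OAuth token endpoint with the client_credentials grant; the workspace URL is a placeholder:
export CLIENT_ID='<service-principal-application-id>'
export CLIENT_SECRET='<oauth-secret>'
curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
  -d 'grant_type=client_credentials&scope=all-apis' \
  https://<workspace-url>/oidc/v1/token
The JSON response contains an access_token that can be used as a bearer token against the workspace REST APIs.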
Step 4: Create a Unity Catalog
Create a Unity Catalog and grant permissions to the service principal you have created.
%sql
CREATE CATALOG IF NOT EXISTS databee;
GRANT ALL PRIVILEGES ON CATALOG databee TO `{service-principal-uuid}`;
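-- Optional sanity check: confirm the privileges were granted to the service principal
SHOW GRANTS ON CATALOG databee;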
If you want to make the service principal the owner of the catalog, follow the steps below.
Navigate to the Databricks workspace and click Catalog on the left sidebar.
Choose your preferred catalog from the list displayed.
On the right side of the page, click on the edit owner icon.
When the "Set owner" page pops up, add the service principal name in the data field.
Click Save to apply the changes.
To learn more, please refer to Set up and manage Unity Catalog.
Step 5: Create an SQL Warehouse
Follow the steps below to create an SQL warehouse.
Click New > SQL Warehouse.
The “New SQL warehouse“ page pops up.
Configure the following parameters. The preferred values are listed below, but you can change them according to your needs.
Name: give a suitable name of your choice
Cluster size: Small
Auto stop: After 10 minutes of inactivity
Scaling: minimum 1 and maximum 1 cluster
Type: Serverless or Pro
Unity Catalog: enable the toggle button
Channel: Current
Click Create.
The “Manage permissions“ page opens, where you can select users from the dropdown menu to grant them privileges.
Once the SQL warehouse is created, you are directed to the “Overview” page, where the configuration summary is displayed.
Click on the Connection details button and copy the values for Server hostname and HTTP path, which are needed when creating a Databricks connection in DataBee.
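If you want to retrieve those connection details programmatically instead (an optional step), the workspace SQL Warehouses REST API returns them. The sketch below assumes a bearer token obtained in the OAuth step above and a placeholder workspace URL:
curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
  https://<workspace-url>/api/2.0/sql/warehouses
Each warehouse entry in the response includes an odbc_params object whose hostname and path fields correspond to the Server hostname and HTTP path values that DataBee asks for.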