Getting Started

To begin using DataBee, follow the steps in the Preliminary Deployment section below.

Preliminary Deployment

Overview

This section walks you through the preliminary deployment process for setting up your cloud infrastructure: the information we need from you (an AWS region, a subdomain name, and an email address for your local admin account) and the next steps you should expect.

Timeline

The expected timeline for this deployment process is approximately one week.

What we need

  • Pick AWS region: choose one of the following AWS regions for your cloud infrastructure.

    Region           Location
    us-east-1        North Virginia, US
    us-east-2        Ohio, US
    us-west-1        Northern California, US
    us-west-2        Oregon, US
    ap-southeast-2   Sydney, Australia
    eu-central-1     Frankfurt, Germany
    eu-west-1        Ireland, Europe

  • Subdomain name: any subdomain name of your choice, such as acme.databee.buzz.

  • Email: an email address for the initial local admin account. This is the primary account to which we will send important emails regarding your cloud infrastructure.

  • Customer name: the name of your organization.

What you should expect

After the preliminary deployment process is complete, you can expect to receive an email with login credentials and a link to access your instance. This email will be sent to the email address provided for the initial local admin account.

Expected next steps

Set up SSO: you will need your SAML metadata file to set up single sign-on for your users.

Snowflake setup: the setup script provided below will help you configure Snowflake for your cloud infrastructure.

Snowflake Direct Connect

Note: Network policies differ between customers depending on how their architecture is set up. You may need to apply a network policy that allows DataBee to talk to your Snowflake instance. For the list of IP addresses that need to be whitelisted, contact DataBee Support or reach out to your Technical Account Manager.

To learn more about configuring the Snowflake Network Policy, see https://docs.snowflake.com/en/sql-reference/sql/create-network-policy
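For illustration, a minimal sketch of adding such addresses to an existing policy, assuming the snowsql CLI is configured for your account; the policy name and IP addresses shown are placeholders, not the real DataBee egress addresses:

# Hypothetical example: replace my_existing_policy with your policy name, and
# replace the placeholder IPs with your current list plus the addresses
# provided by DataBee Support.
snowsql -q "ALTER NETWORK POLICY my_existing_policy SET ALLOWED_IP_LIST = ('<your-existing-ips>', '203.0.113.10', '203.0.113.11');"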

Key Pair Setup

DataBee connects to Snowflake using a secure key-pair authentication mechanism. Before configuring your Snowflake environment, you will need to generate a key pair for the connection. This two-step process generates an encrypted private key and a matching public key. To generate the private key, run the following openssl command:

openssl genrsa 2048 | openssl pkcs8 -topk8 -v2 des3 -inform PEM -out rsa_key.p8

Private Key Password

This command will ask you to set up a password to protect the private key. DO NOT LOSE this password or you will be unable to complete the Snowflake connection.

To generate the public key that matches the private key created above, use this openssl command:

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

Password Input

This command requests the same password you used to create the private key.

You will use this public key when setting up the user in Snowflake, and the private key and password when connecting in the DataBee UI. More details on key-pair authentication with Snowflake are available at https://docs.snowflake.com/en/user-guide/key-pair-auth.
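For convenience, here is a small shell sketch (assuming the rsa_key.pub file generated above) that prints the public key body with the PEM delimiter lines removed, which is the form the RSA_PUBLIC_KEY parameter in the script below expects:

# Strip the BEGIN/END delimiter lines and line breaks so the key can be
# pasted directly into the RSA_PUBLIC_KEY user parameter.
grep -v '^-----' rsa_key.pub | tr -d '\n'; echo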

Snowflake Setup

Prior to configuring your Snowflake connection in DataBee, your Snowflake administrator must perform the steps in the script below:

-- Creating Role
USE ROLE ACCOUNTADMIN;
CREATE ROLE CTSCYBER_RL COMMENT = 'Role created for Comcast to manage the connected app product';

USE ROLE ACCOUNTADMIN;
GRANT OWNERSHIP ON ROLE CTSCYBER_RL TO ROLE SYSADMIN;

USE ROLE SYSADMIN;
GRANT ROLE CTSCYBER_RL TO ROLE SYSADMIN;

-- Creating Warehouse
USE ROLE SYSADMIN;
CREATE OR REPLACE WAREHOUSE CTSCYBER_WH
WITH WAREHOUSE_SIZE = SMALL -- default as XSMALL [| SMALL | MEDIUM | LARGE | XLARGE | XXLARGE | XXXLARGE | X4LARGE | X5LARGE | X6LARGE]
MAX_CLUSTER_COUNT = 2 -- default to 2
MIN_CLUSTER_COUNT = 1 -- default to 1
SCALING_POLICY = STANDARD -- always default to STANDARD [| ECONOMY]
AUTO_SUSPEND = 60 -- warehouses automatically bill for the first minute, so we default to 60 second suspension
AUTO_RESUME = TRUE -- always default to TRUE [| FALSE]
INITIALLY_SUSPENDED = TRUE -- always default to TRUE [| FALSE]
COMMENT = 'This warehouse is utilized by the Comcast DataBee team to load and monitor data in your SF account'
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 1800 -- default to 30 min
STATEMENT_TIMEOUT_IN_SECONDS = 3600 -- default to 60 min
;

USE ROLE ACCOUNTADMIN;
GRANT USAGE ON WAREHOUSE CTSCYBER_WH TO ROLE CTSCYBER_RL;

-- Creating necessary database
USE ROLE SYSADMIN;
CREATE DATABASE CTSCYBER_DB
COMMENT = 'Database used for/by Comcast DataBee';

USE ROLE SYSADMIN;
GRANT OWNERSHIP ON DATABASE CTSCYBER_DB TO ROLE CTSCYBER_RL COPY CURRENT GRANTS;
GRANT USAGE ON DATABASE CTSCYBER_DB TO ROLE CTSCYBER_RL;

-- Add Users
USE ROLE ACCOUNTADMIN;
CREATE OR REPLACE USER SVC_CTSCYBER
LOGIN_NAME = 'SVC_CTSCYBER'
RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...' -- replace this with the public key you created for this user
DISPLAY_NAME = 'SVC USER CTSCYBER'
FIRST_NAME = ''
LAST_NAME = ''
DEFAULT_WAREHOUSE = 'CTSCYBER_WH' -- default warehouse
DEFAULT_NAMESPACE = 'CTSCYBER_DB' --default database
DEFAULT_ROLE = 'CTSCYBER_RL' --default role
MUST_CHANGE_PASSWORD=FALSE;

-- Grant default role
USE ROLE ACCOUNTADMIN;
GRANT ROLE CTSCYBER_RL TO USER SVC_CTSCYBER;
ALTER USER SVC_CTSCYBER SET DEFAULT_ROLE = CTSCYBER_RL;

-- Grant task permissions
USE ROLE ACCOUNTADMIN; 
GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE CTSCYBER_RL;
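After running the script, a quick sanity check is worthwhile. A minimal sketch, assuming the snowsql CLI is installed and configured for your account (the statements are plain SQL, so any Snowflake client works):

# Confirm the service user exists with the public key set, and review the
# grants held by the DataBee role.
snowsql -q "DESC USER SVC_CTSCYBER;"
snowsql -q "SHOW GRANTS TO ROLE CTSCYBER_RL;"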

Databricks Initial Setup

Prerequisites

  1. Ensure you have a Databricks account set up.

  2. Make sure Unity Catalog is enabled on your Databricks workspace.

Step 1: Azure AD/Entra Permissions

To ensure you have the necessary permissions to interact with the Azure-specific resources, you must have at least the following roles on your Azure subscription:

  • Contributor

  • NetworkManager

For more flexibility around the resources you deploy for Databricks via Azure, the Account Manager and FirewallManager roles are also helpful.
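For reference, a hedged sketch of granting one of these roles with the Azure CLI; <app-id> and <subscription-id> are placeholders for your own values:

# Assign the Contributor role to the service principal at subscription scope.
az role assignment create \
  --assignee "<app-id>" \
  --role "Contributor" \
  --scope "/subscriptions/<subscription-id>"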

Adding the Entra ID Service Principal

Follow the steps below to add a service principal to the account using the account console.

  1. Log in to the account console as an account admin

  2. Click on User management

  3. Navigate to the Service principals tab and click Add New

  4. Select “Microsoft Entra ID managed” under Management

  5. Enter a name for the new service principal

  6. Click Add

Assigning the Admin role to the Entra ID service principal

  1. Navigate to “Identity & Access”, select Groups, and then select Manage groups

  2. Select the “Admin” group

  3. Click Add user and add the service principal that was created above

  4. Click the three dots to the right of the service principal name and add the following role: Service principal: User

Note: The Entra ID service principal MUST match the associated Application ID in Azure for the Databricks resources that are created. You can find this ID in Entra ID or in the Azure Portal.

Assigning the Entra service principal to identity federated workspaces

Follow the steps below to assign your service principal to identity federated workspaces.

  1. Navigate to the account console sidebar and click Workspaces

  2. Click on your workspace name

  3. Navigate to the Permissions tab and click Add permissions

  4. Search for and select the service principal, and assign the permission level (Workspace User)

  5. Click Save 

Step 2: Creating a Databricks Service Principal

Adding Service Principal

Follow the steps below to add a service principal to the account using the account console.

  1. Log in to the account console as an account admin

  2. Click on User management

  3. Navigate to the Service principals tab and click Add service principal

  4. Enter a name for the new service principal

  5. Click Add

  6. Optionally, for access to account-level APIs, navigate to the Roles tab and turn on Account Admin

Assigning service principal to identity federated workspaces

Follow the steps below to assign your service principal to identity federated workspaces.

  1. Navigate to the account console sidebar and click Workspaces

  2. Click on your workspace name

  3. Navigate to the Permissions tab and click Add permissions

  4. Search for and select the service principal, and assign the permission level (Workspace User)

  5. Click Save 

Step 3: Create an OAuth secret for a service principal

To enable OAuth authentication for Databricks REST APIs, start by creating an OAuth secret. This secret is used to generate OAuth access tokens. Remember, each service principal can have a maximum of five OAuth secrets.

Follow the steps below to create an OAuth secret for a service principal using the account console.

  1. Log in to the account console as an account admin

  2. Click on User management

  3. Navigate to the Service principals tab and select your service principal

  4. Click Generate secret under OAuth secrets

  5. Copy the displayed Secret and Client ID

  6. Click Done

Note:

The secret is revealed only once, at creation time.
The client ID is the same as the service principal’s application ID.

Refer to Authentication using OAuth for service principals for more detailed information.
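To verify the secret works, here is a minimal sketch of requesting an OAuth access token for the service principal via the client-credentials flow; <workspace-host>, <client-id>, and <secret> are placeholders for your own values:

# Request a workspace-level OAuth token; the JSON response contains an
# access_token field usable as a Bearer token against the REST APIs.
curl -s -X POST "https://<workspace-host>/oidc/v1/token" \
  -u "<client-id>:<secret>" \
  -d "grant_type=client_credentials&scope=all-apis"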

Step 4: Create a Unity Catalog

Create a catalog and grant privileges on it to the service principal you created.

%sql
-- Create the DataBee catalog and grant the service principal full access.
-- Replace {service-principal-uuid} with the service principal's application (client) ID.
CREATE CATALOG IF NOT EXISTS databee;
GRANT ALL PRIVILEGES ON CATALOG databee TO `{service-principal-uuid}`;

If you want to make the service principal the owner of the catalog, follow the steps below.

  1. Navigate to the Databricks workspace and click Catalog on the left sidebar.

  2. Choose your preferred catalog from the list displayed.

  3. On the right side of the page, click on the edit owner icon.

  4. When the "Set owner" page pops up, add the service principal name in the data field.

  5. Click Save to apply the changes.

To learn more, refer to Set up and manage Unity Catalog.

Step 5: Create a SQL Warehouse

Follow the steps below to create a SQL warehouse.

  1. Click New > SQL Warehouse.

  2. The “New SQL warehouse“ page pops up.

  3. Configure the following parameters. The values below are recommended, but you can adjust them to your needs (a programmatic equivalent is sketched after this list).

    • Name: give a suitable name of your choice

    • Cluster size: Small

    • Auto stop: After 10 minutes of inactivity

    • Scaling: minimum 1 and maximum 1 cluster

    • Type: Serverless or Pro

    • Unity Catalog: enable the toggle button

    • Channel: Current

  4. Click Create.

  5. You are directed to the “Manage permissions“ page where you can select users from the dropdown menu to grant them privileges.

  6. Now that the SQL warehouse is created, you are directed to the “Overview” page where the configuration summary is displayed.

  7. Click on the Connection details button and copy the values for Server hostname and HTTP path, which are needed when creating a Databricks connection in DataBee.
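For reference, a hedged sketch of the equivalent setup through the SQL Warehouses REST API, assuming a valid workspace token; <workspace-host>, <access-token>, and the warehouse name are placeholders:

# Create a Small, single-cluster Pro warehouse that auto-stops after 10 minutes.
curl -s -X POST "https://<workspace-host>/api/2.0/sql/warehouses" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "databee-warehouse",
    "cluster_size": "Small",
    "min_num_clusters": 1,
    "max_num_clusters": 1,
    "auto_stop_mins": 10,
    "warehouse_type": "PRO",
    "channel": {"name": "CHANNEL_NAME_CURRENT"}
  }'

The Server hostname and HTTP path still come from the Connection details tab once the warehouse exists.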

