Databricks

Learn how to configure Databricks for use by Census and why those permissions are needed.


Databricks is a popular data warehouse for data engineering and data science thanks to its deep emphasis on Apache Spark, as well as SQL.

Census supports a wide set of Databricks deployments, including:

  • Unity Catalog

  • SQL Warehouses (including Serverless)

  • All Databricks LTS versions up to and including 14.3 (newer versions typically work without issue)

Census supports both the Basic and Advanced Sync Engines on Databricks.

If you'd like to skip these steps, you can use Databricks's built-in Partner Connect to set up a new connection to Census.

Required Permissions

Databricks uses a different form of authentication than most databases: when connecting to Databricks, you'll use either a Personal Access Token or a Service Principal. If you're using the Advanced Sync Engine and the CENSUS schema has not already been created, you'll also need to create the schema and grant permissions.

To create the schema and grant the required permissions, run:

CREATE SCHEMA IF NOT EXISTS CENSUS;

-- For personal access tokens, use an email address:
GRANT USE SCHEMA, SELECT, CREATE TABLE, MODIFY ON SCHEMA [your_default_catalog].CENSUS TO `user_you_plan_use_with_pat@yourcompany.com`;

-- For service principals, use the client ID:
GRANT USE SCHEMA, SELECT, CREATE TABLE, MODIFY ON SCHEMA [your_default_catalog].CENSUS TO `service-principal-clientid-guid`;
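
If you'd like to verify these grants took effect, one option is to run SHOW GRANTS from outside the Databricks UI. Below is a minimal sketch using the open-source databricks-sql-connector package; it assumes your workspace uses Unity Catalog, and the hostname, HTTP path, and token are placeholders you'll collect in the steps below:

    # pip install databricks-sql-connector
    from databricks import sql

    # Placeholder connection details -- collected in "Configuring a new
    # Databricks connection" below.
    with sql.connect(
        server_hostname="dbc-xxxxxxxx.cloud.databricks.com",
        http_path="/sql/1.0/warehouses/xxxxxxxxxxxx",
        access_token="dapi-your-personal-access-token",
    ) as connection:
        with connection.cursor() as cursor:
            # Should list the USE SCHEMA / SELECT / CREATE TABLE / MODIFY
            # privileges granted above.
            cursor.execute("SHOW GRANTS ON SCHEMA CENSUS")
            for row in cursor.fetchall():
                print(row)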

Configuring a new Databricks connection

  1. First, you'll need to select which form of access credentials to use: a Service Principal (recommended, but a bit more work) or a Personal Access Token.

    • If you're using a Service Principal, go to the User management page in your Databricks Account Console and open the Service principals tab. (A sketch for sanity-checking the resulting Client ID & Secret appears after these steps.)

      1. Create a new service principal with the Add service principal button. Give it a name you'll remember, such as Census. You can also reuse an existing one.

      2. Once created, click Generate secret, which will create a new Client ID and Secret pair. Keep these somewhere safe, as you won't be able to access them again.

      3. Add the service principal as an admin on the specific workspace you're connecting to: in your Databricks Account Console, go to the Workspaces page, click on the name of your workspace, open the Permissions tab, and mark the service principal as an admin.

      Note: Service Principals cannot be connected to All Purpose Clusters that are in Single User Access Mode.

    • If you're using a Personal Access Token, you can create it for yourself. Alternatively, you may want to create a new user account specifically for Census, for auditing and access control. First, navigate into the workspace you're connecting to: in your Databricks Account Console, go to the Workspaces page, select the workspace you'd like Census to connect to, and click Open workspace in the top right.

      1. Click on your Profile Icon in the top right and select Settings. Then click the Developer option in the left settings menu and click Manage next to Access Tokens. We recommend you create a new Access Token with:

        • Name: Census (or another descriptive name)

        • Lifetime: (clear the box) - this will prevent the token from expiring

  2. Next, collect the connection credentials for your compute. If you're not there already, open your target workspace by visiting the Workspaces page and clicking the Open link next to it. Within your workspace, select Compute from the left menu. Census can connect to a SQL Warehouse or an All Purpose Cluster; you can reuse an existing compute resource or create a new one here. Click on the compute you've decided to use. You'll need to collect three credentials:

    • Hostname

    • Port

    • HTTP Path

    For SQL Warehouses, switch to the Connection details tab.

    For All Purpose Clusters, in the Configuration tab, open the Advanced Options section at the bottom, then select the JDBC/ODBC section.

  3. All Purpose Clusters only - add the following configuration parameters to your cluster. These are also in the Advanced Options section at the bottom, under the Spark section.

    spark.hadoop.fs.s3n.impl.disable.cache true
    spark.hadoop.fs.s3.impl.disable.cache true
    spark.hadoop.fs.s3a.impl.disable.cache true

  4. Now you're ready to add the connection to Census. Visit the Sources page in Census, click New Source, and select Databricks from the menu.

    • Select the Sync Engine you'd like to use. Note that this cannot be changed once the connection is created.

    • Provide the connection credentials: Hostname, Port, and HTTP Path.

    • Select your credential type (Personal Access Token or Service Principal), and provide the corresponding Access Token or Client ID & Secret.

    • Optionally, set the Database Allow List. This will filter the schemas that appear in Census. Note that if you are using Unity Catalog, this filtering applies across all catalogs.

  5. After the connection is saved, go ahead and press the Test button. This will validate that you've completed the above steps correctly. Once you've got a checkmark for all four steps, you're good to go!
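
If you chose a Service Principal and the Test step fails on authentication, it can help to confirm the Client ID & Secret are valid outside of Census. The sketch below uses Databricks's standard OAuth client-credentials token endpoint; the hostname and credentials are placeholders, and a successful response means the pair works (the returned token can also be passed to SQL clients as a bearer token):

    # pip install requests
    import requests

    WORKSPACE_HOST = "dbc-xxxxxxxx.cloud.databricks.com"  # placeholder -- your workspace Hostname
    CLIENT_ID = "your-service-principal-client-id"        # placeholder
    CLIENT_SECRET = "your-service-principal-secret"       # placeholder

    # Exchange the Client ID & Secret for a short-lived OAuth access token.
    response = requests.post(
        f"https://{WORKSPACE_HOST}/oidc/v1/token",
        auth=(CLIENT_ID, CLIENT_SECRET),
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    response.raise_for_status()  # a 200 here means the credentials are valid
    print(response.json()["access_token"][:20], "...")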

Allowed IP Addresses

If you're using Databricks's Allowed IPs network policy, you'll need to add Census's IP addresses to your list. You can find Census's set of IP addresses for your region in Regions & IP Addresses. Visit the Databricks documentation for more details on how to specify these IPs as part of your network policy.
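
Databricks also exposes IP access lists through its REST API, so the allowlist can be managed programmatically. A minimal sketch is below; note that the IPs shown are documentation placeholders rather than Census's real addresses (substitute the list for your region), and the workspace must already have IP access lists enabled:

    # pip install requests
    import requests

    WORKSPACE_HOST = "dbc-xxxxxxxx.cloud.databricks.com"  # placeholder -- your workspace Hostname
    ADMIN_TOKEN = "dapi-your-admin-token"                 # placeholder -- a workspace admin token
    CENSUS_IPS = ["203.0.113.10/32", "203.0.113.11/32"]   # placeholders -- use the IPs for your region

    # Create an ALLOW list containing Census's IP addresses.
    response = requests.post(
        f"https://{WORKSPACE_HOST}/api/2.0/ip-access-lists",
        headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
        json={"label": "census", "list_type": "ALLOW", "ip_addresses": CENSUS_IPS},
    )
    response.raise_for_status()
    print(response.json())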

Need help connecting to Databricks?

Contact us via support@getcensus.com or start a conversation with us via the in-app chat.

