Query Census Store from Snowflake

Integrate Census Store with your Snowflake warehouse using Apache Iceberg for Zero-ETL access to Entity Resolution and CSV datasets

Prerequisites

To query Census Store from Snowflake, you will need:

  • A Snowflake account

  • A Census workspace with one or more CSV datasets materialized in Census Store

Step 1: Create Census Store Iceberg REST Catalog Credentials

In Census, create OAuth2 client credentials that Snowflake will use to access the Iceberg catalog for your workspace’s Managed Storage:

  1. Open Census and navigate to Workspace settings.

  2. Select the Census Store tab.

  3. Under Iceberg Catalog, click Create Client Credentials.

  4. Copy the newly created client ID and secret. The secret is visible only when you first create a set of client credentials; if you lose it, you will need to create a new set.

Step 2: Create Snowflake Catalog Integration

Now over to Snowflake. First, create a catalog integration that connects to the Census Store Iceberg catalog by running the following SQL statement in your Snowflake account, with these substitutions:

  • Choose a unique name for your catalog integration and substitute it for <catalog-integration-name>.

  • Substitute your Census workspace’s Iceberg catalog endpoint for <catalog-endpoint>. You can find the endpoint on the Census Store tab of Workspace settings under Iceberg Catalog.

  • Substitute your Census workspace’s Iceberg catalog name for <catalog-name>. You can find the catalog name in the same location.

  • Substitute the client ID and secret for the Iceberg catalog credentials you created in the previous step for <client-id> and <client-secret>.

CREATE OR REPLACE CATALOG INTEGRATION <catalog-integration-name>
   CATALOG_SOURCE = ICEBERG_REST
   TABLE_FORMAT = ICEBERG
   CATALOG_NAMESPACE = 'default'
   REST_CONFIG = (
     CATALOG_URI = '<catalog-endpoint>'
     WAREHOUSE = '<catalog-name>'
     ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
   )
   REST_AUTHENTICATION = (
     TYPE = OAUTH
     OAUTH_TOKEN_URI = '<catalog-endpoint>/v1/oauth/tokens'
     OAUTH_CLIENT_ID = '<client-id>'
     OAUTH_CLIENT_SECRET = '<client-secret>'
     OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
   )
   ENABLED = TRUE;
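
For reference, a completed statement might look like the following. Every value here is a hypothetical placeholder, so use the actual endpoint, catalog name, and credentials from your own workspace:

CREATE OR REPLACE CATALOG INTEGRATION census_store_catalog
   CATALOG_SOURCE = ICEBERG_REST
   TABLE_FORMAT = ICEBERG
   CATALOG_NAMESPACE = 'default'
   REST_CONFIG = (
     -- Hypothetical endpoint and catalog name; copy yours from the
     -- Census Store tab of Workspace settings
     CATALOG_URI = 'https://iceberg.getcensus.example/catalog'
     WAREHOUSE = 'my_workspace_catalog'
     ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
   )
   REST_AUTHENTICATION = (
     TYPE = OAUTH
     OAUTH_TOKEN_URI = 'https://iceberg.getcensus.example/catalog/v1/oauth/tokens'
     -- Hypothetical credentials from Step 1
     OAUTH_CLIENT_ID = 'census_client_abc123'
     OAUTH_CLIENT_SECRET = 'census_secret_xyz789'
     OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
   )
   ENABLED = TRUE;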

To verify the connection, run the following SQL statement in your Snowflake account, substituting the name of the catalog integration you just created for <catalog-integration-name>:

SELECT SYSTEM$VERIFY_CATALOG_INTEGRATION('<catalog-integration-name>');
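
You can also inspect the integration's configured properties with Snowflake's standard DESCRIBE command:

DESCRIBE CATALOG INTEGRATION <catalog-integration-name>;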

Step 3: Mount the Catalog

Now you need to make that Iceberg catalog accessible for querying. Snowflake's catalog-linked databases make this easy.

  • Substitute the name of the Snowflake database you'd like to use for the Iceberg tables for <catalog-database-name>.

  • Substitute the name of the catalog integration you created in Step 2 for <catalog-integration-name>.

CREATE DATABASE <catalog-database-name> 
  LINKED_CATALOG = ( CATALOG = '<catalog-integration-name>' ); 

Now all the tables in Census Store should be available for querying under the database you specified.
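
For example, assuming you named the database CENSUS_STORE, you can list the linked tables and query one directly. Each Census Store namespace appears as a schema, and the placeholder names below follow the same convention as above:

-- List the Iceberg tables Snowflake discovered in the linked catalog
SHOW ICEBERG TABLES IN DATABASE CENSUS_STORE;

-- Query a dataset directly; substitute the namespace and table name
-- shown on the dataset details page in Census
SELECT * FROM CENSUS_STORE.<dataset-namespace>.<dataset-table-name> LIMIT 10;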

Alternative Step 3: Mount Individual Tables

Instead of mounting the entire catalog, you can also create individual Snowflake Iceberg tables for the Census Store datasets you want to query from Snowflake.

  • Choose a name for your table in Snowflake, optionally qualified with a database and schema, and substitute it for <fully.qualified.table_name>.

  • Substitute the name of the catalog integration you created in Step 2 for <catalog-integration-name>.

  • Substitute the namespace and table name your Census Store dataset is materialized in for <dataset-namespace> and <dataset-table-name>, respectively. You can find these names on the dataset details page for your dataset in Census.

CREATE OR REPLACE ICEBERG TABLE <fully.qualified.table_name>
  CATALOG = '<catalog-integration-name>'
  CATALOG_NAMESPACE = '<dataset-namespace>'
  CATALOG_TABLE_NAME = '<dataset-table-name>'
  AUTO_REFRESH = TRUE;

Run a SELECT statement against the new table to verify the integration. With auto refresh enabled, Snowflake automatically exposes new data as the underlying files are updated in S3.
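
For example, with a hypothetical table named analytics.census.resolved_profiles, a quick sanity check might look like this:

-- Count rows in the mounted table (hypothetical name)
SELECT COUNT(*) FROM analytics.census.resolved_profiles;

-- Optionally, force an immediate metadata refresh rather than waiting
-- for auto refresh to pick up changes
ALTER ICEBERG TABLE analytics.census.resolved_profiles REFRESH;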
