This page describes how to configure Redshift credentials for use by Census and why those permissions are needed.
Census reads data from one or more tables (possibly across different schemata) in your database and publishes it to the corresponding objects in external systems such as Salesforce. To limit the load on your database as well as to other apps' APIs, Census computes a “diff” to determine changes between each update. In order to compute these diffs, Census creates and writes to a set of tables to a private bookkeeping schema (2 or 3 tables for each sync job configured).
We recommend you create a dedicated
CENSUSuser account with a strong, unique password. Census uses this account to connect to your Redshift warehouse. In order for the Census connection to work correctly, the
CENSUSaccount must have these permissions:
- The ability to create the
CENSUSschema and full admin access to all tables within that schema (including creating tables, deleting tables, and reading and writing to all tables).
- Read-only access to any tables and views in any schemata that you would like Census to publish to your service destinations.
Redshift permissions are complex and there are many ways to configure access for Census. The script below has been tested with Redshift and is known to work correctly:
-- Give the census user the ability to sign in with a password
CREATE USER CENSUS WITH PASSWORD '<strong, unique password>';
-- Create a private bookkeeping schema where Census can store sync state
CREATE SCHEMA CENSUS;
-- Give the census user full access to the bookkeeping schema
GRANT ALL ON SCHEMA CENSUS TO CENSUS;
-- Ensure the census user has access to any objects that may have already existed in the bookkeeping schema
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA CENSUS TO CENSUS;
-- Let the census user see this schema
GRANT USAGE ON SCHEMA "<your schema>" TO CENSUS;
-- Let the census user read all existing tables in this schema
GRANT SELECT ON ALL TABLES IN SCHEMA "<your schema>" TO CENSUS;
-- Let the census user read any new tables added to this schema
ALTER DEFAULT PRIVILEGES IN SCHEMA "<your schema>" GRANT SELECT ON TABLES TO CENSUS;
-- Let the census user execute any existing functions in this schema
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA "<your schema>" TO CENSUS;
-- Let the census user execute any new functions added to this schema
ALTER DEFAULT PRIVILEGES IN SCHEMA "<your schema>" GRANT EXECUTE ON FUNCTIONS TO CENSUS;
dbt (data build tool) often requires permissions to be re-granted on objects after it rebuilds models. Census customers frequently extend permissions management on their dbt models in two ways:
- 2.Access to all dbt tables and views (Recommended) – Census customers change the default permissions of the database user for their production dbt runs so that any tables and views created by that users are accessible to the Census database user. In this case, we extend the
ALTER DEFAULT PRIVILEGESto specifically indicate the defaults of the dbt production run user.
ALTER DEFAULT PRIVILEGES FOR USER "<your dbt run user> IN SCHEMA "<your dbt target schema>" GRANT SELECT ON TABLES TO CENSUS;
Because this is altering the default behavior of another user, this command must be run by a Redshift superuser.
- If you have multiple schemata that you would like Census to read from, repeat the steps for "<your schema>" for each of them
- In Redshift if there are views in your schema that reference tables in other schemata, you will also need to give Census read access to those other schemata.
- If you are using Census models to execute stored procedures (this is rare and not recommended for most users) you may also need to give Census access to those procedures
All connections from the Census Data Warehouse Service to your database, as well as connections from your Redshift database to S3, are protected by TLS encryption - Census will refuse to connect to a warehouse that does not support TLS. All Census data stored in S3 is encrypted with AWS Server-Side Encryption (SSE).
Redshift by default prevents any external IP address from accessing your data warehouse so if you will need to add these IP addresses to your security groups. For more information, visit AWS Redshift Help Center.
Census optionally allows connecting to Redshift that are only accessible on private/internal networks via SSH tunneling. To do so, you'll need to provide an SSH host server that is visible on the public internet and can connect to the private warehouse, and you'll also need to be able to perform some basic admin actions on that server.
- 1.Create a new user account for Census on the SSH host. (This account is separate from the database user account and can have a different username.)
- 2.On the Census connections page, create a new connection to Redshift enter the warehouse connection details, and then check the 'Use SSH Tunnel' option as shown below. Fill in the host and port of the SSH host machine along with the name of the user created in the previous step.
3. Once the connection is created, Census will generate a keypair for SSH authentication which can be accessed from the connections page.
To install the keypair, copy the public key in Census to you clipboard and add it to the SSH authorized keys file on the SSH host for the user created in the first step. If, for example, this user is named
census, the file should be located at
/home/census/.ssh/authorized_keys. You may need to create this file if it doesn't exist.
Note that the keypair is unique for each Census Warehouse connection. Even if you're reusing the same credentials, you'll need to add the new public keys.
4. If the SSH host restricts IP ranges that can connect to it, add the Census IPs to the allowlist.
With these steps complete, you should be able to complete a connection test, indicating that your tunneled connection is ready to be used in syncs.
Advanced methods of Redshift deployment include deploying Redshift within an AWS VPC or private subnet and limiting direct database access to a separate proxy (typically the SSH Tunnel method described above).
Census uses the
UNLOADcommand to bulk extract data efficiently (you can read more about our architecture in How Census Works). By default, Redshift-in-VPC deployments do not allow Redshift to talk to S3. So when using this method, you'll need to separately grant permission to Redshift to communicate directly with S3 by adding an S3 VPC Endpoint. This is definitely an obscure feature of AWS but this article does a good job of explaining S3 VPC endpoints, why they're needed and how to set one up.