Amazon S3
This page describes how to add AWS S3 as a source to Census.
Census lets you select CSV files stored in AWS S3 as a source for your syncs. Here's how it works:
- 1. You give Census the name of an S3 bucket you control and a prefix within that bucket to scan for objects.
- 2. You grant Census access to read data from your bucket (see our authentication instructions below).
- 3. When a sync starts, Census looks for the most recently modified object in the bucket matching the configured prefix. This must be an uncompressed CSV file with a header row. Census compares the data in this CSV to the previously synced data and sends any changes to the destination of your choice, based on your configured sync behavior.
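The selection rule in step 3 can be sketched with the AWS CLI. This is a minimal illustration of "newest object under a prefix", not Census's implementation. The function names `list_objects` and `latest_key` are hypothetical, and the bucket and prefix in the usage comment are the example values used later on this page.

```shell
# Print "LastModified<TAB>Key" for every object under a prefix.
# Needs AWS CLI credentials that are allowed to list the bucket.
#   $1: bucket name   $2: prefix
list_objects() {
  aws s3api list-objects-v2 \
    --bucket "$1" \
    --prefix "$2" \
    --query 'Contents[].[LastModified,Key]' \
    --output text
}

# Read "LastModified<TAB>Key" lines on stdin and print the newest key.
# ISO 8601 timestamps in the same time zone sort correctly as plain text.
latest_key() {
  LC_ALL=C sort | tail -n 1 | cut -f 2-
}

# Example (needs AWS access):
# list_objects census-docs-example data/ | latest_key
```

Running the example before a sync is a quick way to check which file would be picked up. If nothing matches the prefix, the AWS CLI prints `None` instead of a key.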
This guide will walk you through creating a new S3 source and using it to sync data.
Census uses role-based authentication to connect to your bucket, as recommended by AWS. This involves a three-step "handshake" between your AWS account and Census.
These examples use the AWS CLI, but you can use the AWS Console, API, Terraform, or other tools instead to accomplish the same setup tasks.
- 1. Get your AWS Account ID; you'll use this to create a placeholder AssumeRolePolicy that we will replace with the Census AWS Account ID in a later step:

      aws sts get-caller-identity

  Use the Account value here (a 12-digit number) in the next step (replacing ${YOUR_AWS_ACCOUNT_NUMBER}).
- 2. Create an IAM role in your AWS account that provides read-only access to your S3 bucket and prefix. Throughout this example, we'll assume that your bucket name is census-docs-example, your region is us-east-1, and your prefix is data/:

      aws iam create-role \
        --role-name CensusReadOnlyToS3 \
        --assume-role-policy-document '{
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": "arn:aws:iam::${YOUR_AWS_ACCOUNT_NUMBER}:root"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }'

  This will return a JSON document containing an Arn for the role you just created, something like "arn:aws:iam::${YOUR_AWS_ACCOUNT_NUMBER}:role/CensusReadOnlyToS3". Keep track of this role ARN because you will need it throughout the rest of the setup process.
- 3. Grant your newly-created role read-only access to your S3 bucket. We'll do this using an inline policy, but there are many ways to manage permissions in AWS IAM, so choose the appropriate technique for your organization's needs.

      aws iam put-role-policy \
        --role-name CensusReadOnlyToS3 \
        --policy-name CensusReadOnlyToS3 \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:GetBucketLocation"
              ],
              "Resource": [
                "arn:aws:s3:::census-docs-example",
                "arn:aws:s3:::census-docs-example/*"
              ]
            }
          ]
        }'

  It's possible to further restrict this role to limit Census' access by object prefix, though we recommend using a dedicated bucket for sharing data with Census to keep roles and permissions simple.
- 4.
- 5. Provide the Region, Bucket Name, Role ARN, and Prefix to Census, and click "Connect".
- 6. Census will generate a unique external ID (${CENSUS_EXTERNAL_ID}) that helps secure the connection between your role and the Census AWS account (${CENSUS_AWS_ACCOUNT_NUMBER}) and display it on the next screen. Update your role to grant access to the Census AWS Account (instead of your own) and to require this external ID for all API calls:

      aws iam update-assume-role-policy \
        --role-name CensusReadOnlyToS3 \
        --policy-document '{
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": "arn:aws:iam::${CENSUS_AWS_ACCOUNT_NUMBER}:root"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                "StringEquals": {
                  "sts:ExternalId": "${CENSUS_EXTERNAL_ID}"
                }
              }
            }
          ]
        }'

- 7. Click the "Test" button in the Census UI to verify that the connection has been configured successfully.
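Because the policy documents in the steps above are wrapped in single quotes, the shell does not expand placeholders such as ${CENSUS_EXTERNAL_ID}; they have to be replaced by hand. As an alternative, here is a minimal sketch that renders the JSON from arguments. The function names `render_trust_policy` and `render_read_policy` are hypothetical. The prefix-restricted permissions policy follows the common IAM pattern of an `s3:prefix` condition on `s3:ListBucket`; it is an assumption that Census needs nothing beyond these permissions, so verify with the "Test" button after applying it.

```shell
# Render the role's trust policy (AssumeRolePolicy) as JSON.
#   $1: 12-digit AWS account ID allowed to assume the role
#   $2: optional external ID; when set, sts:ExternalId must match it
render_trust_policy() {
  condition=""
  if [ -n "${2:-}" ]; then
    condition=$(printf ',"Condition":{"StringEquals":{"sts:ExternalId":"%s"}}' "$2")
  fi
  printf '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"AWS":"arn:aws:iam::%s:root"},"Action":"sts:AssumeRole"%s}]}\n' "$1" "$condition"
}

# Render a read-only permissions policy limited to a single prefix.
#   $1: bucket name   $2: prefix, for example data/
render_read_policy() {
  printf '{"Version":"2012-10-17","Statement":['
  # Listing is allowed only for keys under the prefix.
  printf '{"Effect":"Allow","Action":["s3:ListBucket"],"Resource":["arn:aws:s3:::%s"],"Condition":{"StringLike":{"s3:prefix":["%s*"]}}},' "$1" "$2"
  printf '{"Effect":"Allow","Action":["s3:GetBucketLocation"],"Resource":["arn:aws:s3:::%s"]},' "$1"
  # Reads are allowed only for objects under the prefix.
  printf '{"Effect":"Allow","Action":["s3:GetObject","s3:GetObjectVersion"],"Resource":["arn:aws:s3:::%s/%s*"]}' "$1" "$2"
  printf ']}\n'
}

# Example (needs IAM permissions; the two variables are values you set yourself):
# aws iam update-assume-role-policy \
#   --role-name CensusReadOnlyToS3 \
#   --policy-document "$(render_trust_policy "$CENSUS_AWS_ACCOUNT_NUMBER" "$CENSUS_EXTERNAL_ID")"
```

Calling `render_trust_policy` with only an account ID produces the placeholder policy from step 2; passing the external ID as well produces the final policy from step 6.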
You can now go on to create a sync from your S3 prefix to a destination app.
- 1. In Census, click on Syncs in the left menu.
- 2. Click on Add Sync.
- 3. Under Connection, select the name of your S3 connection.
- 4. Since S3 doesn't operate like a traditional data warehouse, you'll see your prefix name repeated under the "Schema" and "Table" selection options - choose the default values.
- 5. Census will scan your S3 bucket, retrieve the headers of your CSV, and use them to guide you through the rest of the sync configuration process.
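Since the sync configuration is driven by the CSV header row, it can help to check a file locally before uploading it. The sketch below is only an illustration; the function names `csv_headers` and `is_uncompressed` are hypothetical, and the header split is deliberately naive.

```shell
# Print each column name from a CSV header row on its own line.
# Naive split on commas: does not handle quoted names that contain commas.
csv_headers() {
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}

# Succeed unless the file starts with the gzip magic bytes (1f 8b).
# Only detects gzip; other compression formats are not checked.
is_uncompressed() {
  [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" != "1f8b" ]
}

# Example:
# is_uncompressed export.csv && csv_headers export.csv
```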
For more detailed instructions on how to configure your sync, please read the docs page for the destination app you want to sync data to.
- To enable incremental syncs, Census uses Basic State Tracking to store an external snapshot of your source data between syncs on Census-owned infrastructure. Learn more here.
- Because S3 does not support SQL queries, you cannot create SQL, dbt, or Looker models on S3 sources, nor can you use S3 sources with Census Entities or Segments.
- S3 sources do not currently support the sync tester.
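To make the idea of snapshot-based incremental syncs concrete, here is a rough sketch of comparing a previous CSV snapshot with the current file to find new or changed rows. It is not how Census implements Basic State Tracking; the function name `changed_rows` is hypothetical, and the comparison is whole-line, so it does not detect deleted rows or pair changes by primary key.

```shell
# Print data rows that appear in the new CSV but not in the old one.
# Header rows are skipped; rows are compared as whole lines.
#   $1: previous snapshot   $2: current file
changed_rows() {
  old_sorted=$(mktemp)
  new_sorted=$(mktemp)
  tail -n +2 "$1" | LC_ALL=C sort > "$old_sorted"
  tail -n +2 "$2" | LC_ALL=C sort > "$new_sorted"
  # comm -13 keeps only lines unique to the second (new) file.
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted"
  rm -f "$old_sorted" "$new_sorted"
}

# Example:
# changed_rows yesterday.csv today.csv
```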