This page describes how to use Census with S3.
This guide shows you how to use Census to connect your S3 account to your data warehouse. This type of connection can sync data between the warehouse and your S3 bucket by mirroring data into a CSV file.
Before you begin, you'll need the following:
- 2. Click New Destination.
- 3. Select S3 from the dropdown list.
- 4. Enter a Name for your destination. This is only for your reference, so it can be anything that makes sense to you.
- 5. Enter authentication details for your AWS account and S3 bucket: Access Key ID, Secret Key, Bucket Name, and AWS Region. Census uses these details to authenticate, so they must match your AWS setup. If you would like to use role-based permissions instead, see the instructions below.
- 6. Click Save Connection.
The setup will look something like this: 👇
The steps for connecting your data warehouse will depend on your technology. See the following guides:
After setting up your warehouse, your Destinations page should look something like this: 👇
Destinations page with data source and S3 service
When defining models, you'll write SQL queries to select the data you want to sync. This can be as simple as selecting everything in a specific database table or as complex as creating new calculated values.
- 2. Enter a name for your model. You'll use this to select the model later.
- 3. Enter your SQL query. If you want to test the query, use the Preview button.
- 4. Click Save Model.
Basic SQL query for a new model
The sync will move data from your warehouse to your S3 bucket. In this step, you'll define how that will work.
- 2. Under What data do you want to sync?, choose your data warehouse as the Connection and your model as the Source.
- 3. Under Where do you want to sync data to?, choose the name you assigned in Step 1 (we used S3) as the Connection. Enter the File Path for the CSV file where data will sync. The path can accept variables that will populate when the sync runs. See File Path Variables. Confirm the file path in the Template Preview field.
- 4. Under How should changes to the source be synced?, select either Update or Create, or Mirror.
- 5. Under Which properties should be updated?, choose whether to sync only Selected Properties or Sync All Properties. Syncing all properties will automatically add new properties to the sync if the model or database table changes.
- 6. To test your sync without actually syncing data, click Run Test and verify the results.
- 7. Click Next. This will open the Confirm Details page where you can see a recap of your setup.
- 8. If you want to start a sync immediately, select the Run a sync now? checkbox.
- 9. Click Create Sync.
When configuring your sync, the page should look something like this: 👇
Example sync setup for S3
Once your sync is complete, it's time to check your data. Open your CSV file from the S3 bucket and check that the file was created or updated correctly.
S3 bucket showing the new CSV file created by the Census sync
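If you would rather script this check than inspect the file by eye, something like the following could work on a downloaded copy of the file. It is only a minimal sketch: the helper name `check_synced_csv`, the column names, and the sample data are hypothetical, and it assumes the default CSV format with headers.

```python
import csv
import io


def check_synced_csv(text, expected_columns):
    """Return the number of data rows, raising if an expected column is missing.

    `text` is the content of a CSV file downloaded from the bucket.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in expected_columns if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Count the remaining rows (the header row is consumed by DictReader).
    return sum(1 for _ in reader)


# Hypothetical sample standing in for a downloaded file.
sample = "id,email\n1,a@example.com\n2,b@example.com\n"
row_count = check_synced_csv(sample, ["id", "email"])
```

Comparing `row_count` with the row count of your model's query is a quick way to confirm that a first run wrote the full dataset.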
As an alternative to using keys, you may opt to grant Census access to a role in your AWS account. This doesn't add any Census functionality, but it may be preferable for your AWS configuration. The process has multiple steps, some in Census and some in your AWS console.
Step 1: When configuring the S3 destination click the "Use role" checkbox. Provide your bucket and region, but leave access and secret key blank. Click Connect:
Step 2: The automated connection check will run at this point and fail. This is expected.
Step 3: Click the 'Back' button to return to editing the destination. You should now see an 'External ID' input box with a string in it. You will use this string in the following step.
Step 4: Open your AWS Console in a separate tab and browse to the IAM service. Click 'Roles' and 'Create role'.
- When creating the role choose 'AWS Account' for Trusted Entity Type and the 'Another AWS Account' radio button.
- Ask your Census account representative for the Census AWS account to use.
- Check the 'Require external ID' checkbox and enter the External ID string from Step 3.
- When done, click on your role and copy its ARN. Go back to the tab where you're editing the Census S3 Destination and enter the role ARN.
- Click 'Connect'. The tester should re-run and succeed.
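For reference, a role created with 'Another AWS Account' and 'Require external ID' usually ends up with a trust policy shaped like the one built below. This is a sketch, not Census's own tooling: the helper `build_trust_policy` is hypothetical, and the account ID and external ID are placeholders that you would replace with the values from your Census representative and from the External ID box.

```python
import json


def build_trust_policy(census_account_id, external_id):
    """Build an IAM trust policy that lets another AWS account assume the role
    only when it presents the given external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{census_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values only; neither is a real Census account ID or external ID.
policy = build_trust_policy("123456789012", "example-external-id")
policy_json = json.dumps(policy, indent=2)
```

Printing `policy_json` gives a document you can compare against the Trust relationships tab of the role in the IAM console.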
When defining the File Path for an S3 sync, you can use variables that will be set when the sync runs. This allows you to create and sync to new CSV files in the S3 bucket that reflect the date and time of the sync.
Update or Create syncs upload your whole dataset on the first run and only new changes on subsequent runs. Each sync run saves to a different file. The first run saves with "full" at the end of the file name. For example, filename_12_12_23_full.csv if it runs on 12/12/2023. Later syncs save with a timestamp at the end, like filename_12_12_23_1702426195.csv, so you can see how your data changes over time.
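To make the naming pattern concrete, the sketch below reproduces the two example file names. It only illustrates the pattern and is not how Census builds the names: the helper `example_file_name` is hypothetical, and the month_day_year ordering of the date part is an assumption, since the 12/12/2023 example does not disambiguate it.

```python
from datetime import date


def example_file_name(base, run_date, epoch_seconds=None):
    """Mimic the naming pattern: 'full' on the first run, a timestamp afterwards."""
    # Assumed ordering: two-digit month, day, year.
    date_part = run_date.strftime("%m_%d_%y")
    suffix = "full" if epoch_seconds is None else str(epoch_seconds)
    return f"{base}_{date_part}_{suffix}.csv"


first_run = example_file_name("filename", date(2023, 12, 12))
later_run = example_file_name("filename", date(2023, 12, 12), 1702426195)
```

Here `first_run` is the "full" file name and `later_run` is the timestamped one from the examples.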
- Data arrives in one file to the designated S3 bucket and file path.
- By default, files are written as a CSV with headers. Alternatively, you may choose TSV, JSON, NDJSON, or Parquet format. You can also specify your delimiter and disable headers if you wish.
- If your configured delimiter is present in data values, Census will automatically add double quotes around the value. Example: Hello, world is written as "Hello, world" if the chosen delimiter is a comma.
- We highly recommend adding default server-side encryption to your S3 buckets. Census supports syncing to buckets with encryption policies as long as the bucket uses an AWS provided key type like the Amazon S3 key (SSE-S3) or the AWS Key Management Service key (SSE-KMS). If the bucket uses SSE-KMS, make sure the IAM role credentials associated with the S3 connection have access to the AWS KMS key used for encryption. We do not support syncing to buckets using a customer-provided encryption key.
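The delimiter quoting described in the notes above follows the same convention as minimal quoting in Python's standard `csv` module, so you can use that module to preview how a value would look. This is a sketch of the convention only, not Census's writer, and the helper `write_row` is hypothetical.

```python
import csv
import io


def write_row(values, delimiter=","):
    """Serialize one row, quoting only the values that contain the delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
    writer.writerow(values)
    # Drop the line terminator so the result is easy to compare.
    return buffer.getvalue().rstrip("\r\n")


comma_line = write_row(["Hello, world", "plain"])
tab_line = write_row(["Hello, world", "plain"], delimiter="\t")
```

With a comma delimiter the first value is wrapped in double quotes. With a tab delimiter the same value is left unquoted, because it no longer contains the delimiter.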
For most S3 uploads, the only permission that we require is the
For files larger than 5GB, Census makes use of S3's Multi-part upload which requires the additional permissions: