LogoLogo
  • 🦩Overview
  • 💾Datasets
    • Overview
    • Core Concepts
      • Columns & Annotations
      • Type & Property Mappings
      • Relationships
    • Basic Datasets
      • dbt Integration
      • Sigma Integration
      • Looker Integration
    • SaaS Datasets
    • CSV Datasets
    • Streaming Datasets
    • Entity Resolution
    • AI Columns
      • AI Prompts Recipe Book
    • Enrichment Columns
      • Quick Start
      • HTTP Request Enrichments
    • Computed Columns
    • Version Control
  • 📫Syncs
    • Overview
    • Triggering & Scheduling
    • Retry Handling
    • Live Syncs
    • Audience Syncs
    • Observability
      • Current Sync Run Overview
      • Sync History
      • Sync Tracking
      • API Inspector
      • Sync Alerts
      • Observability Lake
      • Datadog Integration
      • Warehouse Writeback
      • Sync Lifecycle Webhooks
      • Sync Dry Runs
    • Structuring Data
      • Liquid Templates
      • Event Syncs
      • Arrays and Nested Objects
  • 👥Audience Hub
    • Overview
    • Creating Segments
      • Segment Priorities
      • Warehouse-Managed Audiences
    • Experiments and Analysis
      • Audience Match Rates
    • Activating Segments
    • Calculated Columns
    • Data Preparation
      • Profile Explorer
      • Exclusion Lists
  • 🧮Data Sources
    • Overview
    • Available Sources
      • Amazon Athena
      • Amazon Redshift
      • Amazon S3
      • Azure Synapse
      • ClickHouse
      • Confluent Cloud
      • Databricks
      • Elasticsearch
      • Kafka
      • Google AlloyDB
      • Google BigQuery
      • Google Cloud SQL for PostgreSQL
      • Google Pub/Sub
      • Google Sheets
      • Greenplum
      • HTTP Request
      • HubSpot
      • Materialize
      • Microsoft Fabric
      • MotherDuck
      • MySQL
      • PostgreSQL
      • Rockset
      • Salesforce
      • SingleStore
      • Snowflake
      • SQL Server
      • Trino
  • 🛫Destinations
    • Overview
    • Available Destinations
      • Accredible
      • ActiveCampaign
      • Adobe Target
      • Aha
      • Airship
      • Airtable
      • Algolia
      • Amazon Ads DSP (AMC)
      • Amazon DynamoDB
      • Amazon EventBridge
      • Amazon Pinpoint
      • Amazon Redshift
      • Amazon S3
      • Amplitude
      • Anaplan
      • Antavo
      • Appcues
      • Apollo
      • Asana
      • AskNicely
      • Attentive
      • Attio
      • Autopilot Journeys
      • Azure Blob Storage
      • Box
      • Bloomreach
      • Blackhawk
      • Braze
      • Brevo (formerly Sendinblue)
      • Campaign Monitor
      • Canny
      • Channable
      • Chargebee
      • Chargify
      • ChartMogul
      • ChatGPT Retrieval Plugin
      • Chattermill
      • ChurnZero
      • CJ Affiliate
      • CleverTap
      • ClickUp
      • Constant Contact
      • Courier
      • Criteo
      • Crowd.dev
      • Customer.io
      • Databricks
      • Delighted
      • Discord
      • Drift
      • Drip
      • Eagle Eye
      • Emarsys
      • Enterpret
      • Elasticsearch
      • Facebook Ads
      • Facebook Product Catalog
      • Freshdesk
      • Freshsales
      • Front
      • FullStory
      • Gainsight
      • GitHub
      • GitLab
      • Gladly
      • Google Ads
        • Customer Match Lists (Audiences)
        • Offline Conversions
      • Google AlloyDB
      • Google Analytics 4
      • Google BigQuery
      • Google Campaign Manager 360
      • Google Cloud Storage
      • Google Datastore
      • Google Display & Video 360
      • Google Drive
      • Google Search Ads 360
      • Google Sheets
      • Heap.io
      • Help Scout
      • HTTP Request
      • HubSpot
      • Impact
      • Insider
      • Insightly
      • Intercom
      • Iterable
      • Jira
      • Kafka
      • Kevel
      • Klaviyo
      • Kustomer
      • Labelbox
      • LaunchDarkly
      • LinkedIn
      • LiveIntent
      • Loops
      • Mailchimp
      • Mailchimp Transactional (Mandrill)
      • Mailgun
      • Marketo
      • Meilisearch
      • Microsoft Advertising
      • Microsoft Dynamics
      • Microsoft SQL Server
      • Microsoft Teams
      • Mixpanel
      • MoEngage
      • Mongo DB
      • mParticle
      • MySQL
      • NetSuite
      • Notion
      • OneSignal
      • Optimizely
      • Oracle Database
      • Oracle Eloqua
      • Oracle Fusion
      • Oracle Responsys
      • Orbit
      • Ortto
      • Outreach
      • Pardot
      • Partnerstack
      • Pendo
      • Pinterest
      • Pipedrive
      • Planhat
      • PostgreSQL
      • PostHog
      • Postscript
      • Productboard
      • Qualtrics
      • Radar
      • Reddit Ads
      • Rokt
      • RollWorks
      • Sailthru
      • Salesforce
      • Salesforce Commerce Cloud
      • Salesforce Marketing Cloud
      • Salesloft
      • Segment
      • SendGrid
      • Sense
      • SFTP
      • Shopify
      • Singular
      • Slack
      • Snapchat
      • Snowflake
      • Split
      • Sprig
      • Stripe
      • The Trade Desk
      • TikTok
      • Totango
      • Userflow
      • Userpilot
      • Vero Cloud
      • Vitally
      • Webhooks
      • Webflow
      • X Ads (formerly Twitter Ads)
      • Yahoo Ads (DSP)
      • Zendesk
      • Zoho CRM
      • Zuora
    • Custom & Partner Destinations
  • 📎Misc
    • Credits
    • Census Embedded
    • Data Storage
      • Census Store
        • Query Census Store from Snowflake
      • General Object Storage
      • Bring Your Own Bucket
        • Bring your own S3 Bucket
        • Bring your own GCS Bucket
        • Bring your own Azure Bucket
    • Developers
      • GitLink
      • Dataset API
      • Custom Destination API
      • Management API
    • Security & Privacy
      • Login & SSO Settings
      • Workspaces
      • Role-based Access Controls
      • Network Access Controls
      • SIEM Log Forwarding
      • Secure Storage of Customer Credentials
      • Digital Markets Act (DMA) Consent for Ad Platforms
    • Health and Usage Reporting
      • Workspace Homepage
      • Product Usage Dashboard
      • Observability Toolkit
      • Alerts
    • FAQs
Powered by GitBook
On this page
  • Getting started
  • Prerequisites
  • Connecting to an S3 bucket
  • Using role-based permissions
  • Supported Sync Behaviors
  • Update or Create Syncs
  • File Path
  • Variables
  • Advanced Configuration
  • Other Things to Know
  • Required permissions
  • Need help connecting to S3?

Was this helpful?

  1. Destinations
  2. Available Destinations

Amazon S3

This page describes how to use Census with S3.

PreviousAmazon RedshiftNextAmplitude

Last updated 8 months ago

Was this helpful?

Getting started

This guide shows you how to use Census to connect your S3 account to your data warehouse.

Prerequisites

Before you begin, you'll need the following:

  • Census account: If you don't have this already, .

  • S3 account: You'll need an S3 bucket that's ready for use, along with credentials for an AWS user that can write to that bucket. See AWS documentation for . The specific permissions Census may need are covered later in .

  • Have the proper credentials to access to your data source.

Connecting to an S3 bucket

  1. Enter authentication details for your AWS account and S3 bucket: Access Key ID, Secret Key, Bucket Name, and AWS Region. Census uses these details to authenticate – they must match your AWS setup. If you would like to use role based permissions instead, .

  2. Click Save Connection.

Using role-based permissions

As an alternative to using keys you may opt to grant Census access to a role in your AWS account. This won't provide any additional functionality from Census, but may be preferable for your AWS configuration. This is a multi-step process with parts happening in Census and inside your AWS console.

Step 1: When configuring the S3 destination click the "Use role" checkbox. Provide your bucket and region, but leave access and secret key blank. Click Connect:

Step 2: The automated connection check will run at this point and fail, this is expected.

Step 3: Click the 'Back' button to return to editing the destination. You should now see an 'External ID' input box with a string in it. You will use this string in the following step.

Step 4: Open your AWS Console in a separate tab and browse to the IAM service. Click 'Roles' and 'Create role'.

  • When creating the role choose 'AWS Account' for Trusted Entity Type and the 'Another AWS Account' radio button.

  • Provide Census's AWS Account ID: 341876425553.

  • Check the 'Require external ID' checkbox and enter the External ID string from Step 3.

  • When done, click on your role and copy its ARN. Go back to the tab where you're editing the Census S3 Destination and enter the role ARN.

  • Click 'Connect'. The tester should re-run and succeed.

Supported Sync Behaviors

Behavior

Supported?

Objects

Update or Create

✅

All

Replace

✅

All

Update or Create Syncs

Update or Create syncs upload your whole dataset on the first run and only new changes on subsequent runs. Each sync run saves to a different file. The first run saves with "full" at the end of the file name. For example, filename_12_12_23_full.csv if it runs on 12/12/2023. Later syncs save with a timestamp at the end, like filename_12_12_23_1702426195.csv, so you can see how your data changes over time.

Learn more about all of our sync behaviors on our Core Concepts page.

File Path

When setting up a sync to S3, you can provide a file path for the file name Census will create/replace. The file path can include folders. Data arrives in one file to the designated bucket and file path.

Variables

When defining the File Path, you can use variables that will be set when the sync runs. This allows you to create and sync to new files that reflect the date and time of the sync.

Variable

Description

Example Values

%Y

4-digit year

1997

%y

2-digit year

97

%m

month with zero padding

07, 12

%-m

month without zero padding

7, 12

%d

day with zero padding

03, 23

%-d

day without zero padding

3, 23

%H

24 hour with zero padding

08, 18

%k

24 hour without zero padding

8, 18

%I

12 hour with zero padding

08, 12

%l

12 hour without zero padding

8, 12

%M

minute with zero padding

04, 56

%S

second with zero padding

06, 54

Advanced Configuration

In addition to the file path, you can configure how the data is encoded as it is written. Primarily this is a question of file format:

  • CSV - The standard comma separated values file. You can optionally specify an alternative delimeter such as |*, and you can enable/disable the header row.

  • TSV - The tab separated values file. You can enable/disable the header row.

  • JSON - A single JSON arraay of objects

  • NDJSON - New line-delimited list of JSON objects

  • Parquet - A columnar storage format that is more efficient for certain types of data.

  • If your configured delimiter is present in data values, Census will automatically add double quotes around the value. Example: Hello, world is written as as "Hello, world" if the chosen delimiter is a comma.

In addition to file format, you can also provide a PGP Public Key to encrypt the data before it is written to the file. This is useful for ensuring that the data is secure in transit and at rest.

Other Things to Know

  • We support syncing files up to 100GB. Files larger than 5GB may require some additional permissions, see Required permissions below.

  • We highly recommend adding default server-side encryption to your S3 buckets. Census supports syncing to buckets with encryption policies as long as the bucket uses an AWS provided key type like the Amazon S3 key (SSE-S3) or the AWS Key Management Service key (SSE-KMS). If the bucket uses SSE-KMS, make sure the IAM role credentials associated with the S3 connection have access to the AWS KMS key used for encryption. We do not support syncing to buckets using a customer-provided encryption key.

Required permissions

For most S3 uploads, the only permission that we require is the s3:PutObject action.

For files larger than 5GB, Census makes use of S3's Multi-part upload which requires the additional permissions:

  • s3:AbortMultipartUpload

  • s3:ListMultipartUploadParts

  • s3:ListBucketMultipartUploads

Example Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*"
            ]
        }
    ]
}

Need help connecting to S3?

Finish setting up your Role. Note that it should have the to access the bucket you are using as your Census destination!

if you want Census to support additional sync behaviors for S3.

if your use cases don't work with these limitations. We plan on addressing at least a few of these in the future!

For more details on how Multipart Uploads use these permissions, see the .

You can send our at support@getcensus.com or start a conversation from the in-app chat.

🛫
Let us know
Contact us
S3 documentation
support team an email
start with a free trial
setting up an S3 bucket
Required permissions
see the instructions below
required permissions