LogoLogo
  • 🦩Overview
  • 💾Datasets
    • Overview
    • Core Concepts
      • Columns & Annotations
      • Type & Property Mappings
      • Relationships
    • Basic Datasets
      • dbt Integration
      • Sigma Integration
      • Looker Integration
    • SaaS Datasets
    • CSV Datasets
    • Streaming Datasets
    • Entity Resolution
    • AI Columns
      • AI Prompts Recipe Book
    • Enrichment Columns
      • Quick Start
      • HTTP Request Enrichments
    • Computed Columns
    • Version Control
  • 📫Syncs
    • Overview
    • Triggering & Scheduling
    • Retry Handling
    • Live Syncs
    • Audience Syncs
    • Observability
      • Current Sync Run Overview
      • Sync History
      • Sync Tracking
      • API Inspector
      • Sync Alerts
      • Observability Lake
      • Datadog Integration
      • Warehouse Writeback
      • Sync Lifecycle Webhooks
      • Sync Dry Runs
    • Structuring Data
      • Liquid Templates
      • Event Syncs
      • Arrays and Nested Objects
  • 👥Audience Hub
    • Overview
    • Creating Segments
      • Segment Priorities
      • Warehouse-Managed Audiences
    • Experiments and Analysis
      • Audience Match Rates
    • Activating Segments
    • Calculated Columns
    • Data Preparation
      • Profile Explorer
      • Exclusion Lists
  • 🧮Data Sources
    • Overview
    • Available Sources
      • Amazon Athena
      • Amazon Redshift
      • Amazon S3
      • Azure Synapse
      • ClickHouse
      • Confluent Cloud
      • Databricks
      • Elasticsearch
      • Kafka
      • Google AlloyDB
      • Google BigQuery
      • Google Cloud SQL for PostgreSQL
      • Google Pub/Sub
      • Google Sheets
      • Greenplum
      • HTTP Request
      • HubSpot
      • Materialize
      • Microsoft Fabric
      • MotherDuck
      • MySQL
      • PostgreSQL
      • Rockset
      • Salesforce
      • SingleStore
      • Snowflake
      • SQL Server
      • Trino
  • 🛫Destinations
    • Overview
    • Available Destinations
      • Accredible
      • ActiveCampaign
      • Adobe Target
      • Aha
      • Airship
      • Airtable
      • Algolia
      • Amazon Ads DSP (AMC)
      • Amazon DynamoDB
      • Amazon EventBridge
      • Amazon Pinpoint
      • Amazon Redshift
      • Amazon S3
      • Amplitude
      • Anaplan
      • Antavo
      • Appcues
      • Apollo
      • Asana
      • AskNicely
      • Attentive
      • Attio
      • Autopilot Journeys
      • Azure Blob Storage
      • Box
      • Bloomreach
      • Blackhawk
      • Braze
      • Brevo (formerly Sendinblue)
      • Campaign Monitor
      • Canny
      • Channable
      • Chargebee
      • Chargify
      • ChartMogul
      • ChatGPT Retrieval Plugin
      • Chattermill
      • ChurnZero
      • CJ Affiliate
      • CleverTap
      • ClickUp
      • Constant Contact
      • Courier
      • Criteo
      • Crowd.dev
      • Customer.io
      • Databricks
      • Delighted
      • Discord
      • Drift
      • Drip
      • Eagle Eye
      • Emarsys
      • Enterpret
      • Elasticsearch
      • Facebook Ads
      • Facebook Product Catalog
      • Freshdesk
      • Freshsales
      • Front
      • FullStory
      • Gainsight
      • GitHub
      • GitLab
      • Gladly
      • Google Ads
        • Customer Match Lists (Audiences)
        • Offline Conversions
      • Google AlloyDB
      • Google Analytics 4
      • Google BigQuery
      • Google Campaign Manager 360
      • Google Cloud Storage
      • Google Datastore
      • Google Display & Video 360
      • Google Drive
      • Google Search Ads 360
      • Google Sheets
      • Heap.io
      • Help Scout
      • HTTP Request
      • HubSpot
      • Impact
      • Insider
      • Insightly
      • Intercom
      • Iterable
      • Jira
      • Kafka
      • Kevel
      • Klaviyo
      • Kustomer
      • Labelbox
      • LaunchDarkly
      • LinkedIn
      • LiveIntent
      • Loops
      • Mailchimp
      • Mailchimp Transactional (Mandrill)
      • Mailgun
      • Marketo
      • Meilisearch
      • Microsoft Advertising
      • Microsoft Dynamics
      • Microsoft SQL Server
      • Microsoft Teams
      • Mixpanel
      • MoEngage
      • Mongo DB
      • mParticle
      • MySQL
      • NetSuite
      • Notion
      • OneSignal
      • Optimizely
      • Oracle Database
      • Oracle Eloqua
      • Oracle Fusion
      • Oracle Responsys
      • Orbit
      • Ortto
      • Outreach
      • Pardot
      • Partnerstack
      • Pendo
      • Pinterest
      • Pipedrive
      • Planhat
      • PostgreSQL
      • PostHog
      • Postscript
      • Productboard
      • Qualtrics
      • Radar
      • Reddit Ads
      • Rokt
      • RollWorks
      • Sailthru
      • Salesforce
      • Salesforce Commerce Cloud
      • Salesforce Marketing Cloud
      • Salesloft
      • Segment
      • SendGrid
      • Sense
      • SFTP
      • Shopify
      • Singular
      • Slack
      • Snapchat
      • Snowflake
      • Split
      • Sprig
      • Stripe
      • The Trade Desk
      • TikTok
      • Totango
      • Userflow
      • Userpilot
      • Vero Cloud
      • Vitally
      • Webhooks
      • Webflow
      • X Ads (formerly Twitter Ads)
      • Yahoo Ads (DSP)
      • Zendesk
      • Zoho CRM
      • Zuora
    • Custom & Partner Destinations
  • 📎Misc
    • Credits
    • Census Embedded
    • Data Storage
      • Census Store
        • Query Census Store from Snowflake
        • Query Census Store locally using DuckDB
      • General Object Storage
      • Bring Your Own Bucket
        • Bring your own S3 Bucket
        • Bring your own GCS Bucket
        • Bring your own Azure Bucket
    • Developers
      • GitLink
      • Dataset API
      • Custom Destination API
      • Management API
    • Security & Privacy
      • Login & SSO Settings
      • Workspaces
      • Role-based Access Controls
      • Network Access Controls
      • SIEM Log Forwarding
      • Secure Storage of Customer Credentials
      • Digital Markets Act (DMA) Consent for Ad Platforms
    • Health and Usage Reporting
      • Workspace Homepage
      • Product Usage Dashboard
      • Observability Toolkit
      • Alerts
    • FAQs
Powered by GitBook
On this page
  • Getting Started
  • Connecting to Your Data Warehouse
  • Direct Table and View Access
  • Custom SQL Queries
  • Python Transformations
  • Integration with Data Transformation Tools
  • dbt Integration
  • Looker Integration
  • Sigma Integration
  • Best Practices for Basic Datasets

Was this helpful?

  1. Datasets

Basic Datasets

PreviousRelationshipsNextdbt Integration

Last updated 2 months ago

Was this helpful?

Basic datasets are the foundation of Census's data activation capabilities. They allow you to leverage your existing data warehouse infrastructure to power your business operations by connecting your warehouse data directly to your business applications.

Need a hand getting a handle on your data warehouse? via support@getcensus.com or start a conversation with us via the chat to let us know what questions we can help with!

Getting Started

To create a new basic dataset:

  1. Navigate to the Datasets section in Census

  2. Click "New Dataset" and select "Basic Dataset"

  3. Choose your connection

  4. Select one of the following options:

    • Select a table - Choose a specific table or view directly from your data warehouse

    • SQL - Write a custom SQL query to define your dataset with more flexibility

  5. Test your dataset to ensure it returns the expected data

  6. Save and use your dataset in syncs

Connecting to Your Data Warehouse

Census provides multiple ways to define and access data in your warehouse:

Direct Table and View Access

The simplest way to get started with Census is to use the "Select a table" option to access existing tables and views in your data warehouse. Census can directly access these objects, making it easy to sync data that's already well-structured and ready for use.

Custom SQL Queries

When you need more flexibility, Census allows you to use the "SQL" option to define custom SQL queries directly in the platform. This approach is ideal when you need to:

  • Join multiple tables together

  • Apply complex transformations

  • Filter data to include only what's relevant for a specific destination

  • Create aggregations or calculations

-- Example: Creating a customer 360 view with SQL
SELECT 
  c.customer_id,
  c.email,
  c.name,
  c.created_at,
  SUM(o.amount) as lifetime_value,
  COUNT(o.order_id) as order_count,
  MAX(o.created_at) as last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY 1, 2, 3, 4

Python Transformations

For even more advanced use cases, Census supports Python transformations that let you apply complex business logic, machine learning models, or data processing that goes beyond what SQL can easily accomplish.

# Example: Python transformation for customer segmentation
def transform(df):
    # Calculate recency, frequency, monetary value
    df['recency'] = (datetime.now() - df['last_order_date']).dt.days
    
    # Apply segmentation logic
    conditions = [
        (df['lifetime_value'] > 1000) & (df['recency'] < 30),
        (df['lifetime_value'] > 500) & (df['recency'] < 60),
        (df['lifetime_value'] > 0)
    ]
    choices = ['VIP', 'Regular', 'Occasional']
    df['segment'] = np.select(conditions, choices, default='Inactive')
    
    return df

Integration with Data Transformation Tools

Census seamlessly integrates with popular data transformation tools, allowing you to leverage your existing data modeling investments.

dbt Integration

Benefits of the dbt integration:

  • Automatically discover and use dbt models

  • Maintain consistent business logic across analytics and operations

  • Leverage dbt's testing and documentation capabilities

  • Sync data as soon as dbt models are refreshed

Looker Integration

Benefits of the Looker integration:

  • Reuse existing business logic defined in LookML

  • Maintain consistent definitions between reporting and operational systems

  • Leverage Looker's governance and access controls

Sigma Integration

Benefits of the Sigma integration:

  • Empower business users to create operational datasets

  • Maintain consistency between analytics and operations

  • Leverage Sigma's visual modeling capabilities

Best Practices for Basic Datasets

Getting the most out of your basic datasets requires some thoughtful planning. Here are some tips we've gathered from working with hundreds of data teams:

  • Use incremental syncs whenever possible to reduce load on your warehouse and speed up sync times

  • Add appropriate filters to your queries to limit the data being processed to just what you need

  • Create indexes on frequently queried columns in your warehouse to improve query performance

  • Consider materialized views for complex queries that are used frequently to reduce computation time

  • Add clear descriptions to your datasets to help business users understand what data is available

  • Set up appropriate access controls in your warehouse to maintain data security

  • Use version control to track changes to your dataset definitions and make it easy to roll back if needed

  • Validate data types and formats before syncing to avoid errors in destination systems

  • Create test syncs to validate new datasets before using them in production workflows

Census's allows you to directly access your dbt models without having to redefine them in Census. This ensures consistency between your analytics models and operational data, creating a single source of truth.

The allows you to use your Looker Explores and Looks as datasets in Census. This is particularly valuable for organizations that have invested heavily in Looker as their business intelligence platform.

For teams using Sigma as their analytics platform, Census's allows you to use Sigma workbooks as datasets. This enables business users who are building in Sigma to directly activate their insights without requiring data team intervention.

💾
Contact us
in-app
native dbt integration
Looker integration
Sigma integration