Streaming Datasets

Streaming Datasets let you define the data you work with in streaming sources: systems that organize their data into topics to which new messages are published over time.


Streaming datasets enable real-time data activation in Census, allowing you to sync data with low latency to your business applications. Unlike traditional batch-based datasets, streaming datasets continuously process data as it arrives, making them ideal for time-sensitive use cases.

Getting Started

Connect a Streaming Source

Before you create a Streaming Dataset, you’ll need to connect a streaming source. See the individual source documentation:

  • Kafka

  • Confluent Cloud

  • Google Pub/Sub

  • HTTP Request

Once your streaming source is connected, open Datasets and click New Dataset. Select Streaming Dataset and click Next.

Create a Streaming Dataset

To create a new streaming dataset:

  1. Navigate to the Datasets section in Census

  2. Click "New Dataset" and select "Streaming Dataset"

  3. Choose your streaming connection (Kafka, Confluent Cloud, etc.)

  4. Add a Sample Message (see below)

Add a Sample Message

In the code editor on the right side of the new dataset dialog, enter a sample JSON message for your new Streaming Dataset. Census will use the sample message you provide to infer the fields and data types of the messages in your dataset.

Census does not treat sample messages as customer data, and they are stored with the rest of your organization’s metadata in Census’s US-based control plane. Do not include real customer data or PII in your sample message.
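
For example, a fully synthetic sample message for a hypothetical order-events topic might look like this (the field names and values are illustrative only, not required by Census):

{
  "order_id": "ord_98765",
  "user_id": "usr_789",
  "status": "shipped",
  "total_amount": 42.50,
  "placed_at": "2023-06-15T14:22:31Z"
}

From a sample like this, Census can infer a column for each top-level field along with its data type, such as a string for order_id and a number for total_amount.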

When you are finished configuring your new Streaming Dataset, click Create Dataset.

Key Features of Streaming Datasets

Real-time Processing

Streaming datasets process data continuously as it arrives, enabling:

  • Ultra-low latency from data creation to activation

  • Immediate reactions to customer behavior or system events

  • Real-time personalization and engagement opportunities

Event-based Architecture

Unlike traditional datasets that operate on tables or views, streaming datasets work with events:

// Example event from a Kafka topic
{
  "event_id": "evt_12345",
  "user_id": "usr_789",
  "event_type": "page_view",
  "page_url": "https://example.com/products/123",
  "timestamp": "2023-06-15T14:22:31Z",
  "session_id": "sess_456",
  "device": "mobile",
  "browser": "chrome"
}

On-the-fly Transformations with Liquid Templates

When activating streaming datasets through syncs, you can apply real-time transformations using Liquid Templates. This powerful feature allows you to:

  • Transform data structure and format on-the-fly without modifying the source

  • Apply conditional logic to determine what data gets sent to destinations

  • Format dates, numbers, and strings to match destination requirements (see the filter sketch below)

  • Combine multiple fields or extract specific parts of fields

  • Create dynamic content based on event properties

For example, you can use a Liquid Template to create a personalized message from a streaming event:


{% if event.event_type == 'cart_abandon' %}
  Hi {{ event.user_name }}, we noticed you left some items in your cart. 
  Use code COMEBACK15 for 15% off your purchase of {{ event.product_name }}.
{% elsif event.event_type == 'product_view' %}
  Hi {{ event.user_name }}, interested in {{ event.product_name }}? 
  Check out our latest collection!
{% endif %}

These transformations happen in real-time as events flow through Census, allowing you to customize and enrich your data without adding latency to your streaming pipeline.
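
As a further sketch, Liquid filters can reformat individual values to match a destination's requirements (the event fields below are illustrative, not part of any required schema):

Order placed on {{ event.timestamp | date: "%B %d, %Y" }}
Total: ${{ event.total_amount | round: 2 }}
Customer: {{ event.first_name | capitalize }}

Filters like date, round, and capitalize are standard Liquid filters, so values can be reshaped at sync time without any changes to the source topic.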

Use Cases for Streaming Datasets

Real-time Customer Engagement

Streaming datasets enable you to respond to customer actions the moment they happen:

  • Trigger immediate follow-up messages when a customer abandons a cart

  • Send personalized product recommendations based on browsing behavior

  • Update customer segments in real-time based on website or app interactions

  • Personalize website experiences based on just-observed actions

  • Deliver timely notifications when customers meet specific criteria

  • Initiate onboarding sequences the moment a user signs up

Operational Alerting

Keep your teams informed about critical events as they happen:

  • Send notifications to sales teams when high-value prospects take specific actions

  • Alert customer success teams when users encounter errors or get stuck

  • Notify support teams about potential service disruptions

  • Update dashboards in real-time with system performance metrics

  • Trigger escalations when SLA thresholds are approached

  • Monitor security events and respond to potential threats immediately

Time-sensitive Data Updates

Ensure your systems stay in sync with minimal delay:

  • Keep inventory systems updated across platforms to prevent overselling

  • Update pricing information across multiple channels simultaneously

  • Propagate user preference changes immediately across all systems

  • Sync subscription status changes to prevent access issues

  • Update account balances or usage metrics in near real-time

  • Reflect order status changes across customer-facing applications

Event-driven Workflows

Build sophisticated workflows that respond to events as they occur:

  • Trigger fulfillment processes when orders are placed

  • Initiate verification steps when suspicious activity is detected

  • Start onboarding workflows when new accounts are created

  • Launch re-engagement campaigns when usage drops below thresholds

  • Activate loyalty rewards when qualifying actions are completed

  • Coordinate cross-channel marketing based on customer interactions

Live Syncs

Streaming datasets are designed to work with Census Live Syncs, which provide continuous data activation:

  • Always On - Census monitors your streaming source 24/7

  • Immediate Processing - Data is processed and synced as soon as it arrives

  • Efficient Resource Usage - Only changed data is processed and synced

Best Practices for Streaming Datasets

Working with streaming data is different from batch processing. Here are some tips to help you get the most out of your streaming datasets:

  • Include timestamps in your events to ensure proper ordering and enable time-based processing (see the event sketch after this list)

  • Add unique identifiers to each event to prevent duplicate processing and enable idempotent operations

  • Filter events at the source when possible to reduce unnecessary data transfer and processing

  • Keep your events focused by including only the data you need for your use cases

  • Set up monitoring for stream lag and processing delays to catch issues early

  • Create alerts within Census to detect anomalies

  • Plan for schema evolution by designing flexible event structures that can accommodate new fields

  • Use Liquid Templates for on-the-fly transformations rather than modifying your event sources
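
Taken together, an event that follows these practices might look like the following sketch (the field names are illustrative):

// Illustrative event: unique ID, explicit type, schema version, and timestamp
{
  "event_id": "evt_12345",
  "event_type": "subscription_updated",
  "schema_version": 2,
  "user_id": "usr_789",
  "timestamp": "2023-06-15T14:22:31Z"
}

The event_id supports idempotent processing, the timestamp enables ordering, and a schema_version field leaves room for the structure to evolve.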

Remember that streaming datasets are designed for real-time use cases. If you don't need sub-minute latency, basic datasets with scheduled syncs may be a more cost-effective choice.

For more information on using streaming datasets with Live Syncs, see our Live Syncs documentation.
