Entity Resolution (Invite Only)

Business teams depend on trusted data to execute their growth initiatives, but that data is often messy. A common cause is duplicate records for entities such as users, organizations, customers, products, or other objects.

These duplicate entities lead to chaos in CRM tools, inefficient marketing campaigns, incorrect analysis and wasted resources.

Entity Resolution

Entity Resolution helps you de-duplicate and associate records across all your data sources. Key use cases include:

  • Removing duplicate records from your CRM applications (Salesforce, HubSpot, etc.).

  • Creating golden customer records by combining data from multiple sources.

  • Identity Resolution: resolving anonymous users to known users.

  • Associating different users with a common unit, for example creating a household record from individual users.

Entity Resolution structures your data to create a golden record: a single source of truth for your business applications.

Census Entity Resolution - De-duplication and Association

Core Concepts

Census supports Deterministic Entity Resolution, with Fuzzy Match available at the column level. Deterministic Entity Resolution uses a rules-based approach, with rules you define, to identify duplicate or associated records and merge them into a single record.
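How Census groups matched records internally is not described here. As a rough illustration, the following minimal Python sketch shows one common way deterministic resolution can turn pairwise rule matches into entities: a union-find structure that treats matches as transitive, so if record A matches B and B matches C, all three become one entity. The function names, sample fields, and the email-or-phone rule are hypothetical.

```python
from itertools import combinations
from typing import Callable

# Hypothetical record shape: column name -> value.
Record = dict[str, str]


def cluster_records(
    records: list[Record], is_match: Callable[[Record, Record], bool]
) -> list[list[int]]:
    """Group record indices into entities, treating matches as transitive."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk up to the root, halving the path on the way for speed.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Compare every pair once. This is O(n^2); large datasets usually add
    # "blocking" (only comparing records that share a key) to prune pairs.
    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j]):
            union(i, j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def same_email_or_phone(a: Record, b: Record) -> bool:
    """Hypothetical rule: same non-empty email OR same non-empty phone."""
    email_match = bool(a["email"]) and a["email"] == b["email"]
    phone_match = bool(a["phone"]) and a["phone"] == b["phone"]
    return email_match or phone_match


records = [
    {"id": "1", "email": "ana@example.com", "phone": "555-0100"},
    {"id": "2", "email": "ana@example.com", "phone": ""},
    {"id": "3", "email": "a.lee@example.org", "phone": "555-0100"},
    {"id": "4", "email": "bo@example.com", "phone": "555-0199"},
]
print(cluster_records(records, same_email_or_phone))  # [[0, 1, 2], [3]]
```

The records with IDs 2 and 3 share no field directly, yet they land in the same entity because each matches record 1. That transitive chaining is worth keeping in mind when writing broad rules.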

Match Rules

Match Rules are the criteria Census uses to identify duplicate or associated records. You can define these rules with operations such as Exact Match and Fuzzy Match.

Some of the most common rules include:

  • Matching users based on email address, mailing address or customer IDs

  • Matching companies based on their domain, company name or location

You can combine rules into more complex conditions using AND and OR operators.
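To make AND/OR composition concrete, here is a minimal Python sketch that models each match rule as a function of two records and builds compound rules from simpler ones. The helpers `exact`, `all_of`, and `any_of` are hypothetical, not a Census API. This version of exact matching also assumes values are compared after trimming whitespace and lowercasing.

```python
from typing import Callable

Record = dict[str, str]
# Hypothetical model: a rule decides whether two records match.
Rule = Callable[[Record, Record], bool]


def exact(column: str) -> Rule:
    """Match when both records share the same non-empty value in `column`.

    Assumption of this sketch: values are trimmed and lowercased first.
    """
    def rule(a: Record, b: Record) -> bool:
        value_a = a.get(column, "").strip().lower()
        value_b = b.get(column, "").strip().lower()
        return bool(value_a) and value_a == value_b
    return rule


def all_of(*rules: Rule) -> Rule:
    """AND: every sub-rule must match."""
    return lambda a, b: all(rule(a, b) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """OR: at least one sub-rule must match."""
    return lambda a, b: any(rule(a, b) for rule in rules)


# Companies match on the same domain, OR on the same name AND the same city.
company_rule = any_of(
    exact("domain"),
    all_of(exact("name"), exact("city")),
)

acme_us = {"name": "Acme", "domain": "acme.com", "city": "Austin"}
acme_alias = {"name": "ACME", "domain": "", "city": "austin"}
acme_uk = {"name": "Acme", "domain": "acme.co.uk", "city": "London"}

print(company_rule(acme_us, acme_alias))  # True: name and city both match
print(company_rule(acme_us, acme_uk))     # False: different domain and city
```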

Fuzzy Match

Fuzzy Match uses machine learning to match similar field values. For example, two company records named Acme HQ and ACME Europe might refer to the same company, and Fuzzy Match can identify them as such.

Census also lets you select a confidence level for Fuzzy Match: low, medium, or high.
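Census uses a machine-learning model for Fuzzy Match, and its scoring and thresholds are not documented here. The Python sketch below swaps in a much simpler technique, character-level similarity from the standard library's `difflib.SequenceMatcher`, only to show how a confidence level can act as a cut-off. The threshold values and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical cut-offs per confidence level; Census's real values are not documented here.
THRESHOLDS = {"low": 0.6, "medium": 0.75, "high": 0.9}


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] after trimming and lowercasing."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_match(a: str, b: str, confidence: str = "medium") -> bool:
    """True when the similarity reaches the cut-off for the chosen confidence level."""
    return similarity(a, b) >= THRESHOLDS[confidence]


# "acme corporation" vs "acme corp" share a 9-character prefix: 2 * 9 / (16 + 9) = 0.72.
for level in ("low", "medium", "high"):
    print(level, fuzzy_match("Acme Corporation", "Acme Corp", level))
# low True
# medium False
# high False
```

Plain character similarity is a weak stand-in: it scores "Acme HQ" against "ACME Europe" at about 0.56, below even the low threshold in this sketch, which is one reason a learned model can help with pairs like that.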

Merge Rules

Merge Rules determine the winning record among a set of duplicates. The ID of the winning record becomes the primary ID, which is useful when syncing data back to your business applications.

Census evaluates Merge Rules as a waterfall: the first rule is evaluated first, then the next, and so on until one record emerges as the winner.

If you leave the Merge Rules empty, Census uses the record with the lowest ID as the winning record.
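A waterfall of merge rules can be modeled as a series of filters that each narrow the candidate set, stopping as soon as one record is left. The Python sketch below assumes that behavior. The rules `prefer_source` and `most_recent`, the `source` and `updated_at` columns, and the sample data are hypothetical. The sketch also applies the lowest-ID fallback when the rules end in a tie, which is an assumption beyond the empty-rules case described above.

```python
from typing import Any, Callable

Record = dict[str, Any]
# Hypothetical model: a merge rule keeps the candidates it prefers.
MergeRule = Callable[[list[Record]], list[Record]]


def prefer_source(source: str) -> MergeRule:
    """Keep records from `source`; if none come from it, keep every candidate."""
    def rule(candidates: list[Record]) -> list[Record]:
        preferred = [r for r in candidates if r["source"] == source]
        return preferred or candidates
    return rule


def most_recent(column: str) -> MergeRule:
    """Keep the record(s) with the greatest value in `column` (ISO dates sort correctly)."""
    def rule(candidates: list[Record]) -> list[Record]:
        latest = max(r[column] for r in candidates)
        return [r for r in candidates if r[column] == latest]
    return rule


def pick_winner(duplicates: list[Record], rules: list[MergeRule]) -> Record:
    """Apply rules in order (waterfall) until exactly one candidate remains."""
    candidates = duplicates
    for rule in rules:
        candidates = rule(candidates)
        if len(candidates) == 1:
            return candidates[0]
    # No rules, or still tied after all rules: fall back to the lowest ID.
    # (Applying this fallback to ties is an assumption of this sketch.)
    return min(candidates, key=lambda r: r["id"])


duplicates = [
    {"id": 7, "source": "hubspot", "updated_at": "2024-05-01"},
    {"id": 3, "source": "salesforce", "updated_at": "2024-01-15"},
    {"id": 5, "source": "salesforce", "updated_at": "2024-03-20"},
]
rules = [prefer_source("salesforce"), most_recent("updated_at")]
print(pick_winner(duplicates, rules)["id"])  # 5
print(pick_winner(duplicates, [])["id"])     # 3
```

Here the first rule narrows three records to two, so the waterfall continues, and the second rule picks record 5.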

Column Overrides

Column Overrides let you override column values on the winning record, conditionally choosing which values appear in the final (resolved) record.
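One way to picture Column Overrides is as per-column functions that look across the whole duplicate cluster and replace the winning record's value. In the minimal Python sketch below, `latest_non_empty` (take the most recent non-empty value) is a hypothetical condition chosen for illustration, not a documented Census option, and the column names are invented.

```python
from typing import Any, Callable

Record = dict[str, Any]
# Hypothetical model: an override picks one column's value from the whole cluster.
Override = Callable[[list[Record]], Any]


def latest_non_empty(column: str, order_by: str) -> Override:
    """Take `column` from the most recent record (by `order_by`) where it is non-empty."""
    def choose(cluster: list[Record]) -> Any:
        filled = [r for r in cluster if r.get(column)]
        if not filled:
            return None
        return max(filled, key=lambda r: r[order_by])[column]
    return choose


def apply_overrides(
    winner: Record, cluster: list[Record], overrides: dict[str, Override]
) -> Record:
    """Copy the winning record, then replace the overridden columns."""
    resolved = dict(winner)
    for column, choose in overrides.items():
        value = choose(cluster)
        if value is not None:  # keep the winner's own value if nothing qualifies
            resolved[column] = value
    return resolved


cluster = [
    {"id": 3, "email": "ana@example.com", "phone": "", "updated_at": "2024-01-15"},
    {"id": 5, "email": "ana.lee@example.com", "phone": "", "updated_at": "2024-03-20"},
    {"id": 7, "email": "", "phone": "555-0100", "updated_at": "2024-05-01"},
]
overrides = {
    "email": latest_non_empty("email", "updated_at"),
    "phone": latest_non_empty("phone", "updated_at"),
}
resolved = apply_overrides(cluster[0], cluster, overrides)
print(resolved)
# {'id': 3, 'email': 'ana.lee@example.com', 'phone': '555-0100', 'updated_at': '2024-01-15'}
```

The resolved record keeps the winner's ID (3) but takes its email and phone from newer records in the cluster.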

Map Rules (Coming Soon)

You can choose multiple datasets to merge into a resolved dataset through de-duplication and association.

Map Rules help you map columns from your source datasets into your resolved dataset.
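Because Map Rules are not yet available, the following is only a hypothetical Python sketch of the idea: a per-dataset mapping from source column names to resolved column names. The dataset names, column names, and the `map_record` helper are invented for illustration.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical map rules: for each source dataset, which source column
# feeds which column of the resolved dataset.
MAP_RULES: dict[str, dict[str, str]] = {
    "salesforce_contacts": {"Id": "source_id", "Email": "email", "Phone": "phone"},
    "hubspot_contacts": {"vid": "source_id", "email": "email", "phone_number": "phone"},
}


def map_record(dataset: str, row: Record) -> Record:
    """Rename a source row's columns into the resolved schema and tag its origin."""
    mapping = MAP_RULES[dataset]
    mapped = {target: row.get(source) for source, target in mapping.items()}
    mapped["source_dataset"] = dataset
    return mapped


rows = [
    map_record("salesforce_contacts", {"Id": "003A1", "Email": "ana@example.com", "Phone": "555-0100"}),
    map_record("hubspot_contacts", {"vid": 42, "email": "ana@example.com", "phone_number": None}),
]
for row in rows:
    print(row)
```

Once rows from different sources share the same column names, match rules such as an exact match on `email` can compare them directly.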

Materialization

Entity Resolution generates a new dataset that's written back to your source warehouse under the Census Schema.

Entity Resolution currently supports Snowflake, BigQuery, Redshift, and Postgres, with support for other warehouses and data sources coming soon.
