Data Cleansing with Golden Record.

How to ensure your customer data is clean: an overview of Golden Record’s data-cleansing features. 

Museums face lots of challenges when it comes to managing customer data. Data quality and data cleansing are high on the list. Many systems used for efforts like membership, fundraising, and communications come with the recommendation that users regularly review and clean up their customer data. However, museum staff are already stretched thin, and manual data cleansing can be tedious. (And no one wants to be the person who accidentally removes a vital constituent record!)

Some system-to-system connectors (think of apps like Zapier or those offered through platforms like Salesforce) will do basic data cleanup as they sync records between systems. This is great if you only need to have clean data in those systems, but what about others without pre-built connectors? Or, what if you need more than basic data cleansing?

Golden Record takes a different approach. Built on tried-and-true data management principles, our system:

  1. pulls customer records from every system into a centralized data hub,
  2. connects and cleanses the data, creating golden records for each individual, and
  3. allows the clean data to be exported back to all the source systems.

This ensures that all of your systems have the same, accurate, up-to-date data. And it also lays the groundwork for extracting valuable insights into constituent engagement.

In our last blog post, we discussed the value of creating a golden record for each of your constituents. Now, we’ll explore how Golden Record cleanses data as it builds those golden records and take a closer look at our growing library of data-cleansing features.

What is data cleansing?

First, let’s define data cleansing (also called data cleaning). In its Guide to Data Cleaning, Salesforce/Tableau defines it as follows:

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset.

Clean data is vital for ensuring data quality. It’s also essential if you want to put your data to good use, whether through constituent interaction and communication or tracking engagement and analytics. But data cleansing can be time-consuming, especially if you’re trying to do it manually. 

Our unique solution for data cleansing: derived columns.

Golden Record uses a method called “derived columns” to clean the records it receives. Developed by data warehousing experts, the method is applied to a specific piece of data in a record (also called a “column”), such as a name or phone number. An algorithm standardizes that data and creates a new “derived column” with the cleansed result.

Currently, Golden Record offers several types of derived columns, and we continue to expand the library as new use cases present themselves. Here’s an overview of some of the most commonly used derived columns available in Golden Record today:

First name synonym

This creates a new column by normalizing the first name field to a standard name. For example: Rob, Bob, Bobby, and Robert all become Robert for matching purposes. This allows you to find duplicates and match data across datasets where a nickname was provided in one record and the full name in another.
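For the technically curious, here is a minimal Python sketch of the idea. It is not Golden Record's actual implementation: the lookup table, the function name first_name_synonym, and the column name first_name_std are all hypothetical, and a real synonym table would be far larger.

```python
# Hypothetical nickname table, for illustration only.
NAME_SYNONYMS = {
    "rob": "Robert",
    "bob": "Robert",
    "bobby": "Robert",
    "robert": "Robert",
}


def first_name_synonym(first_name: str) -> str:
    """Return the standard form of a first name, or the trimmed input if unknown."""
    cleaned = first_name.strip()
    return NAME_SYNONYMS.get(cleaned.lower(), cleaned)


records = [{"first_name": "Bobby"}, {"first_name": "Rob"}, {"first_name": "Alice"}]
for record in records:
    # The original value is kept; the result goes into a new derived column.
    record["first_name_std"] = first_name_synonym(record["first_name"])
```

In this sketch, the first two records end up with the same value in the derived column, so they can be compared as potential matches, while the unknown name passes through unchanged.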

Soundex

Soundex is a super-cool feature that is particularly helpful for last names. It converts the characters in a field into a code that represents the sound when pronounced. It’s great for catching misspellings and potential matches for unfamiliar names or those with unusual spellings. For example, the Soundex code for Wise and Wyse is the same (W200); they sound the same, even though they are spelled differently.
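If you'd like to see how a code like W200 comes about, here is a short Python sketch of the standard American Soundex rules. It is offered as an illustration of the general algorithm, not as the exact variant Golden Record runs; the function name soundex is our own choice for the example.

```python
# Consonant groups that share a Soundex digit.
CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]                    # the first letter is kept as-is
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                     # H and W do not break a run of same-coded letters
        digit = CODES.get(ch, "")        # vowels (and Y) have no digit
        if digit and digit != previous:
            code += digit
        previous = digit                 # a vowel resets the run
    return (code + "000")[:4]            # pad with zeros, then trim to four characters


print(soundex("Wise"), soundex("Wyse"))
```

Both names produce W200 here, which is what lets the two spellings be treated as a potential match.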

Just numbers

This feature strips out any non-numeric characters from a field and creates a new column that contains (as you probably guessed) just numbers. This is useful for matching phone numbers, which can be formatted in all sorts of ways, from the traditional parens and dashes to more aesthetic choices using periods, spaces, or slashes.
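A rough Python sketch of the same idea, assuming a simple regular expression does the stripping (the function name just_numbers and the sample phone numbers are made up for illustration):

```python
import re


def just_numbers(value: str) -> str:
    """Remove every character that is not one of the digits 0-9."""
    return re.sub(r"[^0-9]", "", value)


# Four ways of writing the same (made-up) phone number.
phones = ["(555) 123-4567", "555.123.4567", "555 123 4567", "555/123/4567"]
derived = [just_numbers(phone) for phone in phones]
```

Once the formatting is gone, all four values in derived are identical, so the records they came from can be matched on phone number.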

First word / last word

These derived columns are vital for working with datasets that use a single field for a person’s name, rather than separate first and last name fields. The feature splits the contents of a single field to create two new columns: one for the first word in a field (in this case the first name) and one for the last word in the field (last name). So a single field that contains “Bob Smith” becomes two separate fields for first name, “Bob,” and last name, “Smith.”
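In Python, a bare-bones version of this split might look like the sketch below. The function names are hypothetical, and the sketch assumes words are separated by whitespace.

```python
def first_word(value: str) -> str:
    """Return the first whitespace-separated word, or an empty string."""
    words = value.split()
    return words[0] if words else ""


def last_word(value: str) -> str:
    """Return the last whitespace-separated word, or an empty string."""
    words = value.split()
    return words[-1] if words else ""


record = {"full_name": "Bob Smith"}
# Two new derived columns; the original field is left untouched.
record["first_name"] = first_word(record["full_name"])
record["last_name"] = last_word(record["full_name"])
```

One thing to keep in mind with a sketch this simple: middle names are dropped, and a multi-word last name such as "Van Dyke" would yield only "Dyke" as the last word.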

Replace characters 

This versatile feature finds and replaces specified characters within a field. It has all sorts of potential applications. For example, say your institution decided to reorganize its membership levels. You could use this derived column to easily find and update each constituent’s membership level to reflect your new offerings. 
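As a simple illustration, here is how a find-and-replace derived column could be sketched in Python. The membership level names ("Family" and "Household") and the function name replace_characters are invented for the example.

```python
def replace_characters(value: str, find: str, replacement: str) -> str:
    """Replace every occurrence of `find` in `value` with `replacement`."""
    return value.replace(find, replacement)


# Hypothetical reorganization: "Family" levels are renamed to "Household".
old_levels = ["Family", "Family Plus", "Individual"]
new_levels = [replace_characters(level, "Family", "Household") for level in old_levels]
```

Here new_levels would hold "Household", "Household Plus", and "Individual"; values that don't contain the search text pass through unchanged.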

Trailing characters 

This is particularly useful for sensitive information, such as social security numbers. It creates a new column that contains only a specified number of characters from the end of the original string; for example, the last 4 digits of a social security number.
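A minimal Python sketch of this kind of derived column follows. The function name trailing_characters is an assumption for the example, and the number shown is a placeholder, not real data.

```python
def trailing_characters(value: str, count: int) -> str:
    """Return the last `count` characters of `value`."""
    if count <= 0:
        return ""  # guard: value[-0:] would return the whole string
    return value[-count:]


ssn = "000-00-1234"  # placeholder value for illustration
last_four = trailing_characters(ssn, 4)
```

If the field is shorter than the requested count, this sketch simply returns the whole value.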

Quality over quantity is key when it comes to sound data management. 

Modern cultural institutions need to have up-to-date and accurate information on their customers. Not only does clean data enable more effective constituent interactions, it also leads to long-term cost savings. From preventing duplicate postcards and emails to staying below your CRM limits, high-quality data helps prevent overspending. While prebuilt system-to-system connectors can be helpful, they have major limitations. By employing the principles of data management—those relied upon by data warehouse developers and data scientists for decades—Golden Record ensures your constituent data is of the highest quality and readily available to any system that needs it. This level of clean data is essential for tracking customer engagement and extracting business intelligence.

Are you ready to cleanse your customer data? Want to test out Golden Record’s derived columns for yourself? 

We love to geek out on data and museums, so if you’d like to learn more—even on a curiosity level—we’d love to hear from you.