What is data anonymization?

November 28, 2022


Data anonymization is the practice of protecting confidential or private data by encrypting or deleting the identifiers that link specific information to individuals, as noted by Corporate Finance Institute. As you might expect, anonymizing data is becoming increasingly important in the business world for privacy compliance.

Failure to safeguard the privacy of the sensitive information that you gather and store on customers and other individuals could lead to drastic repercussions, from lawsuits to the destruction of your reputation in your industry.

The lifeblood of your business is data, and this means that you need the best possible information available to keep everything properly humming along. Collecting data on your customers is a key process that will influence how well your company can expand in scope and scale, as well as adjust to inevitable changes in the marketplace.

As you gather information, it’s of the utmost importance to have a system in place for handling anonymized versus identifiable data. At Lytics, we often hear from customers across industries that data collection and personalized marketing are a common pain point.

Let’s explore the concept of data anonymization, how it is used and the way that anonymized data has impacted digital marketing. We’ll examine the unique considerations for handling anonymized data as compared to working with identifiable data. We’ll also delve into how Customer Data Platforms can help organizations maximize the true value of their gathered data while remaining compliant with applicable privacy regulations.

Anonymization vs. pseudonymization

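When getting ready to anonymize data in your organization, you’ll need to distinguish between anonymization and pseudonymization. In a nutshell, “the difference between the two techniques rests on whether the data can be re-identified,” according to a report from the International Association of Privacy Professionals. Pseudonymized data replaces identifiers with substitute values that can be linked back to individuals using separately held information, such as a key or lookup table; properly anonymized data cannot be linked back.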
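
To make the distinction concrete, here is a minimal Python sketch. The record fields and the secret key are hypothetical, and keyed hashing (HMAC-SHA256) is only one common way to produce pseudonyms. The pseudonymized record keeps a stable token that anyone holding the key can recompute from a known email address, so it can be re-identified; the anonymized version drops the identifier entirely.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored separately from the data.
SECRET_KEY = b"example-key-kept-in-a-separate-vault"


def pseudonymize(record: dict) -> dict:
    """Replace the email with a keyed hash (a pseudonym).

    Anyone holding SECRET_KEY and a list of candidate emails can recompute
    the tokens and match them, so this data is still re-identifiable.
    """
    token = hmac.new(SECRET_KEY, record["email"].encode("utf-8"), hashlib.sha256).hexdigest()
    out = {k: v for k, v in record.items() if k != "email"}
    out["customer_token"] = token
    return out


def anonymize(record: dict) -> dict:
    """Drop the direct identifier outright; there is no key to reverse it.

    The remaining fields can still allow re-identification if they are rare
    or can be linked to other data sources.
    """
    return {k: v for k, v in record.items() if k != "email"}


record = {"email": "jane@example.com", "plan": "premium", "region": "West"}
print(pseudonymize(record))
print(anonymize(record))
```

The same email always yields the same token, which is what keeps pseudonymized data useful for joining records and also what makes it re-identifiable.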

When you set up data pipeline management, possibly with the help of third-party experts if you lack the talent in-house, keep data anonymization top of mind. Otherwise, you run the risk of leaving data unsecured within your organization.

From a regulatory standpoint, the IAPP notes that Europe’s General Data Protection Regulation describes anonymized information as “data rendered anonymous in such a way that the data subject is not or no longer identifiable.” Data that meets this bar falls outside the GDPR, whereas pseudonymized data is still treated as personal data.

Keep in mind that what seems like anonymization may not be up to the task. Per University College London, “There is a residual risk of re-identification; the motivated intruder test can be used to assess the likelihood of this.”

Applying the motivated intruder test involves considering how rare each recorded attribute is among the people in the database, how large the geographical area covered by the data is, and what other kinds of data could be linked to this source. A motivated criminal could use such external sources to re-identify the supposedly de-identified information.
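
One rough way to gauge how rare recorded attributes are is to count how many people share each combination of quasi-identifiers, the idea behind k-anonymity. The Python sketch below assumes hypothetical column names and a threshold k that you would choose yourself; it is not the motivated intruder test itself, only a quick screen for records that stand out.

```python
from collections import Counter

# Hypothetical quasi-identifiers: fields that are harmless alone but identifying together.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")

records = [
    {"zip": "94103", "birth_year": 1985, "gender": "F"},
    {"zip": "94103", "birth_year": 1985, "gender": "F"},
    {"zip": "94103", "birth_year": 1990, "gender": "M"},
    {"zip": "10001", "birth_year": 1972, "gender": "F"},
]


def risky_records(rows, quasi_ids, k=2):
    """Return rows whose quasi-identifier combination is shared by fewer than k rows.

    A combination held by only one or a few people is rare enough that an
    intruder with outside data could single those people out.
    """
    def combo(row):
        return tuple(row[q] for q in quasi_ids)

    counts = Counter(combo(row) for row in rows)
    return [row for row in rows if counts[combo(row)] < k]


for row in risky_records(records, QUASI_IDENTIFIERS, k=2):
    print("rare combination:", row)
```

Flagged rows are candidates for generalization or suppression before the data is shared.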

What data should be anonymized?

Putting yourself in the mindset of your customers or users will be helpful as you determine what data in your organization needs to be rendered anonymous.

For example, details such as age, sexual orientation, political affiliation and medical status are highly sensitive and better left obscured. Ideally, you will notify consumers, be transparent about what data you collect and give them control over it.

Keep your customers’ individual concerns in mind and follow best practices as you decide what data to anonymize. All personally identifiable information needs some level of anonymization to protect both the people involved and your organization from the harm that can result when data is exposed to unauthorized parties.

Personally identifiable information

There is a range of personally identifiable information or PII to keep in mind when it comes to protecting sensitive data. According to the U.S. Department of Labor, common forms of PII that might be anonymized include:

Name
Address
Social Security number
Telephone number
Email address
Gender
Race
Birth date
Geographical region
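
As a minimal sketch of acting on the list above, the Python function below drops direct identifiers from a customer record and coarsens the birth date to a year. Which fields count as direct identifiers is an illustrative assumption, and the field names are hypothetical.

```python
from datetime import date

# Illustrative split of the PII fields listed above (an assumption, not a standard).
DIRECT_IDENTIFIERS = {"name", "address", "ssn", "phone", "email"}


def strip_direct_identifiers(record: dict) -> dict:
    """Remove direct identifiers and coarsen the birth date to a birth year.

    Remaining fields such as gender, race and region are quasi-identifiers:
    they are kept here but may still need generalization or suppression.
    """
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if isinstance(cleaned.get("birth_date"), date):
        cleaned["birth_year"] = cleaned.pop("birth_date").year
    return cleaned


customer = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "ssn": "000-00-0000",
    "gender": "F",
    "birth_date": date(1985, 4, 12),
    "region": "Pacific Northwest",
}
print(strip_direct_identifiers(customer))
```

Removing direct identifiers is only a first step; the fields left behind can still combine into a unique fingerprint.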

Regulation of anonymized data

The regulatory environment related to anonymized data can vary according to the industry in which your organization does business and the location of your enterprise.

For example, as noted by Educause, if you are in the U.S. healthcare industry, you will have to comply with the privacy regulations under the Health Insurance Portability and Accountability Act (HIPAA).

If you want to work with anonymized data, be aware that the process can be tricky: as noted earlier, motivated bad actors may try to link other sources of information to the data you have sanitized.

Some industry observers question whether true anonymization of data is achievable at all, especially as computer systems grow more powerful and can process ever-larger data sets.

Again, where in the world you do business affects how anonymized data is treated. In its guide to the European Union’s anonymization standards, the International Association of Privacy Professionals noted that “The EU General Data Protection Regulation is among the most influential data privacy laws in the world,” setting the standard, in many ways, for how global organizations implement their data privacy programs.

Data anonymization tools and techniques

For newcomers, here’s an overview of the most prominent methods of data anonymization:

Data clean rooms:

This can be one of the most crucial data anonymization tools you will come to rely on. A data clean room is a secure environment where your organization’s data experts can examine and review information related to your advertising. They look for PII so they can remove or encrypt it. If PII is not removed thoroughly enough, criminals could de-anonymize the data, making it re-identifiable once more.
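
Clean rooms commonly restrict what leaves the environment to aggregate results above a minimum group size. The Python sketch below illustrates that idea under stated assumptions: two hypothetical parties match records on hashed email, and only counts for groups of at least a hypothetical MIN_GROUP_SIZE are returned. Plain SHA-256 hashing of emails can be reversed by guessing likely addresses, so real clean rooms rely on much stronger controls; treat this as an illustration of the aggregation rule only.

```python
import hashlib
from collections import Counter

MIN_GROUP_SIZE = 3  # hypothetical aggregation threshold


def hashed(email: str) -> str:
    """Normalize and hash an email so raw addresses are not used as join keys."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def clean_room_overlap(advertiser_rows, publisher_rows, min_group=MIN_GROUP_SIZE):
    """Join two parties' data on hashed email and return only aggregate counts.

    advertiser_rows: list of (email, segment); publisher_rows: list of (email, campaign).
    Groups smaller than min_group are suppressed so no individual can be singled out.
    """
    segment_by_hash = {hashed(email): segment for email, segment in advertiser_rows}
    counts = Counter()
    for email, campaign in publisher_rows:
        segment = segment_by_hash.get(hashed(email))
        if segment is not None:
            counts[(segment, campaign)] += 1
    return {group: n for group, n in counts.items() if n >= min_group}


advertiser = [("a@x.com", "loyal"), ("b@x.com", "loyal"), ("c@x.com", "loyal"), ("d@x.com", "new")]
publisher = [("A@x.com ", "spring"), ("b@x.com", "spring"), ("c@x.com", "spring"), ("d@x.com", "spring")]
print(clean_room_overlap(advertiser, publisher))
```

The single "new" customer who saw the spring campaign is suppressed, because reporting a group of one would reveal that person’s behavior.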

Randomization:

You alter the data to remove the links between distinct people and their information while keeping the data’s analytical value.
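
Randomization covers several techniques; adding random noise to numeric values is one common form. In the Python sketch below, the noise scale and the purchase totals are hypothetical. Individual values no longer match the originals, while zero-mean noise keeps aggregates such as the average roughly intact.

```python
import random


def add_noise(values, scale=2.0, seed=None):
    """Return a copy of numeric values with zero-mean Gaussian noise added.

    Individual values are perturbed, but because the noise averages out,
    aggregate statistics such as the mean stay approximately the same.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]


purchase_totals = [120.0, 87.5, 240.0, 56.0, 199.9]
noisy = add_noise(purchase_totals, scale=5.0, seed=42)
print([round(v, 2) for v in noisy])
print(sum(purchase_totals) / len(purchase_totals), sum(noisy) / len(noisy))
```

A larger scale hides individual values better but distorts the data more, so the scale is a trade-off between privacy and usefulness.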

Swap data:

This involves exchanging attribute values between different people’s records. For example, measurements such as a person’s waist or foot size could be swapped among records to help conceal identities. With a limited data pool, though, people might still be able to figure out who is who.
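
A minimal Python sketch of swapping, assuming hypothetical records with a waist measurement: the values of one field are shuffled among records, so the overall distribution is preserved but a value no longer necessarily belongs to the person in the same row. With only a handful of records, a random shuffle may leave some values in place, which mirrors the small-pool caveat above.

```python
import random


def swap_attribute(records, field, seed=None):
    """Return new records in which the values of `field` are shuffled among records.

    The original records are not modified, and the set of values for the
    field is unchanged; only which record holds which value changes.
    """
    rng = random.Random(seed)
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]


people = [
    {"id": "p1", "waist_cm": 81},
    {"id": "p2", "waist_cm": 94},
    {"id": "p3", "waist_cm": 76},
    {"id": "p4", "waist_cm": 102},
]
swapped = swap_attribute(people, "waist_cm", seed=7)
print(swapped)
```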

Generalize:

Here, you are only revealing coarser, less precise data. One example is converting specific ages to age ranges, such as 25-35 and 50-65.
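
A minimal Python sketch of generalization that maps exact ages to bands. It reuses the 25-35 and 50-65 ranges from the example in the text; the other bands are hypothetical, and ages outside every band are suppressed rather than revealed.

```python
# Hypothetical age bands; 25-35 and 50-65 come from the example in the text.
AGE_BANDS = [(18, 24), (25, 35), (36, 49), (50, 65), (66, 120)]


def generalize_age(age: int) -> str:
    """Map an exact age to the band that contains it, e.g. 29 -> '25-35'."""
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "unknown"  # ages outside every band are suppressed


print([generalize_age(a) for a in (29, 52, 17)])  # ['25-35', '50-65', 'unknown']
```

Wider bands make any one person harder to single out but leave less detail for analysis.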

Keep this caveat in mind. Per JD Supra, “Achieving true anonymization may be nearly impossible,” especially considering that a recent study showed that the majority of the United States’ population could be personally identified using three data points: gender, date of birth and ZIP code.

Limitations of anonymized data

Anonymized data does have limitations, of course. In terms of marketing or the user experience, you can’t directly address individual consumers’ specific wants and needs.

But if you have anonymized this information successfully while retaining access to different segments of these people, your marketing can still reach them, just less directly and personally. A marketing platform built for this purpose can help enormously, since it streamlines the work and keeps all of the relevant data ready to use in the cloud.

Data anonymization use cases

To get a better idea of how data anonymization works, here are examples of how certain types of businesses might anonymize their information and use that data.

Partner data sharing:

At the Hackathon Urban Data event held in Berlin in 2019, public transportation companies and residential construction businesses contributed data, and participants extracted anonymized results from the aggregated information, per Aircloak.

A team won the event by creating an algorithm that made better use of free parking space data, improving the quality of life for Berlin residents with data that was anonymous yet highly useful.

Medical patient data:

A cautionary tale involves a university and a search engine. According to Aircloak, researchers at the University of Chicago Medical Center partnered with Google to share patient data with the search engine giant. They got into trouble because they shared hundreds of records from medical patients without removing date stamps or medical notes.

These cases underscore how crucial it is to be transparent with people when gathering and using their data. Keep these lessons in mind as your organization works toward a privacy-first future.

Author

Mark Hayden
