Use the compare_schema API to monitor database schema changes in CI/CD pipelines and agentic systems
Docs/Neon Postgres guides/Extensions/PostgreSQL Anonymizer

The PostgreSQL Anonymizer extension

new

Protect sensitive data in Postgres with the PostgreSQL Anonymizer extension

The PostgreSQL Anonymizer extension provides a powerful and flexible way to mask or replace sensitive data within your Postgres database, making it an essential tool for protecting personally identifiable information (PII) and commercially sensitive data to ensure privacy, security, compliance, and adherence to regulations such as GDPR.

Try it on Neon!

Neon is Serverless Postgres built for the cloud. Explore Postgres features and functions in our user-friendly SQL editor. Sign up for a free account to get started.

Sign Up

This guide provides an introduction to the PostgreSQL Anonymizer extension. You’ll learn how to enable the extension, declare masking rules, and apply various anonymization techniques such as static masking, dynamic masking, and creating anonymized dumps of your database.

note

PostgreSQL Anonymizer is an open-source extension for Postgres that can be easily enabled on any Neon project.

Version availability:

Please refer to the list of all extensions available in Neon for up-to-date extension version information.

Enable the PostgreSQL Anonymizer extension

You can enable the extension by running the following CREATE EXTENSION statement in the Neon SQL Editor or from a client such as psql that is connected to your Neon database.

CREATE EXTENSION IF NOT EXISTS anon CASCADE;

The CASCADE option automatically installs any dependencies required by the anon extension.

Key concepts and masking strategies

The postgresql_anonymizer extension offers several powerful features and masking strategies to protect your data. Here's an overview of some key concepts:

Masking rules

Masking rules are the foundation of the extension. They are declarative, meaning you define what data to mask and how to mask it using SQL syntax, without writing complex procedural code. Masking rules are applied using SECURITY LABEL commands and are stored within your database schema, ensuring privacy by design.

Masking functions

The extension comes with a rich set of built-in masking functions to perform various anonymization techniques. These functions can be categorized as follows:

Faking

Generate realistic-looking fake data.

  • Example functions: anon.fake_first_name(), anon.dummy_city_name()

Pseudonymization

Create consistent, reversible "fake" data based on a seed value. This is useful for maintaining data relationships while anonymizing.

  • Example function: anon.pseudo_email(email)

Randomization

Generate random values within a defined range or from a set.

  • Example functions: anon.random_int_between(10, 100), anon.random_in_enum(enum_column)

Partial scrambling

Hide sensitive parts of a string while revealing only specific sections, such as the first and last few characters.

  • Example: The function anon.partial(social_security_number, 2, 'XXX-XX-', 1) masks all but the first 2 characters and the last character of the social_security_number, replacing the middle section with a fixed string ('XXX-XX-'). For instance, if social_security_number is 1234567890, the masked value will be 12XXX-XX-0.

Generic hashing

Create one-way hashes of data for irreversible anonymization.

  • Example function: anon.hash(email)

Destruction/Nullification

Replace data with static values or NULL for complete removal.

  • Example values: MASKED WITH VALUE 'CONFIDENTIAL', MASKED WITH VALUE NULL

Adding noise/variance

Slightly alter numerical or date values to maintain data distribution while adding anonymity.

  • Example: anon.noise(salary, 0.1) adds +/- 10% noise to the salary column.

Generalization

Replace specific values with broader ranges or categories. This is useful for data analysis while preserving anonymity and is often used in masking views.

Masking methods

The extension provides different methods to apply masking rules, catering to various use cases:

Static masking

Permanently modify the original data in your tables by applying the defined masking rules. This is useful for creating anonymized copies of data for archiving or specific purposes.

Dynamic masking

Mask data on-the-fly as users query the database. Sensitive data is hidden only for users with the "MASKED" role, while other users see the original data. This is ideal for development and testing environments where different access levels are needed.

Anonymous dumps

Create SQL dumps of your database where sensitive data is replaced according to the masking rules. This is useful for sharing anonymized data or creating backups for non-production environments.

Masking views

Create views that automatically apply masking rules, providing an anonymized interface to the underlying data without modifying the base tables. This is useful for controlled data access and reporting.

Declaring static masking rules

Static masking is a straightforward way to permanently anonymize data within a table. Let's walk through an example.

Suppose you have a users table with personal information:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE,
    email VARCHAR(255),
    phone_number VARCHAR(20),
    city VARCHAR(100)
);

INSERT INTO users (username, email, phone_number, city) VALUES
    ('john_doe', 'john.doe@example.com', '555-123-4567', 'New York'),
    ('jane_smith', 'jane.smith@example.com', '555-987-6543', 'Los Angeles'),
    ('peter_jones', 'peter.jones@example.com', '555-555-1111', 'Chicago');

SELECT * FROM users;

The table contains the following data:

idusernameemailphone_numbercity
1john_doejohn.doe@example.com555-123-4567New York
2jane_smithjane.smith@example.com555-987-6543Los Angeles
3peter_jonespeter.jones@example.com555-555-1111Chicago

You can declare static masking rules using SECURITY LABEL FOR anon ON COLUMN ... IS 'MASKED WITH ...'. For example, to mask the email and phone_number columns:

-- Mask the email column with a fake email
SECURITY LABEL FOR anon ON COLUMN users.email
IS 'MASKED WITH FUNCTION anon.dummy_safe_email()';

-- Mask the phone_number column by partially scrambling it
SECURITY LABEL FOR anon ON COLUMN users.phone_number
IS 'MASKED WITH FUNCTION anon.partial(phone_number, 1, ''XXX-XXX-'', 2)';

Now, apply the static masking to the users table:

warning

Static masking is irreversible, this operation will permanently modify the data in the table according to the masking rules.

SELECT anon.anonymize_table('users');

Let's check the data again:

SELECT * FROM users;
idusernameemailphone_numbercity
1john_doemcknightjulie@example.org5XXX-XXX-67New York
2jane_smithdavidhanson@example.org5XXX-XXX-43Los Angeles
3peter_jonesmichael33@example.org5XXX-XXX-11Chicago

You will see that the email and phone_number columns have been permanently replaced with masked values according to the rules you defined. The original data is no longer present in these columns.

Implementing dynamic masking

Dynamic masking allows you to control data visibility based on user roles. Let's see how to set it up.

First, enable transparent dynamic masking at the database level:

ALTER DATABASE neondb SET anon.transparent_dynamic_masking = true;

Make sure to reconnect to your database for this setting to take effect.

Next, create a role that will be masked, for example, data_analyst:

CREATE ROLE data_analyst LOGIN PASSWORD 'analyst_password';
SECURITY LABEL FOR anon ON ROLE data_analyst IS 'MASKED';
GRANT pg_read_all_data TO data_analyst;

Now, define masking rules that will be applied to the data_analyst role. Let's mask the city and username columns in the users table:

-- Mask the city column with a static value
SECURITY LABEL FOR anon ON COLUMN users.city
IS 'MASKED WITH VALUE ''Confidential Location''';

-- Mask the username column with a custom function
SECURITY LABEL FOR anon ON COLUMN users.username
IS 'MASKED WITH FUNCTION anon.dummy_username()';

Connect to Neon as the data_analyst user and query the users table:

important

Make sure to connect as the data_analyst user to see the dynamic masking in action.

SELECT * FROM users;
idusernameemailphone_numbercity
1christy_providentharrisonalexander@example.com5XXX-XXX-67Confidential Location
2rudolph_errortedwards@example.com5XXX-XXX-43Confidential Location
3owen_nemojamesjackson@example.net5XXX-XXX-11Confidential Location

You will observe that the username and city columns are masked according to the rules you defined. The city column shows the static value 'Confidential Location', and the username column displays fake usernames generated by the dummy_username() function.

Now, connect as a regular user (neondb_owner - the database owner) and query the users table again:

-- Connect as the database owner
SELECT * FROM users;
idusernameemailphone_numbercity
1john_doemcknightjulie@example.org5XXX-XXX-67New York
2jane_smithdavidhanson@example.org5XXX-XXX-43Los Angeles
3peter_jonesmichael33@example.org5XXX-XXX-11Chicago

You will see the original data in the username and city columns, as dynamic masking is applied only to the data_analyst role. We still see the masked email and phone_number columns, as they were statically masked earlier.

Creating anonymous dumps

To create an anonymized SQL dump of your database, you can leverage the pg_dump utility along with the PostgreSQL Anonymizer extension.

First, create a dedicated role for generating anonymous dumps, and mark this role as masked:

CREATE ROLE anon_dump_user LOGIN PASSWORD 'dump_password';
ALTER ROLE anon_dump_user SET anon.transparent_dynamic_masking = true;
SECURITY LABEL FOR anon ON ROLE anon_dump_user IS 'MASKED';
GRANT pg_read_all_data TO anon_dump_user;

Now, using your terminal or command prompt, you can use pg_dump connecting as the anon_dump_user to export an anonymized dump:

pg_dump -U anon_dump_user -h <your_neon_hostname> -p 5432 <your_neon_database> -f anonymized_dump.sql --no-security-labels --exclude-extension="anon"

Replace <your_neon_hostname> and <your_neon_database> with your Neon connection details.

This command will generate an anonymized_dump.sql file containing a database dump where sensitive data is masked according to the rules you've set, ensuring that the dump is safe for sharing or use in non-production environments. The --no-security-labels option is important to prevent the masking rules themselves from being included in the dump, further securing your anonymization policy. --exclude-extension="anon" (available in pg_dump 17+) or --extension plpgsql ensures the extension itself is not included in the dump.

Conclusion

The PostgreSQL Anonymizer extension provides a robust and versatile toolkit for data masking and anonymization directly within your Postgres database. By leveraging its declarative masking rules, diverse masking functions, and different masking methods, you can effectively protect sensitive data for various use cases, including development, testing, data sharing, and compliance.

Reference

Need help?

Join our Discord Server to ask questions or see what others are doing with Neon. Users on paid plans can open a support ticket from the console. For more details, see Getting Support.

Last updated on

Was this page helpful?