Quick Start

The fastest way to run OpenDataMask is with Docker Compose:

git clone https://github.com/MaximumTrainer/OpenDataMask.git
cd OpenDataMask

# Generate secrets
export JWT_SECRET=$(openssl rand -base64 32)
export ENCRYPTION_KEY=$(openssl rand -base64 32 | head -c 32)

# Start all services
docker-compose up -d

# Open the UI (macOS; on Linux use xdg-open, or paste the URL into a browser)
open http://localhost

Default ports: Frontend 80, Backend API 8080, PostgreSQL 5432.
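
Once the containers are up, a quick reachability check of the default ports can confirm the stack started. The following is a minimal sketch of a hypothetical helper (it is not part of OpenDataMask) and assumes the services are exposed on localhost at the default ports listed above.

```python
import socket

# Default ports from the Quick Start; adjust if your compose file remaps them.
DEFAULT_PORTS = {"frontend": 80, "backend": 8080, "postgres": 5432}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its default port accepts connections."""
    return {name: port_open(host, port) for name, port in DEFAULT_PORTS.items()}
```

Calling check_services() returns a dictionary of service name to True/False. A False entry for the backend is a good cue to inspect the container logs.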

Step-by-Step Setup Guide

The following walkthrough covers the full lifecycle of a masking project, from creating an account through to a completed masking job with custom PII detection rules.

Screenshots are generated automatically from a live environment by verification/take_screenshots.py. Run pip install playwright && playwright install chromium then python3 verification/take_screenshots.py to regenerate them.

1. Register & Sign In

Navigate to http://localhost to reach the login page. Click Register to create your first account, or sign in with existing credentials. All passwords are bcrypt-hashed; credentials never leave your infrastructure.

Screenshot: the login page. Enter credentials or click Register to create a new account.

Screenshot: the registration form. Fill in username, email, and password to create your account.

Screenshot: the login form filled in. After entering credentials, click Sign In to access your workspace dashboard.

2. Create a Workspace

A Workspace is the top-level container for a masking project. It holds all your database connections, table configurations, data mappings, and job history. Click New Workspace, enter a name and optional description, then click Create.

Screenshot: the workspace dashboard, which lists all your workspaces. Click + New Workspace to create your first project.

Screenshot: the New Workspace modal. Give your project a name and optional description.

Screenshot: the workspace overview tab, which shows a summary of connections, tables, and recent job activity.

Use descriptive workspace names like "Prod → QA Sync" or "GDPR Compliance Prep" to make multi-project setups easy to navigate.

3. Connect Your Databases

Navigate to the Connections tab inside your workspace. You need at least two connections: a source (production-like data) and a destination (target environment for masked data). Supported types include PostgreSQL, MySQL, Azure SQL, MongoDB, and file uploads.

For each connection, fill in:

  • Name: a label used in job configuration
  • Type: database engine (e.g., POSTGRESQL)
  • Host / Port / Database: connection coordinates
  • Username / Password: stored encrypted with AES-256
  • Role: tick Source or Destination (or both)

Screenshot: the Connections tab with source and destination databases configured. Click + Add Connection to add more.

Screenshot: the Add Connection form. Enter the host, database name, credentials, and role.

Screenshot: filling in the source connection. Enable the Source role toggle and click Add Connection.

4. Configure Tables & Column Generators

The Tables tab lets you tell OpenDataMask which tables to process and how to transform each column. For every table, choose a masking mode (MASK, GENERATE, PASSTHROUGH, SUBSET, or SKIP; see Masking Modes under Core Concepts), then configure a generator for each column that should be anonymised.

Screenshot: the Tables tab listing configured tables. Click + Add Table to add a new one.

Screenshot: configuring the users table with MASK mode selected.

Screenshot: column generators configured as full_name → FULL_NAME, email → EMAIL, phone → PHONE. Each mapped column will receive realistic synthetic data when the job runs.
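
To make the column-to-generator idea concrete, here is a minimal sketch of how a mapping such as full_name → FULL_NAME could be applied to one row. The generator functions, name lists, and mask_row helper are hypothetical stand-ins for illustration; OpenDataMask's real generators run in the backend and are configured through the UI.

```python
import random

# Illustrative stand-in data; not what OpenDataMask actually produces.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor"]
LAST_NAMES = ["Reed", "Morgan", "Patel", "Kim"]

# Hypothetical generator registry keyed by the generator names used above.
GENERATORS = {
    "FULL_NAME": lambda rng: f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}",
    "EMAIL": lambda rng: f"user{rng.randrange(10_000):04d}@example.com",
    "PHONE": lambda rng: f"555-01{rng.randrange(100):02d}",
}

# Column -> generator mapping, mirroring the configuration in the screenshot.
COLUMN_GENERATORS = {"full_name": "FULL_NAME", "email": "EMAIL", "phone": "PHONE"}


def mask_row(row: dict, column_generators: dict, rng: random.Random) -> dict:
    """Return a copy of row with each mapped column replaced by synthetic data."""
    masked = dict(row)  # leave the source row untouched
    for column, generator_name in column_generators.items():
        if column in masked:
            masked[column] = GENERATORS[generator_name](rng)
    return masked
```

Columns without a mapping (an id primary key, for example) pass through unchanged in this sketch.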

Use the Sensitivity Scan tab to auto-detect PII columns and receive generator recommendations, then visit the Privacy Hub tab and click Apply Recommendations to configure them in one step.

5. Data Mappings (Advanced Column Control)

The Data Mappings tab gives you fine-grained, per-column control that overrides the table-level generator configuration. Use it to set each column to one of three actions:

  • Mask: replace the value with a generated fake (choose strategy: FAKE, HASH, NULL, REDACT, PARTIAL_MASK, or REGEX)
  • Migrate As-Is: copy the original value unchanged
  • Omit: exclude the column entirely from the output
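
The three actions above can be sketched as a per-column dispatch. This is an illustrative sketch, not OpenDataMask's implementation: the "OMIT" identifier and the apply_mapping helper are assumptions, and treating unmapped columns as pass-through is a simplification (in the product they fall back to the table-level configuration).

```python
from typing import Any, Callable

MASK = "MASK"
MIGRATE_AS_IS = "MIGRATE_AS_IS"
OMIT = "OMIT"  # assumed identifier for the Omit action


def apply_mapping(
    row: dict,
    mapping: dict,
    mask: Callable[[str, Any], Any],
) -> dict:
    """Apply a per-column action to one row and return the output row."""
    out = {}
    for column, value in row.items():
        # Simplification: columns without a mapping are copied unchanged.
        action = mapping.get(column, MIGRATE_AS_IS)
        if action == OMIT:
            continue  # column is excluded from the output entirely
        if action == MASK:
            out[column] = mask(column, value)
        elif action == MIGRATE_AS_IS:
            out[column] = value
        else:
            raise ValueError(f"unknown action {action!r} for column {column!r}")
    return out
```

The mask callable stands in for whichever masking strategy is selected for the column.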

The wizard walks you through three steps: select a connection → select a table → configure each column.

Screenshot: Step 1. Choose the connection whose schema you want to configure.

Screenshot: Steps 2 and 3. Select the table, then set the action and masking strategy for each discovered column. Changes are saved immediately.

6. Run a Masking Job

With connections and table configuration in place, go to the Jobs tab and click Run New Job. Select your source and destination connections, then click Run. The job runs asynchronously: the progress badge cycles from PENDING → RUNNING → COMPLETED (or FAILED) in real time.
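
The status lifecycle described above can be modelled as a small state machine, which is handy when writing a client that polls for completion. This is a sketch under assumptions: the status names come from the text, but the transition table (for instance, that a job reaches FAILED only from RUNNING) is a guess at the behaviour, not taken from the backend.

```python
from enum import Enum


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


# Assumed transitions; COMPLETED and FAILED are terminal.
ALLOWED = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.COMPLETED, JobStatus.FAILED},
    JobStatus.COMPLETED: set(),
    JobStatus.FAILED: set(),
}


def advance(current: JobStatus, new: JobStatus) -> JobStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def is_terminal(status: JobStatus) -> bool:
    """A polling loop can stop once the job reaches a terminal status."""
    return not ALLOWED[status]
```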

Screenshot: the Jobs tab, which lists all past and running jobs. Click Run New Job to start a masking run.

Screenshot: the Run New Job modal. Select the source and destination connections, then click Run.

Screenshot: the job log output. Expanding the job log shows per-table row counts and any warnings raised during masking.

After a successful job, connect to your destination database to verify. The verification/verify.py script automates this check: it validates that PII columns contain synthetic data, primary keys are preserved, and no original values leaked through.
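
The kind of check described above can be illustrated on in-memory rows. The sketch below is not the contents of verification/verify.py, which works against a live database; the verify_masking function and its parameters are hypothetical and only show the two core assertions (primary keys preserved, no original PII value present in the destination).

```python
def verify_masking(source_rows, masked_rows, pk, pii_columns):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # Check 1: the set of primary keys must be identical on both sides.
    source_keys = {row[pk] for row in source_rows}
    masked_keys = {row[pk] for row in masked_rows}
    if source_keys != masked_keys:
        problems.append("primary keys differ between source and destination")

    # Check 2: no original PII value may appear anywhere in the masked column.
    for column in pii_columns:
        originals = {
            row[column] for row in source_rows if row.get(column) is not None
        }
        for row in masked_rows:
            value = row.get(column)
            if value is not None and value in originals:
                problems.append(
                    f"{column}: original value leaked in row {pk}={row[pk]}"
                )
    return problems
```

Comparing against the whole set of original values, rather than row by row, also catches values that were merely shuffled between rows.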

7. Custom PII Detection Rules

OpenDataMask ships with built-in sensitivity detection for common PII (email addresses, phone numbers, SSNs, etc.). The Sensitivity Rules page under Settings lets you define your own detection rules for domain-specific identifiers, for example an internal employee ID format or a proprietary customer reference number.

Each custom rule has:

  • Name: displayed in scan reports and data mapping labels
  • Matchers: column-name patterns and/or regex patterns that identify the PII type
  • Data type filter: optionally restrict the rule to text, numeric, or date columns
  • Linked preset: optionally auto-apply a masking strategy when the rule fires
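
The rule fields above map naturally onto a small data structure. The following is a sketch only: the SensitivityRule class, its field names, and the matching semantics (a column-name match or any sampled value matching is enough for the rule to fire) are assumptions made for illustration, and the EMP-123456 format in the usage note is a made-up example.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SensitivityRule:
    name: str
    column_patterns: list = field(default_factory=list)  # regexes on column names
    value_patterns: list = field(default_factory=list)   # regexes on sampled values
    data_type: Optional[str] = None       # "text", "numeric", "date", or None for any
    linked_preset: Optional[str] = None   # masking strategy to suggest on a match

    def matches(self, column: str, data_type: str, samples: list) -> bool:
        """Return True if this rule flags the given column."""
        # The data type filter, when set, must agree before anything else is tried.
        if self.data_type is not None and self.data_type != data_type:
            return False
        # Column-name matchers are case-insensitive in this sketch.
        if any(re.search(p, column, re.IGNORECASE) for p in self.column_patterns):
            return True
        # Otherwise fall back to value matchers over the sampled data.
        return any(
            re.fullmatch(p, sample)
            for p in self.value_patterns
            for sample in samples
        )
```

For example, SensitivityRule("Employee ID", [r"^employee_id$"], [r"EMP-\d{6}"], "text", "HASH") would flag a text column named employee_id, or any text column whose sampled values look like EMP-123456.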

Screenshot: the Sensitivity Rules settings page, which lists all built-in and custom PII detection rules.

Screenshot: the New Rule side panel. Give the rule a name and add one or more column-name matchers.

Screenshot: a custom rule targeting columns named employee_id. The next sensitivity scan will flag matching columns and recommend a masking strategy.

After saving a new sensitivity rule, open the Sensitivity Scan tab in your workspace and click Run Scan to see it take effect.

Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| JDK | 17+ | Temurin/OpenJDK recommended |
| Docker & Docker Compose | 20.10+ | For containerised deployment |
| PostgreSQL | 15+ | Production metadata store |
| Node.js | 20+ | Frontend development only |
| Go | 1.21+ | CLI build from source only |

Installation & Build

Backend (Kotlin / Spring Boot)

cd backend
./gradlew build --no-daemon          # build + test
./gradlew bootRun --no-daemon        # run locally

Frontend (Vue 3)

cd frontend
npm ci                               # install dependencies
npm run dev                          # dev server (port 5173)
npm run build                        # production build

CLI (Go)

cd cli
go build -o odm .
./odm --help

Configuration

All sensitive configuration is supplied via environment variables:

| Variable | Required | Description |
|---|---|---|
| DATABASE_URL | Yes | JDBC URL for PostgreSQL |
| DATABASE_USERNAME | Yes | PostgreSQL username |
| DATABASE_PASSWORD | Yes | PostgreSQL password |
| JWT_SECRET | Yes | JWT signing secret (min 32 chars) |
| ENCRYPTION_KEY | Yes | Credential encryption key (16 or 32 chars) |
| SERVER_PORT | No | Backend port (default: 8080) |
| MONGODB_URI | No | MongoDB URI when masking MongoDB sources |
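
Because a missing or malformed variable typically surfaces only as a startup failure, it can be useful to check the environment up front. The sketch below encodes the constraints from the table (required variables, a JWT_SECRET of at least 32 characters, an ENCRYPTION_KEY of 16 or 32 characters). The validate_env helper is hypothetical and is not something the backend ships.

```python
import os
from typing import Mapping, Optional

REQUIRED = (
    "DATABASE_URL",
    "DATABASE_USERNAME",
    "DATABASE_PASSWORD",
    "JWT_SECRET",
    "ENCRYPTION_KEY",
)


def validate_env(env: Optional[Mapping[str, str]] = None) -> list:
    """Return a list of configuration problems; empty means the checks passed."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]

    jwt_secret = env.get("JWT_SECRET", "")
    if jwt_secret and len(jwt_secret) < 32:
        problems.append("JWT_SECRET must be at least 32 characters")

    key = env.get("ENCRYPTION_KEY", "")
    if key and len(key) not in (16, 32):
        problems.append("ENCRYPTION_KEY must be exactly 16 or 32 characters")

    # SERVER_PORT is optional and defaults to 8080 per the table above.
    port = env.get("SERVER_PORT", "8080")
    if not port.isdigit():
        problems.append("SERVER_PORT must be a number")
    return problems
```

Passing a plain dictionary instead of relying on os.environ keeps the check easy to exercise in tests.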

Core Concepts

Supported Database Connections

| Type | Key | Connection String Format |
|---|---|---|
| PostgreSQL | POSTGRESQL | jdbc:postgresql://<host>:<port>/<database> |
| MySQL | MYSQL | jdbc:mysql://<host>:<port>/<database> |
| Azure SQL | AZURE_SQL | jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db> |
| MongoDB | MONGODB | mongodb://<host>:<port>/<database> |
| MongoDB Cosmos DB | MONGODB_COSMOS | mongodb://<account>.mongo.cosmos.azure.com:10255/<db>?ssl=true&... |
| File (CSV/JSON) | FILE | Uploaded via the UI |

For Azure SQL, TLS encryption is enabled automatically and credentials are provided separately from the connection string. The mssql-jdbc driver is bundled, so no extra installation is needed.
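
The formats in the table can be turned into a small helper for scripts that need to assemble connection strings. This is a sketch, not an OpenDataMask API: build_url is a hypothetical name, the fallback ports are the engines' conventional defaults rather than values from this document, and MONGODB_COSMOS is left out because its query string is abbreviated in the table.

```python
# Templates copied from the connection table above.
TEMPLATES = {
    "POSTGRESQL": "jdbc:postgresql://{host}:{port}/{database}",
    "MYSQL": "jdbc:mysql://{host}:{port}/{database}",
    # For Azure SQL, host is the server name; the port is fixed at 1433.
    "AZURE_SQL": "jdbc:sqlserver://{host}.database.windows.net:1433;databaseName={database}",
    "MONGODB": "mongodb://{host}:{port}/{database}",
}

# Conventional default ports for each engine (an assumption of this sketch).
FALLBACK_PORTS = {"POSTGRESQL": 5432, "MYSQL": 3306, "AZURE_SQL": 1433, "MONGODB": 27017}


def build_url(kind: str, host: str, database: str, port: int = None) -> str:
    """Build a connection string for one of the supported connection keys."""
    if kind not in TEMPLATES:
        raise ValueError(f"no connection string template for {kind!r}")
    if port is None:
        port = FALLBACK_PORTS[kind]
    return TEMPLATES[kind].format(host=host, port=port, database=database)
```

Credentials are deliberately not part of the string, matching the note that they are supplied separately.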

Workspaces

A Workspace is the top-level scope for a masking project. It contains connections, table configurations, jobs, data mappings, and custom PII rules. Workspaces can inherit configuration from a parent workspace for multi-environment setups.
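
One way to picture parent inheritance is as layered settings in which the child's values take precedence. The sketch below assumes that override rule; the document does not specify how conflicts are resolved, and the Workspace class and the keys shown in the usage note are hypothetical.

```python
from collections import ChainMap
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workspace:
    name: str
    settings: dict = field(default_factory=dict)
    parent: Optional["Workspace"] = None

    def effective_settings(self) -> dict:
        """Merge settings up the parent chain; nearer workspaces win (assumed)."""
        layers = []
        node = self
        while node is not None:
            layers.append(node.settings)
            node = node.parent
        # ChainMap looks up keys in the first mapping that contains them,
        # so listing the child first gives it precedence over its ancestors.
        return dict(ChainMap(*layers))
```

A child workspace for QA, for instance, could inherit {"users.email": "EMAIL", "users.phone": "PHONE"} from a base workspace and override only "users.phone".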

Masking Modes

| Mode | Description |
|---|---|
| MASK | Replace column values with generated fake data using the configured generator |
| GENERATE | Generate an entirely new row set regardless of source data |
| PASSTHROUGH | Copy data without any modification |
| SUBSET | Copy a filtered or sampled subset of rows |
| SKIP | Exclude the table from processing entirely |

Data Mapping Actions

| Action | Description |
|---|---|
| MASK | Replace the column value using the configured masking strategy |
| MIGRATE_AS_IS | Copy the original value to the destination unchanged |

Masking Strategies

| Strategy | Description | Example output |
|---|---|---|
| FAKE | Replace with realistic synthetic data from the selected generator | John Smith |
| HASH | Deterministic SHA-256 hash (consistent across runs) | a3f1b2c… |
| NULL | Replace with a SQL NULL value | NULL |
| REDACT | Replace with a fixed redaction string (e.g., [REDACTED]) | [REDACTED] |
| PARTIAL_MASK | Mask all but the first or last N characters | john****@****.com |
| REGEX | Apply a find-and-replace regex transformation | configurable |
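
To show roughly what each strategy does to a single value, here is a minimal sketch. It is not the backend's implementation: mask_value and its keyword parameters are hypothetical, FAKE is omitted because it depends on the selected generator, and PARTIAL_MASK here simply keeps the first N characters, so it will not reproduce the email-aware example output in the table.

```python
import hashlib
import re


def mask_value(value, strategy, *, keep=4, pattern=None, replacement=""):
    """Apply one masking strategy to a single value (illustrative only)."""
    if strategy == "NULL":
        return None
    if strategy == "REDACT":
        return "[REDACTED]"
    if value is None:
        return None  # nothing to transform

    text = str(value)
    if strategy == "HASH":
        # SHA-256 is deterministic: the same input always yields the same digest.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()
    if strategy == "PARTIAL_MASK":
        # Keep the first `keep` characters and star out the rest.
        return text[:keep] + "*" * max(len(text) - keep, 0)
    if strategy == "REGEX":
        if pattern is None:
            raise ValueError("REGEX strategy needs a pattern")
        return re.sub(pattern, replacement, text)
    raise ValueError(f"unsupported strategy {strategy!r}")
```

Determinism is what makes HASH useful for keys that must still join across tables after masking, since equal inputs map to equal outputs.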

Generator Types

OpenDataMask includes 63+ built-in generators covering personal, financial, medical, and network data:

| Category | Generators |
|---|---|
| Personal | NAME, FIRST_NAME, LAST_NAME, FULL_NAME, EMAIL, PHONE, BIRTH_DATE, GENDER, TITLE, JOB_TITLE, NATIONALITY |
| Address | ADDRESS, STREET_ADDRESS, CITY, STATE, ZIP_CODE, COUNTRY, GPS_COORDINATES, LATITUDE, LONGITUDE, TIME_ZONE |
| Identity | SSN, PASSPORT_NUMBER, DRIVERS_LICENSE, MEDICAL_RECORD_NUMBER |
| Financial | CREDIT_CARD, IBAN, SWIFT_CODE, MONEY_AMOUNT, BTC_ADDRESS, ACCOUNT_NUMBER, CURRENCY_CODE |
| Network | IP_ADDRESS, IPV6_ADDRESS, MAC_ADDRESS, URL, DOMAIN_NAME, USER_AGENT |
| Business | ORGANIZATION, COMPANY_NAME, DEPARTMENT |
| Medical | ICD_CODE, HEALTH_PLAN_NUMBER |
| Data Utilities | BOOLEAN, LOREM, TIMESTAMP |
| Control | NULL, CONSTANT, PARTIAL_MASK, FORMAT_PRESERVING, SEQUENTIAL, RANDOM_INT, HASH, SCRAMBLE |

CLI Usage

# Authenticate
odm auth login --url http://localhost:8080 \
  --username admin --password secret

# Workspaces
odm workspace list
odm workspace get <workspace-id>

# Jobs
odm job list --workspace <workspace-id>
odm job run  --workspace <workspace-id>

Privacy & Compliance

The Privacy Hub dashboard shows:

Run Apply Recommendations to automatically assign generators to all unmasked sensitive columns in one click.

Troubleshooting

| Symptom | Fix |
|---|---|
| JWT_SECRET must be set | Set the JWT_SECRET environment variable before starting |
| Backend fails to start | Ensure PostgreSQL is running and DATABASE_URL is correct |
| Connection refused on port 8080 | Run docker-compose logs backend to check for errors |
| Sensitivity scan finds nothing | Add table configurations first, then re-run the scan |
| Job fails with "no source connection" | Ensure at least one connection is marked as a Source role in the workspace |
| Screenshots not generated | Run pip install playwright && playwright install chromium, then re-run take_screenshots.py with the frontend running |