Docker Compose (Recommended)

The simplest way to run OpenDataMask in any environment:

git clone https://github.com/MaximumTrainer/OpenDataMask.git
cd OpenDataMask

# Generate and export secrets
export JWT_SECRET=$(openssl rand -base64 32)
export ENCRYPTION_KEY=$(openssl rand -base64 32 | head -c 32)

# Start all services
docker-compose up -d

# Check status
docker-compose ps
docker-compose logs -f backend

Services started:

Docker Images

Pre-built images are published to the GitHub Container Registry on every push to main:

# Backend
docker pull ghcr.io/maximumtrainer/opendatamask/backend:latest

# Frontend
docker pull ghcr.io/maximumtrainer/opendatamask/frontend:latest

# CLI
docker pull ghcr.io/maximumtrainer/opendatamask/cli:latest

Building locally

docker build -t opendatamask-backend ./backend
docker build -t opendatamask-frontend ./frontend
docker build -t opendatamask-cli ./cli

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| DATABASE_URL | Yes | - | JDBC URL for PostgreSQL metadata store |
| DATABASE_USERNAME | Yes | - | PostgreSQL username |
| DATABASE_PASSWORD | Yes | - | PostgreSQL password |
| JWT_SECRET | Yes | - | JWT signing secret. Generate with openssl rand -base64 32 |
| ENCRYPTION_KEY | Yes | - | Credential encryption key, exactly 16 or 32 characters |
| SERVER_PORT | No | 8080 | Backend HTTP listen port |
| JWT_EXPIRATION | No | 86400000 | Token expiry in milliseconds (default 24 h) |
| MONGODB_URI | No | - | MongoDB connection URI (only needed when masking MongoDB sources) |
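
The requirements in the table can be checked before starting the backend. The function below is a minimal sketch, not part of the OpenDataMask repository: the name check_env is hypothetical, and it only verifies that the required variables are non-empty and that ENCRYPTION_KEY is 16 or 32 characters long.

```shell
# Hypothetical pre-flight check for the variables marked "Required" above.
# Prints one line per problem and returns non-zero if anything is wrong.
# The body runs in a subshell so its helper variables do not leak.
check_env() (
  status=0
  for name in DATABASE_URL DATABASE_USERNAME DATABASE_PASSWORD JWT_SECRET ENCRYPTION_KEY; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      status=1
    fi
  done

  # ENCRYPTION_KEY must be exactly 16 or 32 characters.
  key=${ENCRYPTION_KEY:-}
  key_len=${#key}
  if [ -n "$key" ] && [ "$key_len" -ne 16 ] && [ "$key_len" -ne 32 ]; then
    echo "invalid: ENCRYPTION_KEY must be 16 or 32 characters (got $key_len)"
    status=1
  fi

  exit "$status"
)
```

One way to use it is to run check_env && docker-compose up -d, so the stack starts only when the check passes.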

Database Setup

OpenDataMask uses PostgreSQL 15+ for its metadata store. On first startup, Hibernate automatically creates the required schema (ddl-auto: update). No manual migration is needed.

# Create the role and database manually (if not using docker-compose)
# On PostgreSQL 15+, only the database owner can create tables in the public
# schema by default, so the application role must own the database.
psql -U postgres -c "CREATE USER opendatamask WITH PASSWORD 'secret';"
psql -U postgres -c "CREATE DATABASE opendatamask OWNER opendatamask;"

Terraform (AWS)

The infra/ directory provides Terraform configuration to provision a complete AWS environment: VPC, EC2 instance, security groups, Elastic IP, and S3/DynamoDB remote state. Everything runs under docker-compose on a single t3.small EC2 instance (Amazon Linux 2023), which keeps costs low while leaving room to upgrade to a production-grade setup.

Prerequisites

One-time: Bootstrap Remote State

# Create S3 bucket for Terraform state
aws s3api create-bucket --bucket my-opendatamask-tfstate --region us-east-1
aws s3api put-bucket-versioning \
  --bucket my-opendatamask-tfstate \
  --versioning-configuration Status=Enabled

# Create DynamoDB table for state locking
aws dynamodb create-table \
  --table-name opendatamask-tf-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST

Deploy

cd infra
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars: add your SSH public key

terraform init \
  -backend-config="bucket=my-opendatamask-tfstate" \
  -backend-config="dynamodb_table=opendatamask-tf-locks" \
  -backend-config="region=us-east-1"

terraform plan
terraform apply

# Get the server's public IP
terraform output server_public_ip

GitHub Secrets for CI/CD Pipeline

Configure in GitHub → Settings → Secrets and variables → Actions:

| Secret | Description |
|---|---|
| AWS_ACCESS_KEY_ID | AWS IAM access key |
| AWS_SECRET_ACCESS_KEY | AWS IAM secret key |
| AWS_REGION | AWS region (e.g. us-east-1) |
| EC2_SSH_PRIVATE_KEY | PEM private key for SSH deploys |
| EC2_SSH_PUBLIC_KEY | Matching SSH public key (stored in EC2) |
| JWT_SECRET | 32+ char JWT signing secret |
| ENCRYPTION_KEY | 32 char field encryption key |
| TF_STATE_BUCKET | S3 bucket for Terraform state |
| TF_STATE_DYNAMODB_TABLE | DynamoDB table for state locking |
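
As an alternative to entering each value in the web UI, the secrets could be uploaded with the GitHub CLI. The sketch below is an assumption-laden example rather than a project script: the function name push_secrets is hypothetical, it expects each secret to be present as a shell variable of the same name, and it relies on gh secret set reading the value from standard input when run inside the repository checkout.

```shell
# Hypothetical helper: upload the secrets listed above with the GitHub CLI.
# Values are piped on stdin so they do not appear in the process list.
# The body runs in a subshell so its helper variables do not leak.
push_secrets() (
  missing=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION \
              EC2_SSH_PRIVATE_KEY EC2_SSH_PUBLIC_KEY JWT_SECRET \
              ENCRYPTION_KEY TF_STATE_BUCKET TF_STATE_DYNAMODB_TABLE; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "skipped (not set): $name" >&2
      missing=1
      continue
    fi
    printf '%s' "$value" | gh secret set "$name"
  done
  # Non-zero exit if any secret was skipped.
  exit "$missing"
)
```

Running push_secrets after gh auth login uploads whatever is set and reports anything that was skipped on stderr.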

Kubernetes

A basic Kubernetes deployment uses standard Deployment and Service resources. Store secrets with kubectl create secret:

kubectl create secret generic opendatamask-secrets \
  --from-literal=JWT_SECRET="$(openssl rand -base64 32)" \
  --from-literal=ENCRYPTION_KEY="$(openssl rand -base64 32 | head -c 32)" \
  --from-literal=DATABASE_PASSWORD="your-db-password"

Reference the secret in the envFrom block of your Deployment. For production, a PostgreSQL StatefulSet or a managed cloud database (AWS RDS, Azure Database for PostgreSQL, Google Cloud SQL) is recommended.
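
As a rough sketch of what that reference might look like, the function below prints a minimal Deployment manifest whose container loads every key of opendatamask-secrets through envFrom. The function name, the Deployment name and labels, the replica count, and the DATABASE_URL host (postgres) are assumptions for illustration, not values from the repository. The image and port are taken from the sections above.

```shell
# Hypothetical helper: print a minimal backend Deployment manifest.
render_backend_deployment() {
  cat <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: opendatamask-backend        # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: opendatamask-backend
  template:
    metadata:
      labels:
        app: opendatamask-backend
    spec:
      containers:
        - name: backend
          image: ghcr.io/maximumtrainer/opendatamask/backend:latest
          ports:
            - containerPort: 8080   # default SERVER_PORT
          envFrom:
            - secretRef:
                name: opendatamask-secrets   # JWT_SECRET, ENCRYPTION_KEY, DATABASE_PASSWORD
          env:
            - name: DATABASE_URL    # assumed in-cluster PostgreSQL service
              value: jdbc:postgresql://postgres:5432/opendatamask
            - name: DATABASE_USERNAME
              value: opendatamask
EOF
}
```

The output can be piped to the cluster with render_backend_deployment | kubectl apply -f -, or redirected to a file and edited first.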

CI / CD Pipelines

OpenDataMask ships with six GitHub Actions workflows delivering a complete build → deploy → verify → docs pipeline:

| Workflow | File | Trigger | Purpose |
|---|---|---|---|
| CI | ci.yml | push / PR to main | Build, lint, test backend + frontend + CLI; build and push Docker images to GHCR |
| Deploy | deploy.yml | after CI on main | Full pipeline: terraform apply → SSH deploy → health verify |
| Sandbox Verification | sandbox-verification.yml | push / PR to main | End-to-end masking correctness check; publishes JUnit report artifact |
| Playwright E2E | playwright-e2e.yml | after Sandbox Verification | Full browser E2E test suite against the deployed frontend |
| Deploy Website | deploy-website.yml | after E2E on main | Generate screenshots and publish documentation to GitHub Pages |
| CodeQL | codeql.yml | push / PR / weekly | Static security analysis for Kotlin, JS/TS, Go |

Deploy Pipeline Flow

push to main
  └─► CI (build + test + Docker push → GHCR)
        └─► deploy.yml:
              ├─ Job 1: terraform apply   ← provision/update AWS infra
              ├─ Job 2: SSH deploy         ← docker-compose pull && up
              └─ Job 3: verify             ← curl /actuator/health → 200 ✅
                    └─► Sandbox Masking Verification
                              └─► Playwright E2E Tests
                                        └─► Deploy Website (GitHub Pages)
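
The health check in Job 3 can be approximated by hand with a small retry loop. This is a sketch under assumptions, not the workflow's actual step: the function name wait_for_health, the retry count, and the delay are hypothetical, and it presumes the backend port (8080 by default) is reachable from where the command runs.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
# The body runs in a subshell so its helper variables do not leak.
wait_for_health() (
  url=$1
  attempts=${2:-30}
  delay=${3:-5}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -w prints only the status code; connection failures yield 000.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      exit 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempt(s), last status: ${code:-none}" >&2
  exit 1
)

# Example, run from infra/ after terraform apply (assumes port 8080 is open):
#   wait_for_health "http://$(terraform output -raw server_public_ip):8080/actuator/health"
```

For a local docker-compose run, the same function can be pointed at http://localhost:8080/actuator/health.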

GitHub Environments (staging, production) track each deployment, enabling deployment status, history, and environment URLs in the GitHub UI.

Security Notes