
Ephemeral Databases

Overview

Ephemeral databases are temporary database containers that DBSnapper creates automatically for specific operations like sanitization and data transformation. These short-lived containers provide isolated environments for processing sensitive data without affecting your production or development databases.

How Ephemeral Databases Work

Automatic Creation

DBSnapper automatically creates ephemeral databases when needed:

targets:
  sanitize-example:
    snapshot:
      src_url: "postgres://user:pass@production:5432/app"
      dst_url: "postgres://user:pass@localhost:5432/app_dev"

    sanitize:
      override_query: |
        UPDATE users SET email = 'user' || id || '@example.com';

Behind the scenes:

1. DBSnapper creates a temporary database container
2. Loads your snapshot data into the ephemeral database
3. Runs sanitization queries in the isolated environment
4. Exports sanitized data to your destination
5. Automatically destroys the temporary container
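The five steps above can be modeled as a small in-memory pipeline. This is an illustrative sketch only (the function and variable names are hypothetical, not part of the DBSnapper API); its point is that sanitization happens on an isolated copy, so the source data is never modified:

```python
# Illustrative model of the ephemeral-database lifecycle; not the DBSnapper API.

def run_ephemeral_pipeline(snapshot_rows, sanitize_fn):
    # 1. "Create" an isolated, temporary workspace.
    ephemeral_db = []
    try:
        # 2. Load the snapshot into the ephemeral workspace (copies, not references).
        ephemeral_db.extend(dict(row) for row in snapshot_rows)
        # 3. Sanitize in isolation -- the source rows are untouched.
        for row in ephemeral_db:
            sanitize_fn(row)
        # 4. Export the sanitized data to the destination.
        return [dict(row) for row in ephemeral_db]
    finally:
        # 5. Destroy the temporary workspace; nothing persists.
        ephemeral_db.clear()

source = [{"id": 1, "email": "alice@corp.com"}]
sanitized = run_ephemeral_pipeline(
    source, lambda r: r.update(email=f"user{r['id']}@example.com")
)
```

Note that `source` still holds the original email afterward: only the ephemeral copy was sanitized and exported.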

Container Lifecycle

Creation Phase:

  • Unique container name generated automatically
  • Isolated Docker network created
  • Database credentials configured securely

Operation Phase:

  • Data loaded from source snapshot
  • Sanitization or transformation queries executed
  • Processed data prepared for export

Cleanup Phase:

  • Container stopped and removed
  • Network resources cleaned up
  • No persistent data remains
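Unique, collision-resistant names matter because several ephemeral containers may run concurrently on one host. DBSnapper generates these names internally; a comparable scheme (illustrative only, not DBSnapper's actual naming code) might look like:

```python
import uuid

def ephemeral_names(prefix="dbsnapper"):
    """Generate unique container and network names for one ephemeral run.

    Illustrative only -- the actual naming scheme is internal to DBSnapper.
    """
    token = uuid.uuid4().hex[:12]  # random suffix avoids collisions between runs
    return f"{prefix}-ephemeral-{token}", f"{prefix}-net-{token}"

container_name, network_name = ephemeral_names()
```

Pairing the container and network names with the same random token makes it easy to clean up both resources together.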

Use Cases

Data Sanitization

The most common use case for ephemeral databases:

targets:
  customer-data-sanitization:
    snapshot:
      src_url: "postgres://readonly:pass@prod:5432/customers"
      dst_url: "postgres://dev:pass@dev:5432/customers_sanitized"

    sanitize:
      override_query: |
        -- Remove sensitive customer data
        UPDATE customers SET 
          email = 'customer' || id || '@example.com',
          phone = '555-0000',
          ssn = NULL,
          credit_card = NULL;

        -- Anonymize addresses
        UPDATE addresses SET 
          street = 'Test Street',
          city = 'Test City',
          postal_code = '00000';
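The SQL above maps every row deterministically, which makes sanitized output easy to spot-check. A Python mirror of the same rules (a hypothetical helper, not part of DBSnapper) can be used to assert what a sanitized row should look like:

```python
def sanitize_customer(row):
    """Mirror of the SQL sanitization rules above, for spot-checking results."""
    row = dict(row)  # work on a copy
    row["email"] = f"customer{row['id']}@example.com"
    row["phone"] = "555-0000"
    row["ssn"] = None
    row["credit_card"] = None
    return row

clean = sanitize_customer({
    "id": 7,
    "email": "jane@real.com",
    "phone": "202-555-0101",
    "ssn": "123-45-6789",
    "credit_card": "4111111111111111",
})
```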

Data Transformation

Transform data structure during the copy process:

targets:
  legacy-migration:
    snapshot:
      src_url: "mysql://user:pass@legacy:3306/old_app"
      dst_url: "postgres://user:pass@new:5432/modernized_app"

    sanitize:
      override_query: |
        -- Transform legacy data formats
        UPDATE user_profiles SET 
          full_name = CONCAT(first_name, ' ', last_name),
          created_date = DATE(created_timestamp);
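The transformation is purely row-local (concatenate two name fields, truncate a timestamp to a date), so it is straightforward to express the expected result outside SQL. A sketch of the same mapping in Python, using hypothetical field names from the example:

```python
from datetime import datetime

def transform_profile(row):
    """Mirror of the migration SQL: build full_name, keep only the date part."""
    row = dict(row)
    row["full_name"] = f"{row['first_name']} {row['last_name']}"
    row["created_date"] = row["created_timestamp"].date()
    return row

profile = transform_profile({
    "first_name": "Ada",
    "last_name": "Lovelace",
    "created_timestamp": datetime(2015, 6, 1, 12, 30),
})
```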

Testing Data Preparation

Create specialized datasets for testing:

targets:
  test-data-prep:
    snapshot:
      src_url: "postgres://user:pass@staging:5432/app"
      dst_url: "postgres://test:pass@test:5432/app_testing"

    sanitize:
      override_query: |
        -- Create predictable test data
        UPDATE users SET 
          email = 'test' || id || '@testcompany.com',
          status = 'active'
        WHERE id <= 100;

        -- Remove non-test data
        DELETE FROM users WHERE id > 100;
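The two statements above together implement "keep the first 100 users, normalized; drop the rest." A Python mirror of that rule (hypothetical helper, useful for asserting what the test dataset should contain):

```python
def prepare_test_users(users, cutoff=100):
    """Mirror of the test-data SQL: keep ids <= cutoff, normalize the kept rows."""
    kept = [dict(u) for u in users if u["id"] <= cutoff]
    for u in kept:
        u["email"] = f"test{u['id']}@testcompany.com"
        u["status"] = "active"
    return kept

users = [
    {"id": 1, "email": "a@x.com", "status": "banned"},
    {"id": 101, "email": "b@x.com", "status": "active"},
]
test_set = prepare_test_users(users)
```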

Database Engine Support

PostgreSQL Ephemeral Databases

Uses PostgreSQL Docker containers with automatic configuration:

targets:
  postgres-ephemeral:
    snapshot:
      src_url: "pgdocker://user:pass@source:5432/app"
      dst_url: "postgres://user:pass@dest:5432/app_clean"

    # Ephemeral PostgreSQL container created automatically
    sanitize:
      override_query: |
        UPDATE sensitive_table SET data = 'sanitized';

Features:

  • Full PostgreSQL feature support
  • Schema filtering compatibility
  • Parallel processing capabilities

MySQL Ephemeral Databases

Uses MySQL Docker containers with root access for operations:

targets:
  mysql-ephemeral:
    snapshot:
      src_url: "mydocker://user:pass@source:3306/app"
      dst_url: "mysql://user:pass@dest:3306/app_clean"

    # Ephemeral MySQL container created automatically  
    sanitize:
      override_query: |
        UPDATE users SET email = CONCAT('user', id, '@example.com');

Features:

  • MySQL 8.0 feature support
  • All storage engines supported
  • Automatic constraint handling

Configuration Options

Network Isolation

Ephemeral databases run in isolated Docker networks by default:

targets:
  isolated-processing:
    # Network automatically managed for security
    snapshot:
      src_url: "postgres://user:pass@production:5432/sensitive_db"
      dst_url: "postgres://user:pass@dev:5432/sanitized_db"

Resource Allocation

Control resources for ephemeral database containers:

targets:
  resource-controlled:
    cpus: 4  # Affects ephemeral container resources

    snapshot:
      src_url: "postgres://user:pass@large-db:5432/warehouse"
      dst_url: "postgres://user:pass@dest:5432/warehouse_processed"

    sanitize:
      override_query: |
        -- Resource-intensive sanitization
        UPDATE large_table SET sensitive_data = 'redacted';

Security Benefits

Data Isolation

Complete Isolation:

  • Ephemeral databases run in separate containers
  • No network access to production systems
  • Temporary credentials with limited scope

No Persistent State:

  • Containers destroyed after operations
  • No data left on disk after processing
  • Automatic cleanup of all resources

Credential Management

targets:
  secure-ephemeral:
    snapshot:
      # Production uses read-only credentials
      src_url: "postgres://readonly:limited@prod:5432/app"
      # Destination uses dev credentials  
      dst_url: "postgres://dev:dev_pass@dev:5432/app"

    # Ephemeral database uses auto-generated credentials
    sanitize:
      override_query: |
        UPDATE users SET email = 'safe' || id || '@example.com';
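Because the ephemeral container exists only for the duration of one operation, its credentials can be random throwaway values rather than anything stored in configuration. DBSnapper handles this internally; a comparable generation scheme (illustrative only) using Python's standard `secrets` module:

```python
import secrets

def ephemeral_credentials(prefix="ephemeral"):
    """Generate throwaway credentials scoped to one container's lifetime.

    Illustrative only -- DBSnapper manages ephemeral credentials internally.
    """
    user = f"{prefix}_{secrets.token_hex(4)}"      # e.g. ephemeral_a1b2c3d4
    password = secrets.token_urlsafe(24)           # ~32 URL-safe characters
    return user, password

user, password = ephemeral_credentials()
```

Since the credentials are never reused and die with the container, leaking them has a very small blast radius.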

Performance Considerations

Optimization Strategies

For Large Datasets:

targets:
  large-ephemeral:
    cpus: 8  # More resources for ephemeral processing

    snapshot:
      src_url: "postgres://user:pass@warehouse:5432/analytics"
      dst_url: "postgres://user:pass@dev:5432/analytics_sample"

    sanitize:
      override_query: |
        -- Process in batches for better performance
        UPDATE large_table SET 
          sensitive_col = 'redacted'
        WHERE id BETWEEN 1 AND 10000;

Memory Management:

  • Ephemeral containers automatically sized based on data volume
  • Streaming processing for large datasets
  • Automatic cleanup prevents resource leaks

Processing Time

Factors Affecting Performance:

  • Source data size
  • Complexity of sanitization queries
  • Available CPU and memory resources
  • Network bandwidth for data transfer

Typical Processing Times:

  • Small databases (< 1 GB): 1-5 minutes
  • Medium databases (1-10 GB): 5-30 minutes
  • Large databases (> 10 GB): 30+ minutes

Troubleshooting

Common Issues

Container Creation Failures:

# Verify Docker is working
docker version
docker ps

# Check available resources
docker system df
docker system info

Memory Issues:

targets:
  memory-optimized:
    # Reduce resource usage for large datasets
    cpus: 2

    sanitize:
      override_query: |
        -- Process in smaller batches
        UPDATE users SET email = 'user' || id || '@example.com' 
        WHERE id <= 1000;

Network Connectivity:

# Check Docker networks
docker network ls

# Verify container networking
docker run --rm --network bridge postgres:16-alpine ping -c 1 google.com

Debug Commands

# Monitor ephemeral container creation
dbsnapper sanitize target-name --verbose

# Check Docker container logs (while running)
docker logs $(docker ps -l -q)

# Monitor resource usage
docker stats $(docker ps -l -q)

Performance Issues

Slow Processing:

  • Increase the cpus setting for more resources
  • Simplify sanitization queries
  • Process data in smaller batches
  • Check available system memory

Container Startup Delays:

  • Pre-pull required Docker images
  • Ensure adequate disk space
  • Optimize Docker daemon settings

Best Practices

Query Optimization

targets:
  optimized-sanitization:
    sanitize:
      override_query: |
        -- Use efficient updates with indexes
        UPDATE users SET 
          email = 'user' || id || '@example.com'
        WHERE email IS NOT NULL;

        -- Batch process large tables
        -- (PostgreSQL does not allow LIMIT directly in UPDATE,
        --  so select the batch of ids in a subquery)
        UPDATE large_table SET 
          sensitive_data = NULL 
        WHERE id IN (
          SELECT id FROM large_table
          WHERE created_date < '2023-01-01'
          LIMIT 10000
        );
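Another common batching approach is to split the key space into fixed-size id ranges and issue one UPDATE per range, which keeps each transaction small and predictable. A sketch of a generator for such statements (the table and column names are illustrative):

```python
def batch_update_statements(table, set_clause, max_id, batch_size):
    """Yield id-range-batched UPDATE statements (valid PostgreSQL syntax)."""
    for lo in range(1, max_id + 1, batch_size):
        hi = min(lo + batch_size - 1, max_id)
        yield (f"UPDATE {table} SET {set_clause} "
               f"WHERE id BETWEEN {lo} AND {hi};")

statements = list(
    batch_update_statements("large_table", "sensitive_data = NULL", 25000, 10000)
)
```

Running the batches as separate statements also gives natural checkpoints: if one batch fails, only that range needs to be retried.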

Resource Management

targets:
  resource-aware:
    # Match resources to data size
    cpus: 4

    sanitize:
      override_query: |
        -- Monitor progress with row counts
        UPDATE users SET email = 'test' || id || '@example.com';
        -- Returns: UPDATE 15000 (example)

Security Practices

  • Use read-only source credentials when possible
  • Limit sanitization queries to necessary operations only
  • Verify sanitization results before deploying to development
  • Monitor ephemeral container resource usage to detect issues
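Verifying sanitization results before deploying can be as simple as scanning the exported rows for values that still look like real PII. A hypothetical checker (not part of DBSnapper) that flags rows whose email is not a sanitized placeholder or that still contain an SSN-shaped string:

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")        # SSN-shaped values
SAFE_EMAIL_RE = re.compile(r"@example\.com$")        # sanitized placeholder domain

def find_suspect_rows(rows):
    """Return rows that may still contain real PII. Hypothetical helper."""
    suspects = []
    for row in rows:
        email = row.get("email") or ""
        if not SAFE_EMAIL_RE.search(email):
            suspects.append(row)          # email was not rewritten
            continue
        if any(SSN_RE.search(str(v)) for v in row.values()):
            suspects.append(row)          # an SSN-like value survived
    return suspects

rows = [
    {"id": 1, "email": "user1@example.com", "ssn": None},
    {"id": 2, "email": "real.person@corp.com", "ssn": None},
]
suspects = find_suspect_rows(rows)
```

An empty result is a useful (though not sufficient) sign that the sanitization queries covered every column they were supposed to.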