Creating Sanitized Database Snapshots with DBSnapper¶
In this example, we're going to create a simple database with User account information. Our goal is to be able to create sanitized snapshot of this database that we can use in our development
Create a database and add some data¶
Let's create a database and call it example_app
psql -d 'postgres://postgres:postgres@localhost:15432?sslmode=disable' -c 'create database example_app;'
We'll create a simple users table with some basic user and authentication fields
DROP TABLE IF EXISTS users;
CREATE TABLE users (
id SERIAL PRIMARY KEY,
first_name text,
last_name text,
email character varying(110) unique not null,
password character varying(50) not null,
pin character varying(8)
);
And we'll load some sample data
INSERT INTO users (first_name, last_name, email, password, pin) VALUES
('John', 'Doe', 'johndoe@example.com', 'secretpassword', '2468'),
('Jane', 'Doe', 'janedoe@example.com', 'ubersecretpassword', '1357'),
('Fred', 'Smith', 'fsmith@dbsnapper.com', 'opensesame', '7890'),
('Sam', 'Jackson', 'sj@example.com', 'iamsam', '1234');
Note
Yes, our passwords and pins are in the clear for simplicity in this example, of course you wouldn't do this in production, would you?
Our database so far
example_app=# select * from users;
id | first_name | last_name | email | password | pin
----+------------+-----------+----------------------+--------------------+------
1 | John | Doe | johndoe@example.com | secretpassword | 2468
2 | Jane | Doe | janedoe@example.com | ubersecretpassword | 1357
3 | Fred | Smith | fsmith@dbsnapper.com | opensesame | 7890
4 | Sam | Jackson | sj@example.com | iamsam | 1234
Create a target¶
Our goal is to create a snapshot of the production example_app
database that we can copy locally to our development database. Let's do that:
As you can see in the highlighted section, we're specifying example_app
as our source datbase and example_app_dev
as our destination
Danger
We can't stress this enough - the database example_app_dev
in the example above will be automatically DROPPED and RECREATED when the load
command is used!
Checking our target¶
Let's make sure our target is configured properly and we can access it:
DBSnapper CLI v0.11.0+2b8aaba4.2022-05-03T21:58:59Z
Listing all targets
+-------------+----------+--------+----------------------------------+--------+--------------------------------------+-------+----------+
| NAME | LOCATION | STATUS | SRC | SIZE | DST | QUERY | MESSAGES |
+-------------+----------+--------+----------------------------------+--------+--------------------------------------+-------+----------+
| example_app | local | OK | pg://localhost:15432/example_app | 8.3 MB | pg://localhost:15432/example_app_dev | | |
+-------------+----------+--------+----------------------------------+--------+--------------------------------------+-------+----------+
STATUS shows OK and no errors, so we're good to go.
Build a snapshot¶
The first step in our journey is creating the snapshot. This can be done easily with the build
command:
DBSnapper CLI v0.11.0+2b8aaba4.2022-05-03T21:58:59Z
DB Source Host: localhost:15432
START: Build Snapshot for target: example_app
--> Local target, localhost DB.
--> Using Docker image: postgres:latest
FINISH: Building DB Snapshot for target: example_app to file:///Users/joescharf/.dbsnapper/1651695547_example_app.dir
Checking our snapshot¶
Now that we've successfully built a snapshot of our example_app
database, let's check the snapshots for this target:
DBSnapper CLI v0.11.0+2b8aaba4.2022-05-03T21:58:59Z
Listing snapshots for target: example_app
+-------+-------------------------+------------------------+----------------------------+--------+-------------+
| INDEX | CREATED | NAME | FILENAME | SIZE | SANITIZEDFN |
+-------+-------------------------+------------------------+----------------------------+--------+-------------+
| 0 | 2022-May-04 @ 14:19:07Z | 1651695547_example_app | 1651695547_example_app.zip | 1.4 kB | |
+-------+-------------------------+------------------------+----------------------------+--------+-------------+
Great, we have a single snapshot available. Note that the SANITIZEDFN
cell is empty, indicating that we don't have a sanitized version of this snapshot. That's ok, we'll deal with that later.
Load our snapshot¶
Now that we have our snapshot, let's load it into our development database specified on dst_url
of our target configuration.
DBSnapper CLI v0.11.0+2b8aaba4.2022-05-03T21:58:59Z
START: Loading Target: example_app, ORIGINAL snapshot index: 0, Snapshot Name: 1651695547_example_app, Snapshot File: /Users/joescharf/.dbsnapper/1651695547_example_app.zip, Dest URL: postgresql://postgres:postgres@localhost:15432/example_app_dev?sslmode=disable
Step 1: Drop and recreate the database
Step 2: Loading snapshot with Docker image postgres:latest, DB url: postgresql://postgres:postgres@localhost:15432/example_app_dev?sslmode=disable
FINISH: Loading Snapshot for Target: example_app, Snapshot File: /Users/joescharf/.dbsnapper/1651695547_example_app.zip
Checking our new database¶
Another success. Let's switch to the database and take a look at the data in the users
table:
example_app_dev=# \c example_app_dev
psql (13.1, server 12.6 (Debian 12.6-1.pgdg100+1))
You are now connected to database "example_app_dev" as user "postgres".
example_app_dev=# select * from users;
id | first_name | last_name | email | password | pin
----+------------+-----------+----------------------+--------------------+------
1 | John | Doe | johndoe@example.com | secretpassword | 2468
2 | Jane | Doe | janedoe@example.com | ubersecretpassword | 1357
3 | Fred | Smith | fsmith@dbsnapper.com | opensesame | 7890
4 | Sam | Jackson | sj@example.com | iamsam | 1234
(4 rows)
Great. So far we've made an exact copy of our example_app
source database which now resides at example_app_dev
.
But we have a problem. Passing around a database snapshot with all this personal information (PII) and sensitive authentication data (passwords, pins) is a security issue! Eventually something will git misplaced or misused, so we'll need to deal with that.
Sanitizing the snapshot¶
The sanitization process takes a query and executes it against a database snapshot. The resulting changes to the snapshot are exported and stored in a sanitized snapshot file.
Create the query¶
Let's create a sanitization query that will
- Obfuscate
first_name
andlast_name
- Change the
email
to match thefirst_name
andlast_name
- Change the
password
andpin
for all users to a common password used for development
example_app.san.sql | |
---|---|
Because we can, let's create a table and entry to record the time the sanitization was performed and the name of the query file used:
example_app.san.sql | |
---|---|
And let's save this file as example_app.san.sql
in our working_directory
(which defaults to ~/.dbsnapper
)
Update the target configuration¶
Now we need to update our target configuration and specify the query_file
that will be used for sanitization:
Perform the sanitization¶
Now we're ready to run the sanitize
command against the example_app
target:
DBSnapper CLI v0.11.0+2b8aaba4.2022-05-03T21:58:59Z
START: Sanitize example_app[0], Target: example_app, SnapshotName: 1651695547_example_app, Source: /Users/joescharf/.dbsnapper/1651695547_example_app.zip, Dest: /Users/joescharf/.dbsnapper/1651695547_example_app.san.zip
Step 1: Loading database to ephemeral DB url: postgres://circumvent:erythrodextrin-udometry-movement@forninst/rathite
Step 2: Sanitizing database with query:
... Query Removed for Brevity...
Step 3: Dumping sanitized database
Step 4: Cleaning up
FINISHED Sanitizing snapshot: /Users/joescharf/.dbsnapper/1651695547_example_app.san.zip
This output shows the steps taken to sanitize the database which involve:
- Loading the database snapshot to an ephemeral Docker container DB located at
postgres://circumvent:erythrodextrin-udometry-movement@forninst/rathite
- Running the sanitization query against the ephemeral database
- Dumping the sanitized database to a snapshot
- Cleaning up any temporary directories and removing the ephemeral Docker container.
Check the snapshot¶
Now that we've sanitized the database, let's take another look at the snapshots available for the example_app
target
DBSnapper CLI v0.11.0+2b8aaba4.2022-05-03T21:58:59Z
Listing snapshots for target: example_app
+-------+-------------------------+------------------------+----------------------------+--------+--------------------------------+
| INDEX | CREATED | NAME | FILENAME | SIZE | SANITIZEDFN |
+-------+-------------------------+------------------------+----------------------------+--------+--------------------------------+
| 0 | 2022-May-04 @ 14:19:07Z | 1651695547_example_app | 1651695547_example_app.zip | 1.4 kB | 1651695547_example_app.san.zip |
+-------+-------------------------+------------------------+----------------------------+--------+--------------------------------+
The big difference here is that 1651695547_example_app.san.zip
is listed in the SANITIZEDFN
column, which indicates that we have a sanitized database snapshot available for this snapshot index.
Load the snapshot¶
Like we did above, we'll once again, load the snapshot to our example_app_dev
development database:
In line 3 above we're loading the SANITIZED snapshot at index 0. The load
command automatically loads the sanitized snapshot for an index if one exists.
Note
If a sanitized snapshot exists, but you'd like to load the original snapshot, you can use the --original
flag to force this behavior:
dbsnapper load example_app 0 --original
Checking the sanitized database¶
Our sanitized database is now loaded, let's check it to see if it worked:
postgres=# \c example_app_dev
psql (13.1, server 12.6 (Debian 12.6-1.pgdg100+1))
You are now connected to database "example_app_dev" as user "postgres".
example_app_dev=# \dt
List of relations
Schema | Name | Type | Owner
--------+----------------+-------+----------
public | dbsnapper_info | table | postgres
public | users | table | postgres
(2 rows)
dbsnapper_info
table that holds the sanitization timestamp entry.
example_app_dev=# select * from users;
id | first_name | last_name | email | password | pin
----+------------+-----------+----------------------+-----------------+------
1 | User | Id1 | User_Id1@example.com | genericpassword | 0000
2 | User | Id2 | User_Id2@example.com | genericpassword | 0000
3 | User | Id3 | User_Id3@example.com | genericpassword | 0000
4 | User | Id4 | User_Id4@example.com | genericpassword | 0000
(4 rows)
And here is our user table with the sanitized first_name
, last_name
, email
, password
and pin
fields.
Conclusion¶
In this example, we created a simple database and demonstrated how you can use DBSnapper to copy it with the build
and load
commands, sanitize it with the sanitize
command, and then load
the sanitized version to your development database.
We hope you find this example useful. If you have any questions about dbsnapper, you can contact us via email at info@dbsnapper.com, or follow us on Twitter @dbsnapper.