Washtub prevents data leaks and theft by anonymizing production data for other uses. Designed specifically for developers, Washtub is built around a simple command line interface that allows developers to pull clean data from production without personally identifiable information residing on their local machine.
The basic steps involved are:
- Register and create an account.
- Initialize your database on Washtub.
- Wash the database.
- Download the cleaned, anonymized, masked data.
Washtub aims to be extremely easy for developers to use. To that end, a Heroku CLI plugin allows pulling cleaned data for local development with just a single command. This new command will replace your existing usage of pg:pull to ensure only anonymized data resides on your local machine.
To enable local command line usage with Heroku, you'll need to set an environment variable with a Washtub API token and install the Washtub plugin for the Heroku CLI. You can obtain your Washtub API token from your Washtub dashboard after creating an account.
First, set your token:
$ heroku config:set WASHTUB_TOKEN=your_washtub_token
Setting WASHTUB_TOKEN and restarting ⬢ your-app-name... done, v999
WASHTUB_TOKEN: your_washtub_token
Next, install the plugin:
$ heroku plugins:install heroku-washtub
yarn add v1.6.0
info No lockfile found.
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
Done in 10.44s.
Installing plugin heroku-washtub... done
Use the plugin to initialize your database on Washtub. You may optionally specify a database by passing its Heroku database URL name, for example HEROKU_POSTGRESQL_COBALT_URL. Without an argument, Washtub defaults to DATABASE_URL.
$ heroku washtub:init
Initializing washtub for your database DATABASE_URL... Done.
After initializing your database, you can confirm and set your wash strategies in the Washtub web dashboard. Read more about Initialization for an overview of the process.
Once you have initialized your database and confirmed your wash strategies through the web dashboard, you can wash your database and download a copy for local development in a single step. The CLI parameters are the same as those of heroku pg:pull: specify the Heroku database name followed by your local Postgres database name.
$ heroku washtub:wash DATABASE_URL statusgator_development
The Washtub database initialization process involves several key steps:
- Connect on a read-only basis to your production database
- Import your schema: table names and column names and types
- Suggest a wash strategy for each column
- Disconnect from your database without making any changes
No data is modified on your running database. Only the schema is read to suggest washing strategies. Later, when you wash your database, a new copy is made from a backup, and that copy is anonymized and exported to your local development database.
The initialization process will ingest your schema and make recommendations on appropriate washing strategies to use for each column. A washing strategy is a method of manipulating data in a given column using obfuscation, randomization or other modification to anonymize data.
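To make the idea of suggesting a strategy per column concrete, here is a minimal sketch that maps column names to suggestions. Washtub's actual suggestion rules are not described in this guide, so the name-based patterns, the strategy labels, and the function names below are all assumptions chosen for illustration.

```python
import re

# Hypothetical (pattern, strategy label) pairs, checked in order.
# These are illustrative guesses, not Washtub's real rules.
RULES = [
    (re.compile(r"email"), "sequential_email"),
    (re.compile(r"ssn|social_security"), "random_ssn"),
    (re.compile(r"first_name"), "random_first_name"),
    (re.compile(r"last_name"), "random_last_name"),
    (re.compile(r"company"), "random_company_name"),
]


def suggest_strategy(column_name):
    """Return a suggested strategy label, or None when no rule matches."""
    name = column_name.lower()
    for pattern, strategy in RULES:
        if pattern.search(name):
            return strategy
    return None


def suggest_for_schema(schema):
    """Map {table: [column, ...]} to {(table, column): suggestion}."""
    return {
        (table, column): suggest_strategy(column)
        for table, columns in schema.items()
        for column in columns
    }
```

Only names are inspected here, which mirrors the point that initialization reads the schema and never the data itself.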
To see the suggested strategies and confirm their use, sign in to Washtub and visit your dashboard, where the suggested strategies are displayed for you to review.
Only strategies that you confirm are used. If you neither confirm any suggested strategies nor set any manually, no anonymization will be performed. To confirm, review the list, choose different strategies as needed, and save your changes.
Generates a sequential email address in the form
Email addresses are guaranteed to be unique within a table.
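One simple way to get per-table uniqueness is to derive each address from a counter. The sketch below assumes a placeholder format of `user<N>@example.com`; the actual format Washtub produces is not shown here, and the function names are hypothetical.

```python
from itertools import count


def sequential_emails(prefix="user", domain="example.com"):
    """Yield prefix1@domain, prefix2@domain, ... (assumed placeholder format)."""
    for n in count(1):
        yield f"{prefix}{n}@{domain}"


def wash_email_column(rows, column="email"):
    """Return copies of rows with `column` replaced by a fresh sequential address.

    A new counter is started per call, so addresses are unique within one table.
    The input rows are not modified.
    """
    emails = sequential_emails()
    return [{**row, column: next(emails)} for row in rows]
```

Because the counter never repeats, two source rows that shared an address still end up with distinct washed values.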
Generates a string consisting of 9 random digits. These Social Security numbers are not masked, so they take the form XXXXXXXXX rather than
Random First Name
Utilizes Ffaker to generate a random first (given) name. Will use both male and female names from the Ffaker library. Not guaranteed to be unique.
Random Last Name
Utilizes Ffaker to generate a random last (family) name. Not guaranteed to be unique.
Random Company Name
Utilizes Ffaker to generate a random company name. Not guaranteed to be unique.
Useful for stripping unneeded data, this strategy sets the chosen column to NULL in every row.
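The Social Security number and nullify strategies described above can be sketched as plain functions applied column by column. This is an illustration only: the function names and the row-as-dictionary representation are assumptions, and Washtub itself relies on its own implementation.

```python
import random


def random_ssn(_value=None, rng=random):
    """Return 9 random digits as one unseparated string; the old value is ignored."""
    return "".join(str(rng.randint(0, 9)) for _ in range(9))


def nullify(_value=None):
    """Discard the old value and return None (NULL in the database)."""
    return None


def wash_rows(rows, strategies):
    """Apply {column: function} to every row; columns without a strategy are kept."""
    return [
        {
            col: strategies[col](val) if col in strategies else val
            for col, val in row.items()
        }
        for row in rows
    ]
```

For example, `wash_rows(rows, {"ssn": random_ssn, "notes": nullify})` replaces every `ssn` value with nine random digits, blanks `notes`, and leaves all other columns unchanged.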
Database initialization is an idempotent process. You can run the initialization process over again on your database and it will generate the same suggestions. If you change your schema by adding or changing tables or columns, you can reinitialize and any new tables or columns will be given updated strategy suggestions.
If you override a strategy, a new one will not be suggested. Only columns with no strategy chosen or with a suggested strategy will be analyzed for appropriate wash strategies.
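The reinitialization rules can be sketched as a merge: overridden columns are kept as they are, and every other column gets a fresh suggestion. The data layout (a `source` field set to `"override"` or `"suggested"`) and the function name are assumptions for illustration, not Washtub's internal model.

```python
def reinitialize(existing, schema_columns, suggest):
    """Recompute strategies for the current schema.

    existing: {column: {"strategy": str or None, "source": "override" or "suggested"}}
    schema_columns: column names currently present in the schema
    suggest: function mapping a column name to a suggested strategy (or None)
    """
    result = {}
    for column in schema_columns:
        current = existing.get(column)
        if current is not None and current["source"] == "override":
            # A manual override is never replaced by a new suggestion.
            result[column] = current
        else:
            # New columns and previously suggested columns are re-analyzed.
            result[column] = {"strategy": suggest(column), "source": "suggested"}
    return result
```

Running the function again on its own output with an unchanged schema returns the same mapping, which is the idempotence described above.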