
In a common continuous-delivery process, code moves from a development instance to a staging instance to a production instance.

For development purposes (reproducing bugs, testing performance with a full data set), developers usually fetch data from the production database into their development environment. See, for example, this question.

In my company, we use three instances in addition to production in our continuous-delivery process:

  • latest: synced every night with our SCM trunk
  • staging: runs the last released version before its deployment to production
  • stable: runs the exact same version of the software deployed in production (useful for reproducing bugs found on production)

The problem is that, to reproduce bugs on the stable instance, we would like to have the exact same data set as in production. So we would like to sync the databases on a nightly basis.

Is this a good practice? How should it be implemented? Are there any pitfalls?

numéro6
  • Which database are you using? – Florian Motlik Jul 19 '14 at 13:58
  • @FlorianMotlik PostgreSQL 9 (but I was thinking about a database-agnostic solution) – numéro6 Jul 21 '14 at 08:27
  • Hard to stay database-agnostic, as different technologies have vastly different replication techniques. For Postgres (and most other databases) you have two options, I think: streaming replication from the master to your staging/stable databases, or a nightly backup restored into your stable/staging database. I'd go with the nightly backup and restore, as this also tests your process for restoring backups, but it depends on how fresh the data needs to be for you. – Florian Motlik Jul 21 '14 at 23:29
  • What about `dump > data.sql ; git add data.sql ; git commit ; git push` and, on the other side, a git hook that calls `dbrestore data.sql`? git would ensure that only the diff of the data is transferred, and auth via ssh would ensure confidentiality. – numéro6 Jul 23 '14 at 09:25
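The nightly backup-and-restore option discussed in the comments could be sketched roughly as below for PostgreSQL. This is only a sketch: the host and database names are placeholders, not values from the question, and `DRY_RUN=1` (the default here) only prints the commands so the script can be read without a live server.

```shell
#!/bin/sh
# Hypothetical nightly production -> stable sync for PostgreSQL.
# Placeholders: prod-db.example.com, stable-db.example.com, myapp.
set -e

PROD_HOST=prod-db.example.com     # assumed production host
STABLE_HOST=stable-db.example.com # assumed stable-instance host
DB=myapp                          # assumed database name
DRY_RUN=${DRY_RUN:-1}             # set to 0 to actually execute

CMDS=""
run() {
    # Record and echo the command in dry-run mode; execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        CMDS="$CMDS$*
"
        echo "would run: $*"
    else
        "$@"
    fi
}

# 1. Dump production in custom format (compressed, supports pg_restore).
run pg_dump -h "$PROD_HOST" -Fc -f /tmp/prod.dump "$DB"

# 2. Recreate the stable database from scratch to avoid stale objects.
run dropdb --if-exists -h "$STABLE_HOST" "$DB"
run createdb -h "$STABLE_HOST" "$DB"

# 3. Restore the dump into the stable instance.
run pg_restore -h "$STABLE_HOST" -d "$DB" /tmp/prod.dump
```

As Florian notes above, a side benefit of this approach over streaming replication is that every night also exercises the backup-restoration path.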

3 Answers


Depending on the data you have in production, you may not want to replicate it back to non-production environments. (Or you may not even be allowed to under certain regulations.) If you have customer data, personally identifiable information (PII), regulated data, financial data, credit card data, health data, SSNs, or any other type of sensitive data, then replicating it means the non-production environment needs the full set of controls you have (or should have) in production — which it probably doesn't have, and which you probably don't want to maintain there.

mr paul
  • You're right, the security concerns you raise are legitimate. Nevertheless, for most of the projects we manage, simply anonymizing the data with 2~3 SQL UPDATEs (to erase names, passwords, addresses and any PII) is secure enough. That's one of the pitfalls. – numéro6 Jul 17 '14 at 13:08
  • Ideally, you should sanitize the data before you move it. So you may want to duplicate the data into another database instance in your production environment, sanitize it there, then migrate it to your non-prod environment. Otherwise, the production data exists in the non-production environment, even if briefly, which could put that environment in scope for additional controls. – mr paul Jul 17 '14 at 14:23
  • Also, you should not store passwords, not even encrypted passwords. You should be storing hashed values of passwords. – mr paul Jul 17 '14 at 14:25
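The "2~3 SQL UPDATEs" anonymization pass mentioned in these comments might look something like the following. The table and column names (`users`, `full_name`, `email`, `password_hash`, `addresses`) are invented for illustration; adapt them to the real schema, and — as mr paul advises — run this against the copy while it is still inside the production environment.

```shell
#!/bin/sh
# Sketch of an anonymization pass over a cloned database.
# Schema names are assumptions, not from the original question.
set -e

SANITIZE_SQL=$(cat <<'SQL'
-- Replace every PII column with a deterministic placeholder.
UPDATE users
   SET full_name     = 'user_' || id,
       email         = 'user_' || id || '@example.invalid',
       password_hash = 'x';  -- invalidate all credentials on the copy

-- Blank out postal addresses as well.
UPDATE addresses
   SET street = 'redacted', city = 'redacted', postal_code = '00000';
SQL
)

# To apply for real (placeholder host/db names):
#   psql -h sanitize-host -d myapp_copy -c "$SANITIZE_SQL"
echo "$SANITIZE_SQL"
```

Note the deterministic placeholders (`user_<id>`): keeping values unique avoids breaking UNIQUE constraints on columns such as `email` while still erasing the real data.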

There are several virtual database (VDB) solutions I recommend you look into. One of them is Delphix.

Uri

Windocks supports containers with integrated database cloning, and is used for exactly the use case described. Full disclosure: I work for Windocks.

paul stanton