3

A question regarding a DB development project. The database already exist and is rather large (several TBs).

  1. What do you use for version control in DB development?
  2. How do you control concurrent changes to the data model by different teams
  3. What is your approach to the Unit Testing in the DB development
  4. How do you deal with the sensitive data if the DB owners do not know what is sensitive? What is your approach to the data obfuscation? What are your obfuscation techniques?
  5. How do you work on a large DB from several locations?

Please answer one or more of the items as you see fit. Each answer will be reviewed separately. Thank you very much!

EDIT: A related question with good answers to the p.1 is here: How do you version your database schema?

Community
  • 1
  • 1
Ilya Kochetov
  • 17,988
  • 6
  • 44
  • 60

2 Answers2

2

For 4, "How do you deal with the sensitive data if the DB owners do not know what is sensitive? What is your approach to the data obfuscation?"

"Sensitive until proven innocuous" is my mantra. Unless someone makes a case for not adequately protecting any data from visibility (either internal or external) then my default mode is to protect it.

Cases come up later on where we'll open data up for perfromance, reporting, etc reasons, but a documented business case with the appropriate signatures is required.

GEOCHET
  • 21,119
  • 15
  • 74
  • 98
Adam Davis
  • 91,931
  • 60
  • 264
  • 330
2

For most of these, while the tools don't apply the general processes of code development do:

  • Maintain a development system separate from production with enough data to get useful performance metrics when testing a new model
  • This system has unit tests (SQL queries, commits, aborted atomic commits, etc) written and run against it prior to every release.
  • There are official 'releases'
  • The development database is the source control system itself - in other words the database is modeled and held in the database with sign-ins and rollbacks, etc. It's non-trivial, and doesn't solve every problem, but given the lack of good VCS for databases it works.
  • Roll-outs (after testing, integration, etc) consist of just the new database structure going to the production site - the modeling tables are not replicated there.
GEOCHET
  • 21,119
  • 15
  • 74
  • 98
Adam Davis
  • 91,931
  • 60
  • 264
  • 330