15

I've started using git with a small dev team of people who come and go on different projects; it was working well enough until we started working with Wordpress. Because Wordpress stores a lot of configurations in MySQL, we decided we needed to include that in our commits.

This worked well enough (using msyql dump on pre-commits, and pushing the dumped file into mysql on post-checkout) until two people made modifications to plugins and committed, then everything broke again.

I've looked at every solution I could find, and thought Liquibase was the closest option, but wouldn't work for us. It requires you to specify schema in XML, which isn't really possible because we are using plugins which insert data/tables/modifications automatically into the DB.
I plan on putting a bounty on it in a few days to see if anyone has the "goldilocks solution" of:

The question:
Is there a way to version control a MySQL database semantically (not using diffs EDIT: meaning that it doesn't just take the two versions and diff it, but instead records the actual queries run in sequence to get from the old version to the current one) without the requirement of a developer written schema file, and one that can be merged using git.

I know I can't be the only one with such a problem, but hopefully there is somebody with a solution?

Jess
  • 8,628
  • 6
  • 49
  • 67
  • 2
    I don't know why I have not asked this question before! – Gabriel Santos Jul 17 '12 at 04:09
  • 3
    I believe the *only "correct way"* to version a database is with diffs *and* snapshots. They are both valuable and part of a whole. I have written about it here: http://stackoverflow.com/questions/11461707/database-version-control-for-mysql/11461909#11461909 where I explain my reasoning. –  Jul 17 '12 at 04:10
  • So, my advice to reduce this this issue: "keep the schema-change branch (or tag?) clean" and "be careful when merging schema updates" ;-) –  Jul 17 '12 at 04:32
  • I definitely agree with you @pst, but this just isn't possible in Wordpress, where it's very difficult to track schema changes plugins make. Though this is definitely a solution for custom coded databases, where I actually want to use liquibase. – Jess Jul 17 '12 at 04:44
  • @mazzzzz Can keep schema changes in a separate/dedicated branch and then only merge into a "dev" or "target" branch on non-conflict? In Hg we use a named branch dedicated to this, sort of like a mutex for the schema, in a way. (I think the Git model is to use tags? But I don't Git ..) In any case, let the schema be an exposed separate entity in the version control. –  Jul 17 '12 at 04:57

1 Answers1

4

The proper way to handle db versioning is through a version script which is additive-only. Due to this nature, it will conflict all the time as each branch will be appending to the same file. You want that. It makes the developers realize how each others' changes affect the persistence of their data. Rerere will ensure you only resolve a conflict once though. (see my blog post that touches on rerere sharing: http://dymitruk.com/blog/2012/02/05/branch-per-feature/)

Keep wrapping each change within a if then clause that checks the version number, changes the schema or modifies lookup data or something else, then increments the version number. You just keep doing this for each change.

in psuedo code, here is an example.

if version table doesn't exist
  create version table with 1 column called "version"
  insert the a row with the value 0 for version
end if
-- now someone adds a feature that adds a members table
if version in version table is 0
  create table members with columns id, userid, passwordhash, salt
    with non-clustered index on the userid and pk on id
  update version to 1
end if
-- now some one adds a customers table
if version in version table is 1
  create table customers with columns id, fullname, address, phone
    with non-clustered index on fullname and phone and pk on id
  update version to 2
end if
-- and so on

The benefit of this is that you can automatically run this script after a successful build of your test project if you're using a static language - it will always roll you up to the latest. All acceptance tests should pass if you just updated to the latest version.

The question is, how do you work on 2 different branches at the same time? What I have done in the past is just spun up a new instance that's delimited in the db name by the branch name. Your config file is cleaned (see git smudge/clean) to set the connection string to point to the new or existing instance for that branch.

If you're using an ORM, you can automate this script generation as, for example, nhibernate will allow you to export the graph changes that are not reflected in the db schema yet as a sql script. So if you added a mapping for the customer class, NHibernate will allow you to generate the table creation script. You just script the addition of the if-then wrapper and you're automated on the feature branch.

The integration branch and the release candidate branch have some special requirements that will require wiping and recreating the db if you are resetting those branches. That's easy to do in a hook by ensuring that the new revision git branch --contains the old revision. If not, wipe and regenerate.

I hope that's clear. This has worked well in the past and requires the ability for each developer to create and destroy their own instances of dbs on their machines, although could work on a central one with additional instance naming convention.

Adam Dymitruk
  • 124,556
  • 26
  • 146
  • 141
  • 2
    This is similar to what I do. Although I 1) generate different update scripts following a simple sequence_name.sql pattern (which has advantages of file-level organization -- I am over 350k of SQL DDL! -- and a disadvantage of needing to look at conflicts differently) and; 2) I use an automated tool to apply the deltas and manage the versioning (RoundhousE in my case) which makes life easy and; 3) I *still* version a snapshot. I think one critical idea is keeping a "clean schema branch" separate from the other development work. –  Jul 17 '12 at 19:55
  • On re-reading, "integration branch and the release candidate branch have some special requirements that will require wiping and recreating the db" sounds ... suspicious. Here only mainline (or "rc" branches) push up through integration/deployment, and in those cases the db deletion (if required, but should *never be* for deployment) is done as part of process -- not just being part of a branch. Half of making sure that the schema stuff works is making sure the updates work **with** existing data. It's easy to add a new table. It's much harder to update a set of table relationships .. –  Jul 17 '12 at 20:36
  • .. perhaps it's just a flaw in my thinking that a database schema is a *living model* ;-) –  Jul 17 '12 at 20:38
  • release candidate and integration branches are not used for deployment. You're asking for trouble if you keep db changes on a separate branch to the feature code. No matter what your layers are doing features tightly couple the changes across all layers. Don't treat them as separate branches - db or otherwise. – Adam Dymitruk Jul 17 '12 at 22:29
  • Hmm, perhaps it's just a different way in which I use Hg for this purpose; I use a separate named branch (however, because a "named branch" in Hg is just a name permanent fixed in Hg this isn't really a "separate branch" as it would be in SVN; not sure how Git would map this). DB schema/model changes are done on the named branch (a "force branch" to switch to it) and then it is then merged in (or "force branch") back into the branch for the feature-code utilizing the new model. This can look "odd", in the graph, e.g "featureX->model->featureX", even though it's all part of the *same tree*. –  Jul 18 '12 at 00:25
  • Then it's just a matter of resolving conflicts in the "model branch" (but a branch is a branch, name or otherwise, so it's not a separate entity!) to keep only a single head there .. (heads are shown readily per named-branch in Hg). That is, in our setup the model branch eclipses many features while being *part* of different feature branches .. –  Jul 18 '12 at 00:28
  • The difference here is that git has light weight branches (they are simply pointers to commits in the history tree) where as Hg has some other issues to consider with operation such as reset which the RC and dev branches require. – Adam Dymitruk Jul 18 '12 at 07:37
  • Nice idea. You can add version control hooks to automate that this script get runs when the code is updated. – Jonathan Jun 24 '14 at 23:18