
Is this recommended?

Can I ask some git command examples about how to track versions of mysql schema?

Should we use another repository other than the one we normally use in our application root?

Should I use something called a hook?

Update:

1) We navigate to our project root, where the .git database resides.

2) We create a subfolder called hooks.

3) We put something like this inside a file called db-commit:

   #!/bin/sh
   mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true > SQLVersionControl/vc.sql
   git add SQLVersionControl/vc.sql
   exit 0

Now we can:

4) git commit -m "some message"

This commit will include a mysql schema dump that has been run just before the commit.

The source of the above is here: http://edmondscommerce.github.io/git/using-git-to-track-db-schema-changes-with-git-hook.html

If this is an acceptable way of doing it, could someone with patience please explain, line by line and in as much detail as possible, what is happening here:

#!/bin/sh
mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true > SQLVersionControl/vc.sql
git add SQLVersionControl/vc.sql
exit 0

Thanks a lot.

Andy
MEM

8 Answers


Assuming you have a git repo already, do the following in a shell script or whatever:

#!/bin/bash -e
# -e means exit if any command fails
DBHOST=dbhost.yourdomain.com
DBUSER=dbuser
DBPASS=dbpass # do this in a more secure fashion
DBNAME=dbname
GITREPO=/path/to/git/repo
cd "$GITREPO"
mysqldump -h "$DBHOST" -u "$DBUSER" -p"$DBPASS" -d "$DBNAME" > "$GITREPO/schema.sql" # the -d flag means "no data"
git add schema.sql
git commit -m "$DBNAME schema version $(date)"
git push # assuming you have a remote to push to

Then start this script on a daily basis from a cron job or what have you.
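For instance, a crontab entry along these lines would run it nightly (the script path and log location are placeholders, not anything prescribed by the answer):

```shell
# Hypothetical crontab line: run the schema dump/commit script
# every night at 02:30 and append its output to a log file.
30 2 * * * /path/to/dump-schema.sh >> /var/log/schema-dump.log 2>&1
```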

EDIT: By placing a script in $gitdir/hooks/pre-commit (the name is important), the script will be executed before every commit. This way the state of the DB schema is captured for each commit, which makes sense. If you automatically run this sql script every time you commit, you will blow away your database, which does not make sense.
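Installing such a hook is just a matter of putting the script at .git/hooks/pre-commit and making it executable. A sketch, shown against a throwaway repository so the commands can be tried safely (substitute your own repo and your db-commit script):

```shell
# Create a throwaway repo to demonstrate hook installation.
repo=$(mktemp -d)
git init -q "$repo"

# Write a stand-in hook; in practice this would be your db-commit script.
cat > "$repo/.git/hooks/pre-commit" <<'EOF'
#!/bin/sh
exit 0
EOF

# Hooks are only run if they are executable, and the name pre-commit
# (not db-commit) is what makes git run it before each commit.
chmod +x "$repo/.git/hooks/pre-commit"
test -x "$repo/.git/hooks/pre-commit" && echo installed
```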

#!/bin/sh

This shebang line tells the system to run the script with /bin/sh.

mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true > SQLVersionControl/vc.sql

This is the same as in my answer above; taking the DDL only from the database and storing it in a file.

git add SQLVersionControl/vc.sql

This stages the SQL file so that it is included in the commit you are about to make.

exit 0

This exits the script with success. This is possibly dangerous. If mysqldump or git add fails, you may blow away something you wanted to keep.
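A variant of the hook from the question that fails loudly instead of always exiting 0 might look like this (the credentials and paths are the question's placeholders; this is a sketch, not tested against a live server):

```shell
#!/bin/sh
# Abort (and thereby block the commit) if any command below fails.
set -e
mysqldump -u DBUSER -pDBPASSWORD DATABASE --no-data=true > SQLVersionControl/vc.sql
git add SQLVersionControl/vc.sql
# No trailing "exit 0": the script's exit status is that of the last
# command, so a failed dump or git add is reported instead of hidden.
```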

Matt K
  • This is basically Chris's answer with an actual mysqldump command, so toss him an upvote too if you accept this. – Matt K Apr 01 '11 at 21:19
  • @mkb - I'm using Capistrano for deployment. What does it do? It deploys my last committed changes. So, I just need to add the schema to the repository. According to this, and looking at my updated question, perhaps your answer could change a little. Is a hook a shell script? Thanks a lot, and I'm really sorry for all these questions. :s – MEM Apr 01 '11 at 21:28
  • @mkb: Thanks a lot. At the end, you tell me, and I thank you for that, that the script I intended to use is possibly dangerous. Is there a way to append a -e flag the same way you did in your example? Perhaps that could help? I found the hook usage a nice approach because it is all automatically done before (thanks, again) I do a commit, so it fits well in the workflow. Or should I take your first code and make it a Capistrano task somehow? (I really have no clue about this last hypothesis, anyway, but at least I would know where to keep looking.) Thank you so much for your patience. – MEM Apr 02 '11 at 00:17
  • If you change the first line to `#!/bin/bash -e` maybe. I've never used Capistrano either. – Matt K Apr 02 '11 at 01:04
  • I will now search the difference between bash and sh - at the end I hope to have more specific doubts after trying this. Thanks a lot, really really for your help. Cheers. :) – MEM Apr 02 '11 at 01:21
  • No. Please don't take me wrong, I do upvote something useful when I understand it. I mean, unfortunately that answer would only be useful if I could understand what it does. And I didn't understand the CREATE TABLE... part. I still don't understand: how will it know that each time a create table is done, it should save? Chris didn't reply to it. Anyway, if anyone could explain his code, I don't mind voting it up; I just can't lie and mark something as "useful" when it wasn't to me. – MEM Apr 02 '11 at 11:42
  • The auto increment counter is still included, so even if there is no change in the structure, the dump is still different. Is there any solution to remove the auto increment counter? – qsoft Apr 03 '14 at 07:31
  • @qsoft http://stackoverflow.com/questions/15656463/mysqldump-export-structure-only-without-autoincrement – Matt K Apr 03 '14 at 11:29
  • Along with skipping auto-increment, I also skip the '--Dump completed on DATE TIME' timestamp at the bottom of each mysqldump with the [--skip-dump-date](http://dev.mysql.com/doc/refman/5.5/en/mysqldump.html#option_mysqldump_dump-date) option. – georgiecasey Dec 13 '14 at 21:47

If you're just tracking the schema, put all of the CREATE statements into one .sql file, and add the file to git.

$> mkdir myschema && cd myschema
$> git init
$> echo "CREATE TABLE ..." > schema.sql
$> git add schema.sql
$> git commit -m "Initial import"
Chris Eberle
  • I see... the trick is to put it all in a file added to the repository and, each time it's changed, git will know. But I'm not understanding what "put the create statements into one .sql file" means. Care to explain a little bit more what that means? - Let's say I change the schema locally; how will that reflect my database changes? - Yes, I'm a little lost here... and I would love to track the data as well, but from what I've read, it has some space issues. – MEM Apr 01 '11 at 20:36
  • git doesn't hold deltas. It stores full copies of each revision, so storing the data of a database in git is going to get big real fast. As far as storing the schema goes, you could use mysqldump with an argument to only dump the schema, place the result into a git repository, then auto-commit. – Matt K Apr 01 '11 at 20:40
  • @mkb: can you please describe (on an answer) that workflow with some command line examples please? I'm new into all this... – MEM Apr 01 '11 at 20:43

IMO the best approach is described here: http://viget.com/extend/backup-your-database-in-git. For your convenience I repeat the most important pieces here.

The trick is to use mysqldump --skip-extended-insert, which creates dumps that can be better tracked/diffed by git.
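The effect is easy to see with plain diff on two simulated per-row dumps (the sample data below is made up): a one-row change shows up as a one-line change, which is exactly what line-based tools like git diff handle well.

```shell
# Two simulated --skip-extended-insert dumps of the same table,
# differing in a single row.
cd "$(mktemp -d)"
cat > v1.sql <<'EOF'
INSERT INTO t VALUES (1,'alice');
INSERT INTO t VALUES (2,'bob');
EOF
sed 's/bob/bobby/' v1.sql > v2.sql
diff v1.sql v2.sql || true   # only the changed row appears in the diff
```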

There are also some hints regarding the best repository configuration in order to reduce disk size. Copied from here:

  • core.compression = 9 : Flag for gzip to specify the compression level for blobs and packs. Level 1 is fast with larger file sizes, level 9 takes more time but results in better compression.
  • repack.usedeltabaseoffset = true : Defaults to false for compatibility reasons, but is supported with Git >=1.4.4.
  • pack.windowMemory = 100m : (Re)packing objects may consume lots of memory. To prevent all your resources from going down the drain, it's useful to put some limits on that. There is also pack.deltaCacheSize.
  • pack.window = 15 : Defaults to 10. With a higher value, Git tries harder to find similar blobs.
  • gc.auto = 1000 : Defaults to 6700. As indicated in the article, it is recommended to run git gc every once in a while. Personally, I run git gc --auto every day, so things are only packed when there's enough garbage. git gc --auto normally only triggers the packing mechanism when there are 6700 loose objects around. This flag lowers that amount.
  • gc.autopacklimit = 10: Defaults to 50. Every time you run git gc, a new pack is generated of the loose objects. Over time you get too many packs which waste space. It is a good idea to combine all packs once in a while into a single pack, so all objects can be combined and deltified. By default git gc does this when there are 50 packs around. But for this situation a lower number may be better.
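These settings can be applied per repository with plain git config commands; a minimal sketch, shown against a throwaway repository so it can be run as-is (in practice you would run the git config lines inside your own repo):

```shell
# Demonstration in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Apply the repository-size tweaks listed above.
git config core.compression 9
git config repack.usedeltabaseoffset true
git config pack.windowMemory 100m
git config pack.window 15
git config gc.auto 1000
git config gc.autopacklimit 10

git config --get core.compression   # prints 9
```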

Old versions can be pruned via:

git rebase --onto master~8 master~7

(copied from here)

vanto

The following includes a git pre-commit hook to capture mysql database/schema, given user='myuser', password='mypassword', database_name='dbase1'. Properly bubbles errors up to the git system (the exit 0's in other answers could be dangerous and may not handle error scenarios properly). Optionally, can add a database import to a post-checkout hook (when capturing all the data, not just schema), but take care given your database size. Details in bash-script comments below.

pre-commit hook:

#!/bin/bash

# exit upon error
set -e
# another way to set "exit upon error", for readability
set -o errexit

mysqldump -umyuser -pmypassword dbase1 --no-data=true > dbase1.sql

# Uncomment following line to dump all data with schema,
# useful when used in tandem for the post-checkout hook below.
# WARNING: can greatly expand your git repo when employing for
# large databases, so carefully evaluate before employing this method.
# mysqldump -umyuser -pmypassword dbase1 > dbase1.sql

git add dbase1.sql

(optional) post-checkout hook:

#!/bin/bash
# mysqldump (above) is presumably run without '--no-data=true' parameter.
set -e
mysql -umyuser -pmypassword dbase1 < dbase1.sql

Versions of apps, OS I'm running:

root@node1 Dec 12 22:35:14 /var/www# mysql --version
mysql  Ver 14.14 Distrib 5.1.54, for debian-linux-gnu (x86_64) using readline 6.2
root@node1 Dec 12 22:35:19 /var/www# git --version
git version 1.7.4.1
root@node1 Dec 12 22:35:22 /var/www# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 11.04
Release:        11.04
Codename:       natty
root@node1 Dec 12 22:35:28 /var/www#
Johnny Utahh

While I am not using Git, I have used source control for over 15 years. A best practice to adhere to when deciding where and how to store your src and accompanying resources in source control: if the DB schema is used within the project, then you should version the schema and all other project resources in "that" project. If you develop a set of schemas or programming resources that you reuse in other projects, then you should have a separate repository for those reusable resources. That separate reusable-resources project will be versioned on its own and will track the versions of the actual reusable resources in that repository.

If you use a versioned resource out of the reusable repository in a different project, then you have the following scenario (just an example): Project XYZ version 1.0 is now using DB Schema_ABC version 4.0. In this case you will understand that you have used a specific version of a reusable resource, and since it is versioned you will be able to track its use throughout your project. If you get a bug report on DBSchema_ABC, you will be able to fix the schema and re-version it, as well as understand where else DBSchema_ABC is used and where you may have to make some changes. From there you will also understand which projects contain which versions of which reusable resources... You just have to understand how to track your resources.

Adopting this type of development environment and resource management strategy is key to releasing usable software and managing a break/fix enhancement environment. Even if you're developing for your own edification on your own time, you should be using source control... as you are.

As for Git, I would find a GUI front end or a dev environment integration if I could. Git is pretty big, so I am sure it has plenty of front-end support, maybe?

apesa
  • I must do that when I am more familiar with version control systems. I have no more than 1 day using it (despite having read about it for almost a month now). :) I've done git commit, git add . and git push. No more than this. :) I need time. But thanks for your reply, it will surely help me out along the way. For now, however, I just need some sort of commands to place on the command line, and make something decent happen, better than using phpmyadmin and downloading versions to the desktop. :) – MEM Apr 01 '11 at 20:53
  • I understand, but not fully understanding the commands you're issuing will most likely result in a crossed repository. It can be fairly easy to screw up to the point of no return. – apesa Apr 01 '11 at 21:24

As brilliant as it sounds (the idea did occur to me as well), when I tried to implement it, I hit a wall. In theory, by using the --skip-extended-insert flag, although the initial dump would be big, the diffs between daily dumps should be minimal, hence the size increase of the repository over time could be assumed to be minimal as well, right? Wrong!

Git stores snapshots, not diffs, which means on each commit it will take the entire dump file, not just the diff. Moreover, since a dump with --skip-extended-insert uses all field names on every single insert line, it will be huge compared to a dump done without --skip-extended-insert. This results in an explosion in size, the exact opposite of what one would expect.

In my case, with a ~300MB sql dump, the repository grew to gigabytes in days. So, what did I do? I first tried the same thing, only removing --skip-extended-insert, so that dumps would be smaller, and snapshots would be proportionally smaller as well. This approach held for a while, but in time it became unusable as well.

Still, the diff usage with --skip-extended-insert actually still seemed like a good idea, only now I try to use Subversion instead of git. I know, compared to git, svn is ancient history, yet it seems to work better, since it actually does use diffs instead of snapshots.

So in short, I believe the best solution is doing the above, but with Subversion instead of git.

Tuncay Göncüoğlu
  • Having implemented and used this approach for a while now, I must say it is definitely much better. As opposed to 300MB daily increases in git repository, I now get like 7MB increases, as a result "svn update" works a lot faster. I suppose a "manual diff, and keep only diffs in a git repository" approach could have worked as such, but it works around the scm's diff capabilities, hence, takes much of the scm out of the loop, thereby defeating the purpose of using a scm in the first place. – Tuncay Göncüoğlu May 10 '12 at 13:06
  • Most of the time you'd only want to dump the structure and not the actual data. – Rick Kukiela Mar 02 '13 at 23:32
  • Your assumption that git stores blobs, not diffs, is somewhat correct. Its model is based on that, but in actuality it stores diffs as well, just not by default and from the start. More reading here https://git-scm.com/book/en/v2/Git-Internals-Packfiles – Tomáš Fejfar Nov 23 '21 at 07:34

(shameless plug)

The dbvc commandline tool allows you to manage your database schema updates in your repository.

It creates and uses a table _dbvc in the database which holds a list of the updates that have been run. You can easily run the updates that haven't been applied to your database schema yet.

The tool uses git to determine the correct order of executing the updates.

DBVC usage

Show a list of commands

dbvc help

Show help on a specific command

dbvc help init

Initialise DBVC for an existing database.

dbvc init

Create a database dump. This is used to create the DB on a new environment.

mysqldump foobar > dev/schema.php

Create the DB using the schema.

dbvc create

Add an update file. These are used to update the DB on other environments.

echo 'ALTER TABLE `foo` ADD COLUMN `status` BOOL DEFAULT 1;' > dev/updates/add-status-to-foo.sql

Mark an update as already run.

dbvc mark add-status-to-foo

Show a list of updates that need to be run.

dbvc status

Show all updates with their status.

dbvc status --all

Update the database.

dbvc update
Arnold Daniels

I have found the following options to be mandatory for a version control / git-compatible mysqldump.

mysqldump --skip-opt --skip-comments | sed -e 's/DEFINER[ ]*=[ ]*[^*]*\*/\*/'

(and maybe --no-data)

--skip-opt is very useful, it takes away all of --add-drop-table --add-locks --create-options --disable-keys --extended-insert --lock-tables --quick --set-charset. The DEFINER sed is necessary when the database contains triggers.
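Since the mysqldump itself needs a live server, here is just the sed step demonstrated on a sample line (the input line below is made up but follows the format mysqldump emits for triggers and views):

```shell
# Strip the DEFINER clause from a dump line, as the sed above does,
# so dumps taken on different environments diff cleanly.
echo '/*!50017 DEFINER=`root`@`localhost`*/' \
  | sed -e 's/DEFINER[ ]*=[ ]*[^*]*\*/\*/'
# -> /*!50017 */
```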

phil294