44

I am wondering if there is an easy way, i.e. something like a simple cron job, to regularly pull from a remote git repository to a local read-only mirror for backup purposes?

Ideally it would pull all branches and tags, but the master/trunk/head would be sufficient.

I just need a way to make sure that if the master git server dies, we have a backup location that we could manually fail over to.

corydoras
  • What's wrong with the bash script idea? You can modify it to do the probe/pull just once, then install it as a cron job, no? – Santa May 03 '10 at 07:50
  • That script will be fine. You do realise that every clone of a git repo is a complete backup? So you most likely have a lot of copies of the repo already. – Andrew McGregor May 03 '10 at 08:23
  • @Andrew, good point. In our situation though we have examples where there are no clones anywhere, ie we have some code that is updated by a client via FTP. We use git to keep track of what the client is doing and there is no clone of it anywhere. – corydoras May 04 '10 at 07:31
  • 9
    Just for the record, a git clone is *not* a complete backup. It doesn't include your repository configuration in .git/config nor things like reflogs, hooks, git-rerere's cache or unreferenced commits (which may also be valuable). And presumably many other things in .git/. For backing up a server repository, a clone may be enough, but a working repository has a lot more to lose. – wu-lee Dec 19 '11 at 18:50

3 Answers

64

First create a mirror with

git clone --mirror git@somewhere.com:repo.git

then set up a cron job like this:

*/1 * * * * gitbackup cd /backup/repo.git && git fetch -q --tags

This will back up the changesets every minute. You may want to do this less frequently.
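The two steps above can also be folded into one small script, so the same cron entry works whether or not the mirror exists yet. This is only a sketch: the function name `sync_mirror` and its two arguments are made up for illustration, and the URL and path in the usage comment are the ones from this answer.

```shell
#!/bin/sh
# sync_mirror REMOTE_URL MIRROR_DIR  (hypothetical helper, not part of git)
# Creates a mirror clone on the first run and fetches into it on later runs.
sync_mirror() {
    remote_url=$1
    mirror_dir=$2
    if [ -d "$mirror_dir" ]; then
        # A mirror clone maps every remote ref (branches, tags, ...) onto the
        # same local ref, so a plain fetch updates all of them.
        git --git-dir="$mirror_dir" fetch --quiet
    else
        git clone --quiet --mirror "$remote_url" "$mirror_dir"
    fi
}

# Example call, using the URL and path from the answer:
# sync_mirror git@somewhere.com:repo.git /backup/repo.git
```

Adding `--prune` to the fetch would also drop branches deleted upstream; for a backup it may be safer to leave it out, so that deleted branches survive in the mirror.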

Craig Ringer
gregor
10

As Andrew noted, every clone of a git repo is a full-fledged backup of the repo. That said, if you want something backed up automatically to a particular machine, you can create a bare repo on the backup server and push all the branches you want backed up into it to populate it initially. Then just set up a post-update hook on the "main" repo so that as soon as commits are pushed in, it goes ahead and pushes them to the backup repo. No need for a cron job or rsync, and it's an almost live copy.
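A minimal sketch of such a hook, assuming the "main" repo is bare and already has a remote pointing at the backup repo. The function name `push_to_backup` and the remote name `backup` are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of hooks/post-update for the "main" repository.
# push_to_backup GIT_DIR REMOTE  (hypothetical helper, not part of git)
push_to_backup() {
    # --mirror pushes every ref (branches and tags) and also removes refs on
    # the backup that no longer exist in the main repository.
    git --git-dir="$1" push --quiet --mirror "$2"
}

# In a bare repository the hook runs inside the repository directory, so the
# last line of the hook could be:
# push_to_backup . backup
```

The remote would be registered once beforehand with something like `git remote add backup <url-of-backup-repo>`. Because `--mirror` propagates deletions too, a branch removed on the main repo also disappears from the backup.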

J. Cordasco
  • 1
    +1 for adding a solution that is good in general (and hence a good reference) but doesn't answer the question. We are asking how to pull as we need to pull through a NAT – corydoras May 04 '10 at 07:46
-3

do you have direct access to the server? then you could just rsync the .git directory
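For reference, a rough sketch of what that could look like. The function name `rsync_repo` and both paths are hypothetical; only the rsync flags are real.

```shell
#!/bin/sh
# rsync_repo SRC_DIR DEST_DIR  (hypothetical helper)
# Copies a repository directory file by file, including hooks, config and
# unreachable objects.
rsync_repo() {
    # -a keeps permissions, timestamps and symlinks where the target allows it;
    # --delete removes files from the copy that are gone from the source.
    # The trailing slashes make rsync copy the directory contents, not the
    # directory itself.
    rsync -a --delete "$1/" "$2/"
}

# Example call with made-up paths:
# rsync_repo user@server:/srv/git/repo.git /backup/repo.git
```

This copies whatever is on disk at that moment, so it is worth running when no pushes are in flight.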

knittl
  • 1
    Depending on the platform, simply copying a git repo can lead to problems, for example due to permissions not being propagated from the source to the destination. Cloning is therefore a better solution, hence -1 for this answer. – Gareth Stockwell May 03 '10 at 08:39
  • rsync can take care of that. furthermore, i often find that pull does not pull remotes, stashes and other important information – knittl May 03 '10 at 08:57
  • i really don't see how this answer is *so bad*? maybe it's not the best solution, but it's definitely a working solution if you have direct access to the server. if you rsync directly to a tar-archive you don't even have problems with permissions … – knittl May 03 '10 at 09:49
  • i still don't get what is **wrong** with this answer? can anyone explain? – knittl May 03 '10 at 15:47
  • 1
    how about: it's just not so safe? it might work now, but that's accidental; who's to guarantee it'll work a year from now? – hasen May 03 '10 at 19:52
  • @hasen, i guess permissions models just don't change overnight. if it works today it should work in a year too, and if the binary format changes, i'm sure we will be informed – knittl May 04 '10 at 00:47
  • for all I know, git can change its formats, allow linking to other directories that hold part of the object database, or whatever. – hasen May 04 '10 at 04:03
  • Despite the answer not really answering the question (it's good to think outside the box), I am fairly sure rsync would not guarantee a working, functional copy of the repository if the rsync runs in the middle of a commit to the git repository. – corydoras May 04 '10 at 07:43
  • @corydas, i think it will sync a working copy ;) heads get updated last and git's internal data storage is well designed and robust – knittl May 04 '10 at 07:46
  • 5? how do you know that, it says -2 for me – corydoras May 05 '10 at 01:38
  • Very succinctly - this solution is prone to all kinds of failure. As folks have already said, rsync doesn't always preserve or translate permissions and group ownership properly, and also there's the concern of atomicity - what if it's a big repository, and check-ins are happening while you're doing the copy? You get a corrupted dupe. Whereas if you use the git clone --mirror solution, you're working within the system and atomicity is preserved. – feoh Mar 22 '12 at 22:12
  • @feoh: you don't get a corrupt copy, even if the repository is modified in between. commits and pushes in git are atomic, so you don't get anything broken (you might not get the latest version, but neither do you with a plain `git clone --mirror`) – knittl Mar 29 '12 at 14:27
  • @chila: is it? I use Git to have version control. That does not mean I cannot use rsync to synchronize directories … – knittl Apr 28 '12 at 06:30
  • @knittl If you have a git version-controlled directory, you should do syncs with git itself, it is a superior way to sync. – chila Apr 30 '12 at 14:02
  • 2
    @chila: Why is it superior? Using Git to backup a repository will not copy everything, but only commits reachable from refs (no dangling commits, trees and blobs). Using rsync you get everything, which is accessible on the file-system level. Yes, using Git is likely to transmit less data and rsync will copy complete packfiles after a repack of a repository. But with rsync you will get an exact copy (backup). Anyway, I don't really care for the downvotes, I stand by my answer – knittl Apr 30 '12 at 16:02
  • @knittl Well, it is superior if you don't want to back up unreachable commits, given that you're using git already. If you want to back up unreachable stuff, rsync would be ideal. Edit your answer so I can undo my downvote; it doesn't allow me to do it now. – chila May 02 '12 at 14:03