How can I set up a Git branch for automatic backups?

Question

I realize there are already questions on how to back up a repository (and the answer is usually git bundle), but I'm aiming for a specific setup.

I'm trying to periodically commit a "snapshot" of the current branch to another branch for backup purposes. I'm using a batch file, and it looks like this (I've tried more variations than I can list here):

git stash
git checkout backup
git add .
git commit -m "Automatic Backup  %Time%"  
git push --all origin
git stash pop
git checkout -

The behavior I'm aiming for is that each commit should be an exact snapshot of the current state of the directory, regardless of what branch I have checked out when I run it. If a newer commit changes a file, I want it to take precedence if a merge conflict arises.

I'm not actually trying to merge branches, I'm just trying to get the files as they exist at the time of the snapshot on the disk. So if branch A has a change on file A, and branch B has a conflicting change on the same file, but branch B is checked out, the backup branch should end up getting branch B's changes, which are currently in the file on the disk, ignoring anything else.

And a practical(?) example of what I'm trying to do: Say Branch A has "myfile.txt" as "Hello World", and Branch B has "myfile.txt" as "Hello Dave".

When I check out Branch A and open "myfile.txt" in a text editor, I expect it to contain "Hello World". When I check out B, I expect it to contain "Hello Dave".

If, within a single branch, commit 1 had "Hello World" and commit 2 had "Hello Dave", there wouldn't be a conflict. I want my backup branch to end up with commit 1 containing "Hello World" and commit 2 containing "Hello Dave", assuming I had branch A checked out when commit 1 was made and branch B checked out when commit 2 was made.

I believe git stash is the key to what I'm doing, but it simply isn't working. I tried several different combinations of those commands, and all of them returned different variations of errors, at different points while the repo was in various states, so it's really hard to summarize them. I'd say my approach is probably fundamentally wrong, so the commands listed are there just to give a picture of what I've tried so far.

But no matter what I do I either get merge conflicts or nothing gets committed. What am I missing? (If there's any additional information I can provide, please let me know)

Does it have to be in Git, or can it be an archive, as with git archive, or maybe even incremental rsync backups? — Trudbert, Aug 21 '14 at 04:14
The only way I can think of is to use a separate branch for each backup because, as Lionel said, if you backed up branch a last time and now try to commit branch b to the same branch, and a and b conflict, there will be a conflict. — Trudbert, Aug 21 '14 at 04:44
How come? Say Branch A has "myfile.txt" as "Hello World", and Branch B has "myfile.txt" as "Hello Dave". When I checkout Branch A and open "myfile.txt" in a text editor I expect it'd have "Hello World". When I checkout B, I'd expect it to have "Hello Dave". If within one branch commit 1 had "Hello World", and 2 had "Hello Dave" there wouldn't be a conflict would there? — Selali Adobor, Aug 21 '14 at 04:47
Yep, you are right, I misunderstood the question. Have you run the commands one by one to see where it goes wrong? — Trudbert, Aug 21 '14 at 04:52
I have, but as I said, I tried several different combinations of those commands, and all of them returned different variations of errors, at different points while the repo was in various states, so it's really hard to summarize them. I'd say my approach is probably fundamentally wrong, so the commands listed are there just to give a picture of what I've tried so far. — Selali Adobor, Aug 21 '14 at 04:58

Mikko Rantalainen · Accepted Answer · 2014-08-27T08:01:43.027

4

It seems to me that you want to automatically back up the working directory contents as a new commit in the backup branch and then push all the branches to origin.

In that case, you do not want to check out the backup branch because that would change your working directory contents. Instead, you'll want to use some combination of git read-tree, git update-index and git commit-tree to fabricate a new commit for your backup branch and then use git branch -f to point the backup branch at the newly fabricated commit. After that, you can continue to do the regular push to origin.

Links: http://alx.github.io/gitbook/7_raw_git.html

Update:

I think the easiest way to do this is like this (assuming you already have a branch called backup):

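#!/bin/bash
# Snapshot the working directory onto $BRANCH without checking it out.
BRANCH=backup
# Use a temporary index so the real index (your staged changes) is untouched.
export GIT_INDEX_FILE=/tmp/git-backup-index.$$
# The temporary index starts empty, so this stages every non-ignored file on disk.
git add .
# Create a commit from the temporary index with the current backup tip as parent,
# then move the backup branch to the new commit.
git commit-tree "$(git write-tree)" -p "$(git show-ref --hash --heads "$BRANCH")" -m "Automatic backup" | xargs git branch -f "$BRANCH"
rm -f "$GIT_INDEX_FILE"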

(Note that I haven't tested the above script...)
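Since the script is marked as untested, here is a minimal sketch that exercises the same temporary-index idea in a throwaway repository. The helper name snapshot_to_backup, the demo identity and the file contents are assumptions made for illustration, and the parent is passed as refs/heads/<branch> instead of going through git show-ref. It is meant as a way to check the behavior, not as a definitive implementation.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: snapshot the working directory onto branch "$1"
# without switching branches or touching the real index.
snapshot_to_backup() {
    local branch=$1 tmp_index tree commit
    tmp_index="${TMPDIR:-/tmp}/git-backup-index.$$"
    # Stage everything on disk into a temporary (initially empty) index.
    GIT_INDEX_FILE=$tmp_index git add .
    tree=$(GIT_INDEX_FILE=$tmp_index git write-tree)
    rm -f "$tmp_index"
    # Chain the new commit onto the existing backup tip, then move the branch.
    commit=$(git commit-tree "$tree" -p "refs/heads/$branch" -m "Automatic backup")
    git branch -f "$branch" "$commit"
}

# Throwaway repository for the demonstration (all names are made up).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Backup Demo"
git config user.email "demo@example.invalid"
echo "Hello World" > myfile.txt
git add myfile.txt
git commit -q -m "initial"
git branch backup

# Uncommitted edit on disk; this is what the snapshot should capture.
echo "Hello Dave" > myfile.txt
start_branch=$(git symbolic-ref --short HEAD)
snapshot_to_backup backup

git show backup:myfile.txt       # file content stored in the newest backup commit
git symbolic-ref --short HEAD    # the branch that is still checked out
git status --porcelain           # the real index and working tree are unchanged
```

Note that git branch -f refuses to move a branch that is currently checked out, so this sketch assumes you never run it while sitting on the backup branch itself.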

edited Aug 27 '14 at 08:01

answered Aug 27 '14 at 07:06

Mikko Rantalainen


I had all but given up hope that this could be done, but the script worked perfectly! For some reason I hadn't considered that checking out the `backup` branch would change my working directory, and I was unaware of the commands for manipulating trees. Your link was very helpful, thanks. – Selali Adobor Aug 27 '14 at 13:28
One additional thing to consider: the `git add .` in the above script will not add files matched by `.gitignore`. If you truly want all files, use `git add --force --all .` instead. Note that doing so may inflate your repository size quickly if your working directory contains many generated files (e.g. `*.o` and `*.so` files in C/C++ projects). – Mikko Rantalainen Jul 12 '22 at 07:26

score 2 · Answer 2 · answered Aug 24 '14 at 04:47

Why you would ever need to do this is beyond me.

You can use git config core.logAllRefUpdates true. This makes Git record every update to every branch in the reflog, so you have a history of everywhere each branch has pointed. You can even configure the reflog never to expire, which gives you a history of every branch (and not just of the moments when you remember to back it up).
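As a rough sketch of that configuration: the answer does not say which settings make the reflog "never expire", so the two gc.reflogExpire* keys below are my assumption about what is meant. The commands run in a throwaway repository so that no real project is modified.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository so the settings do not touch a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Record every ref update in the reflog (the setting named in the answer).
git config core.logAllRefUpdates true
# Assumed meaning of "never expire": keep reflog entries indefinitely.
git config gc.reflogExpire never
git config gc.reflogExpireUnreachable never

# Read the values back from the repository configuration.
git config --get core.logAllRefUpdates
git config --get gc.reflogExpire
git config --get gc.reflogExpireUnreachable
```

Keep in mind that the reflog is local to one clone and is not transferred by git push, and that it only records committed states, not uncommitted files in the working directory.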

If you really want to do it you can just make the index contain the tree you want and then commit it.

Switch to the backup branch: git stash; git checkout backup
Clear out the index: git rm -fr --cached .
Copy the tree of the commit you want to back up into the index: git ls-tree -r branchA | git update-index --index-info (the -r makes ls-tree list the files inside subdirectories rather than the subdirectories themselves)
Commit the result: git commit -m "Backup of branchA" (don't use -a!)
Go back to where you came from: git checkout -f branchB; git stash pop. You need -f because the working copy will be reported as dirty.

This will never have any merge conflicts, because you are not merging anything. Each commit in backup will exactly represent the branches that you backed up at the time.
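The same result can probably be reached without switching branches at all. The sketch below is a swapped-in variation, not the procedure from the steps above: it builds each backup commit directly from a branch's committed tree with git commit-tree and moves the backup ref with git update-ref. The helper name snapshot_branch and the demo repository are assumptions made for illustration.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository with two branches that conflict on the same file.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Backup Demo"
git config user.email "demo@example.invalid"
echo "Hello World" > myfile.txt
git add myfile.txt
git commit -q -m "branchA version"
git branch -m branchA
git checkout -q -b branchB
echo "Hello Dave" > myfile.txt
git commit -q -am "branchB version"

# Hypothetical helper: append the committed tree of branch "$1" to "backup".
snapshot_branch() {
    local src=$1 commit
    if git show-ref --verify --quiet refs/heads/backup; then
        # backup already exists: use its tip as the parent of the new commit.
        commit=$(git commit-tree "$src^{tree}" -p refs/heads/backup -m "Backup of $src")
    else
        # First snapshot: create a root commit.
        commit=$(git commit-tree "$src^{tree}" -m "Backup of $src")
    fi
    git update-ref refs/heads/backup "$commit"
}

snapshot_branch branchA
snapshot_branch branchB

# One commit per snapshot, newest first; no merge was involved.
git log --format=%s backup
```

Because each commit simply reuses an existing tree object, nothing is merged and the index and working directory are never touched. The sketch assumes the backup branch is not the one currently checked out.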

Here is an example.

$ mkdir example
$ cd example
$ git init
Initialized empty Git repository in /tmp/example/.git/
$ echo "First change" > file1
$ git add file1
$ git commit -am "file1 first change"
[master (root-commit) f071d01] file1 first change
 1 file changed, 1 insertion(+)
 create mode 100644 file1
$ git branch master_two
$ git checkout master_two
Switched to branch 'master_two'
$ sed -i s/First/Two/g file1
$ git commit -am "Two change"
[master_two b786d88] Two change
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout master
Switched to branch 'master'
$ sed -i s/First/Second/g file1
$ git commit -am "Second change"
[master d88ca84] Second change
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout --orphan backup
Switched to a new branch 'backup'
$ git rm -fr --cached .
rm 'file1'
$ git ls-tree master | git update-index --index-info
$ git commit -m "snapshot of master"
[backup (root-commit) 7af271d] snapshot of master
 1 file changed, 1 insertion(+)
 create mode 100644 file1
$ git rm -fr --cached .
rm 'file1'
$ git ls-tree master_two  | git update-index --index-info
$ git commit -m "snapshot of master_two"
[backup a3ddfdd] snapshot of master_two
 1 file changed, 1 insertion(+), 1 deletion(-)
$ git checkout -f master
Switched to branch 'master'

This answer gave me another thing to look into (`git checkout --orphan`), but it does something different from what I was asking. It takes snapshots of a specific branch within the backup branch, which is somewhat the opposite of what I had in mind (ignoring the existence of other branches) — Selali Adobor, Aug 24 '14 at 08:21
Can you add more information to your question then? I went and read it again and I still don't know what you are looking for. Perhaps drawing a tree of the commits and branches would help. — onionjake, Aug 25 '14 at 16:39
I'm not sure what to add. Also could you add an explanation of `git checkout --orphan` in the context of the answer? — Selali Adobor, Aug 26 '14 at 00:54
Having come here with a search, and seeing a good answer, I thought I'd answer the "why", from my point of view -- an automatic backup can make it less error prone to move between two (or more) machines and have the same working file -- without resorting to having your git checked out into something like a dropbox sync'd directory. The environment syncing inside of git, though, is a bit meta... it just saves an additional tool. — lilbyrdie, Aug 02 '15 at 18:12

score -1 · Answer 3 · answered Aug 21 '14 at 04:24

The problem is that you're trying to get git to pretend all your branches can coexist in a single branch (that you call backup). So in effect, you are merging everything, and it's natural that you get conflicts!

Just to clarify, here is an example: You have branch a, with file f that contains the text "a", and branch b that has file f containing "b". You work on branch a, an automatic backup occurs, and it stores f="a" in the backup branch. Then you work on branch b for whatever reason, another automatic backup occurs, and now you are trying to store f="b". This is a merge conflict because there is no relationship between branch a and b (maybe they both inherited file f from the prod branch where file f contains "prod", and changed it to "a" and "b" respectively).

If you want to take snapshots of what's in your directory at a given time, you want to use tags, and you probably want to put the time the tag was taken as part of the tag name such as workdir-snapshot-20140818-1432.
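A minimal sketch of such a timestamped tag, run in a throwaway repository. The repository contents are made up for illustration, and the date format is only my reading of the example name workdir-snapshot-20140818-1432.

```shell
#!/bin/bash
set -euo pipefail

# Throwaway repository with a single commit to tag.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Backup Demo"
git config user.email "demo@example.invalid"
echo "Hello World" > myfile.txt
git add myfile.txt
git commit -q -m "initial"

# Tag the currently checked-out commit with a timestamped name
# (assumed pattern: workdir-snapshot-YYYYMMDD-HHMM).
tag_name="workdir-snapshot-$(date +%Y%m%d-%H%M)"
git tag "$tag_name"

# List all snapshot tags created so far.
git tag --list 'workdir-snapshot-*'
```

A tag like this points at the commit that was checked out, so it does not capture uncommitted changes in the working directory.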

Remember that branches are only pointers to commits, so using the term "backup" is really misleading. You are not really backing up data; the most you are keeping track of is which branch was checked out in your working directory at a given time.

I can clarify this in the question if needed, but I'm not actually trying to merge branches, I'm just trying to get the files as they exist at the time of the snapshot on the disk. So if branch A has a change on file A, and branch B has a conflicting change on the same file, but branch B is checked out, the backup branch should end up getting branch B's changes, which are currently in the file on the disk, ignoring anything else. — Selali Adobor, Aug 21 '14 at 04:39
