I've used git for a while for source control and I really like it. So I started looking into using git to store lots of large binary files, which I'm finding just isn't git's cup of tea. So how about large text files? It seems like git should handle those just fine, but I'm having problems with that too.
I'm testing this with a 550 MB mbox-style text file. I ran git init to create a fresh repo for the test. Here are my results:
- git add and git commit - total repo size is 306 MB - repo contains one object that is 306 MB in size
- add one email to the mailbox file and git commit - total repo size is 611 MB - repo contains two objects that are each 306 MB in size
- add one more email to the mailbox file and git commit - total repo size is 917 MB - repo contains three objects that are each 306 MB in size
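For anyone who wants to reproduce the steps above, here is a rough sketch at a much smaller scale. The file name `mailbox`, the `SIZE_MB` variable, and the generated content are assumptions for illustration only; my real test used an actual mbox file of about 550 MB, and results at a small size may not match what I saw.

```shell
#!/bin/sh
# Sketch of the experiment at reduced scale.
# Assumptions: SIZE_MB and the file name "mailbox" are placeholders,
# and the content is random data encoded as text, not real mail.
set -eu

SIZE_MB="${SIZE_MB:-5}"
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Test User"
git config user.email "test@example.com"

# Build a stand-in text file of roughly SIZE_MB (base64 inflates it by about a third).
head -c "$((SIZE_MB * 1024 * 1024))" /dev/urandom | base64 > mailbox

git add mailbox
git commit -q -m "initial mailbox"
du -sh .git

# Append one small mbox-style message per commit, twice.
for i in 1 2; do
    printf 'From sender@example.com Thu Jan  1 00:00:00 2015\nSubject: test %s\n\nbody\n' "$i" >> mailbox
    git commit -q -am "append email $i"
    du -sh .git
done

# Show how many loose objects exist and how much space they take.
git count-objects -v
```

Setting `SIZE_MB=550` before running it should get closer to my original test, at the cost of a much slower run.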
So every commit adds a new copy of the mailbox file to the repo. Now I want to try to get the size of the repo down to something manageable. Here are my results:
- git repack -adf - total repo size is 877 MB - repo contains one pack file that is 876 MB in size
- git gc --aggressive - total repo size is 877 MB - repo contains one pack file that is 876 MB in size
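To check what the repack actually did, something like the following could be used. It is only a sketch: `pack_report` is a made-up helper name, not a git command, and it assumes a non-bare repo with a regular `.git` directory. It repacks, prints the size summary, and lists the largest packed objects.

```shell
#!/bin/sh
# pack_report is a hypothetical helper, not part of git.
# Usage: pack_report /path/to/repo   (assumes a non-bare repo)
pack_report() (
    cd "$1" || exit 1

    # Rewrite everything into a single pack, recomputing deltas from scratch.
    git repack -a -d -f -q

    # Human-readable summary, including the size-pack field.
    git count-objects -vH

    # List the ten largest packed objects by uncompressed size.
    # Columns: id, type, size, size-in-pack, offset; objects stored as
    # deltas carry two extra columns (delta depth and base object id).
    for idx in .git/objects/pack/pack-*.idx; do
        git verify-pack -v "$idx" \
            | grep -E '^[0-9a-f]{40,64} ' \
            | sort -k3,3nr \
            | head -n 10
    done
)
```

Calling `pack_report .` from the repo root prints the report. If the large blob lines show only five columns, those versions of the file were stored whole rather than as deltas against each other, which would match the pack size I'm seeing.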
I would expect to be able to get the repo down to around 306 MB, but I can't figure out how. Anything much larger than that suggests a lot of duplicate data is being stored.
My hope is that the repo would grow only by the size of each new email received, not by the size of the entire mailbox. I'm not trying to version control email here, but this is the main thing holding me back from using a nightly script to incrementally back up users' home directories.
Any advice on how to keep the repo size from blowing up when appending a small amount of text to the end of a very large text file?
I've looked at bup and git annex, but I'd really like to stick with just plain old git if possible.
Thank you for your help!