
I want to use Git to track micro changes (several a day) on a large working directory (many gigs). The data will be mixed binary/plain text. Binary data won't change nearly as much as the text information. Access to old commits will rarely be needed, and can be slow, whereas recent history needs to be fast.

I don't want to lose old data permanently, just move it to a backup server or something. Is there something in Git that allows old history to be archived and keep only a certain subset in the local repository?

If not, is there a tool that is more suited for this purpose? I like Git because I know it and I want the version control and diffs. I won't be needing any of the advanced features of Git (like branching/merging, not distributed), so any other similar VCS would be nice.

beatgammit
  • Several changes a day to mostly text files hardly counts as "micro", that's about the commit rate any old project gets during phases heavier on fixing bugs rather than new development. Git *should* be able to handle this without much bloat. Is your repository actually growing in size that quickly that you need to dump part of your history off-site? – millimoose Aug 13 '12 at 11:42
  • Except that the total data size could be > 100GB. Even with clever compression, even minor changes on the scale of 10,000+ files several times a day will add significantly to the repository cache locally. I plan on automatically rebasing, but there's a limit to that usefulness as well. It's not really what Git was designed for, so I was wondering if it already supported that type of thing before I break out the chisel and reinvent the wheel. – beatgammit Aug 13 '12 at 22:03

1 Answer


If you're patching with git format-patch, then create a shallow clone with git clone --depth <depth> and proceed. Odds are you're not, though, in which case you'll probably find this answer and this answer useful. The second concludes that git checkout --orphan is perhaps the best way to get what you want. Of course, you'll still need to clone the complete history locally once in order to make a smaller branch of it.
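As a minimal sketch of the shallow-clone route (the /tmp paths and demo commits below are placeholders, not anything from your setup):

```shell
#!/bin/sh
# Build a throwaway repository with two commits, then take a depth-1 clone.
set -e

rm -rf /tmp/demo-full /tmp/demo-shallow
git init -q /tmp/demo-full
git -C /tmp/demo-full config user.email you@example.com
git -C /tmp/demo-full config user.name you
echo one > /tmp/demo-full/file.txt
git -C /tmp/demo-full add file.txt
git -C /tmp/demo-full commit -qm "first"
echo two > /tmp/demo-full/file.txt
git -C /tmp/demo-full commit -qam "second"

# --depth 1 copies only the newest commit. The file:// URL matters:
# git ignores --depth for plain local-path clones.
git clone -q --depth 1 "file:///tmp/demo-full" /tmp/demo-shallow
git -C /tmp/demo-shallow rev-list --count HEAD   # prints 1
```

The older commit stays behind in the source repository, which is exactly the "recent history local, full history elsewhere" split you're after.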

If you're feeling adventurous, want this badly, and are willing to put up with a more complicated push process, creating patches with git format-patch and applying them to another repository with git am is neither difficult to execute nor to script. It would add an extra layer to your push process: create a patch on the shallow repo, apply it programmatically to a full repo (either local or somewhere else), then push from the latter. The time and trouble probably aren't worth it, but it certainly is possible.
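That shuttle can be sketched end to end like this. Everything here is hypothetical scaffolding (the /tmp paths, the demo commit); the real mechanism is just the last two git commands:

```shell
#!/bin/sh
# Sketch: commit in a shallow clone, export the commit as a patch,
# replay it onto a full-history repository. Paths are placeholders.
set -e

FULL=/tmp/gr-full         # complete-history repository
SHALLOW=/tmp/gr-shallow   # day-to-day shallow working repository
PATCHES=/tmp/gr-patches

rm -rf "$FULL" "$SHALLOW" "$PATCHES"

# Stand-in "full" repository with some existing history.
git init -q "$FULL"
git -C "$FULL" config user.email you@example.com
git -C "$FULL" config user.name you
echo base > "$FULL/notes.txt"
git -C "$FULL" add notes.txt
git -C "$FULL" commit -qm "base commit"

# Work happens in a shallow clone of it.
git clone -q --depth 1 "file://$FULL" "$SHALLOW"
git -C "$SHALLOW" config user.email you@example.com
git -C "$SHALLOW" config user.name you
echo update >> "$SHALLOW/notes.txt"
git -C "$SHALLOW" commit -qam "micro change"

# Export the new commit as a mailbox-format patch...
git -C "$SHALLOW" format-patch -1 HEAD -o "$PATCHES"

# ...and replay it onto the full repository, keeping author, date, message.
git -C "$FULL" am "$PATCHES"/*.patch
```

Note that format-patch only works for commits made *after* the shallow clone was taken (their parents exist locally); that's fine here, since new work is exactly what you'd be shuttling.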

Christopher
  • Hmm. I'll have to look at that. I don't need a nice UI, as long as I can programmatically get the change-sets and update the other repo. I plan on having several copies of the repo that this would happen on, so they would need to be synchronized, but that just adds to the fun. I would prefer a way to slice the commit history at a point in time though... – beatgammit Aug 13 '12 at 22:06
  • To clarify, slicing history is easy. Pushing and pulling that sliced history is difficult. Git can't make assumptions about where the HEAD of repositories are, nor [can it interpret which slice your shallow repository took.](http://stackoverflow.com/a/6900428/877115) If you're willing to script this, the problem is more or less solved. Pass `git format-patch` on one shallow repository to the others. Apply with `git am`. You could very easily script this, particularly if you're not generating, say, merge commits along the way. – Christopher Aug 14 '12 at 01:06
  • It looks like I can merge back in with my particular use case. I'll accept this once I read up on shallow clones more. Thanks! – beatgammit Aug 14 '12 at 14:26