3

Possible Duplicate:
Completely remove (old) git commits from history

git is very useful for nightly snapshots of client web sites. Knowing everything (php + mysqldump + user file uploads) is in a git repository provides great peace of mind.

Due to the large size of some of the sites, I am wondering if anyone is aware of a moderately easy way to remove (for example) all commits older than 30 days?

Jay
  • I don't think Git was really meant to be used as a backup solution, and as such doing this isn't really intended. You can probably just delete them with a rebase but I'm not sure how to do that programmatically. – Andrew Marshall Mar 19 '12 at 05:46
  • Regarding "git not being intended for a backup solution", I think we all agree this is true. (: However, speaking from a purely pragmatic point of view, I (and, according to Google, many other people) find it to be a great fit for a backup solution – Jay Mar 19 '12 at 06:23
  • @JohnDouthat Nice spot! It is not immediately clear that they are duplicates as the "problem" is different, but the outcome is the same. – Jay Mar 19 '12 at 06:27
  • The "squash" method will indeed work. It might be pretty slow. I haven't timed it vs filter-branch (which could also be pretty slow...). In both cases you'll still have the original commits, via the reflog etc., so if the point is to recover disk space you'll still need to "take out the garbage" as it were. – torek Mar 20 '12 at 08:42

2 Answers

4

Indeed, you actually can do this. It's a bit tricky. Here's an example...

$ cd /tmp
$ mkdir rmcommits
$ cd rmcommits
$ git init
Initialized empty Git repository in /tmp/rmcommits/.git/
$ cp /tmp/example/xy.c .
$ git add xy.c
$ git commit -m 'initial commit'
[master (root-commit) 8d5b88c] initial commit
 1 files changed, 273 insertions(+), 0 deletions(-)
 create mode 100644 xy.c
$ echo 'more stuff' > morestuff.txt
$ git add morestuff.txt; git commit -m 'add some stuff'
[master f971ae5] add some stuff
 1 files changed, 1 insertions(+), 0 deletions(-)
 create mode 100644 morestuff.txt
$ echo 'and still more' >> morestuff.txt 
$ git add morestuff.txt; git commit -m 'add more stuff'
[master bea9192] add more stuff
 1 files changed, 1 insertions(+), 0 deletions(-)

Now I pick out the place where I want "history to end" (for branch master, aka HEAD):

$ git rev-parse HEAD^
f971ae5b4225aca364223a44be8be84268385ff3

This is the oldest commit I will keep; it becomes the new root of the branch.

$ git filter-branch --parent-filter 'test $GIT_COMMIT = f971ae5b4225aca364223a44be8be84268385ff3 && echo "" || cat' HEAD
Rewrite bea9192a53a5aeb7532aa1e174f7f642363396de (3/3)
Ref 'refs/heads/master' was rewritten
$ git log --pretty=oneline
65a246b8320382a64550d2c4b650c942d7bfba70 add more stuff
7892ab45aa33cd5ebdc3090ce2622081059fdd79 add some stuff

(Explanation: git filter-branch runs over all the commits in the branch, in this case master because HEAD is currently ref: refs/heads/master, and with --parent-filter you can rewrite the parent(s) of each commit. When we reach the target commit, before which we want history to cease, we echo nothing (you don't need the empty string; that's my old habit from when echo with no arguments did nothing). For every other commit we use "cat" to copy the existing -p arguments, as per the filter-branch manual. The rewritten copy of the target commit therefore has no parents, i.e., it's now an initial commit, the root of the branch. This is unusual in a git repo, as we now have two root commits: one on the new master and one on the old, saved master, as noted below.)

Note that the older commit tree is still in the repo in its entirety, under the saved name that git filter-branch uses:

$ git log original/refs/heads/master --pretty=oneline
bea9192a53a5aeb7532aa1e174f7f642363396de add more stuff
f971ae5b4225aca364223a44be8be84268385ff3 add some stuff
8d5b88c468f75750d5a01ab40bfae160c654ac66 initial commit

You have to delete that reference (and clean out the reflog) and do a "git gc" before the original, pre-rewrite commits (and any unreferenced trees, blobs, etc.) really go away:

$ git update-ref -d refs/original/refs/heads/master
$ git reflog expire --expire=now --all
$ git gc --prune=now
$ git fsck --unreachable
$

That last line shows that they're really gone.
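
To tie this back to the "older than 30 days" part of the question, here is a rough sketch of the same steps with the cutoff commit picked by date instead of by hand. It is only an illustration under some assumptions: it builds a throwaway repository with made-up snapshot commits (the `site.txt` file and the `demo` identity are placeholders), it assumes linear history, and it uses `git rev-list --since` (which looks at committer dates) to find the oldest commit inside the window. `FILTER_BRANCH_SQUELCH_WARNING` only silences the notice that newer Git versions print.

```shell
#!/bin/sh
# Sketch: keep only the last 30 days of history on the current branch.
set -e

# Placeholder identity so the demo does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository with four back-dated "nightly snapshot" commits.
repo=$(mktemp -d)
cd "$repo"
git init -q .
now=$(date +%s)
for age in 60 45 10 1; do
    stamp="$((now - age * 86400)) +0000"
    echo "snapshot from $age days ago" > site.txt
    git add site.txt
    GIT_AUTHOR_DATE="$stamp" GIT_COMMITTER_DATE="$stamp" \
        git commit -q -m "snapshot from $age days ago"
done

# Oldest commit that is still inside the 30-day window: the new root.
new_root=$(git rev-list --since='30 days ago' HEAD | tail -n 1)
if [ -z "$new_root" ]; then
    echo "no commits newer than 30 days; nothing to keep" >&2
    exit 1
fi

# Same parent-filter trick as above: drop the parents of the new root.
git filter-branch --parent-filter \
    "test \$GIT_COMMIT = $new_root && echo '' || cat" HEAD

# Drop the saved originals and the reflog, then collect the garbage.
git for-each-ref --format='%(refname)' refs/original/ |
    xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --quiet --prune=now

remaining=$(git rev-list --count HEAD)
echo "commits left: $remaining"
```

On a real repository you would skip the setup part and run only the steps from `new_root=` onward, ideally on a copy first, since the rewrite changes every remaining commit ID.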

torek
0
  1. A file present in the current state of the repository is saved as the original added file plus a series of changes, so you can't remove the commit where the file is added.

  2. TortoiseGit has an operation where you select several sequential commits from the log and "Combine to one commit", but it's not natively offered in Git. From what I can infer from the windows that appear, it is implemented as creating a new branch, applying the changes from the original branch, committing only once and rebasing on the result. It's certainly not a quick operation when many commits are selected, I imagine it will be even slower in a large repository, and I always make a backup before using it.

All in all, I doubt an easy way to do this exists.

madth3
  • 1: Actually, no, it's not. The *packs* are compressed so that you get the same space-saving as with deltas, but each file is stored in its entirety. A git "commit" object points to a git "tree" object, and the "tree" object lists "blob"s (files) and more "tree"s, all by SHA1 ID; and from the SHA1 ID, you extract the file wholesale (via the magic of those packs). 2. In native git, that's a "squash" in a `git rebase --interactive`. Yes, underneath it's done by constructing a new branch. – torek Mar 20 '12 at 08:26
  • I stand corrected. I need to deeply read the chapters I skimmed when reading about git internal structure. – madth3 Mar 22 '12 at 20:43
  • Git's compression techniques are unusual, to say the least. :-) Very effective though. Packs do delta-compression but objects themselves are merely zlib-compressed, and there are checksums throughout (better in v2 packs than v1). The compression algorithms are re-used in different ways to get delta-compression for pull and push operations, as well, but those are undone on the receiving end. – torek Mar 22 '12 at 20:47
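
The commit, tree and blob layout described in the comments can be inspected directly with `git cat-file`. The following is a small sketch in a throwaway repository (the `notes.txt` file and the `demo` identity are made up for illustration); it shows that each commit gives access to the complete file content at that point rather than a patch against the previous commit.

```shell
#!/bin/sh
# Sketch: look at the commit -> tree -> blob chain with git cat-file.
set -e

# Placeholder identity so the demo does not depend on local git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .

echo 'version 1' > notes.txt
git add notes.txt
git commit -q -m 'first'

echo 'version 2' >> notes.txt
git add notes.txt
git commit -q -m 'second'

# The commit object names a tree (and a parent) by ID.
git cat-file -p HEAD

# The tree object lists blobs and sub-trees by ID.
git cat-file -p 'HEAD^{tree}'

# Each blob is a whole file, retrievable without replaying any changes.
blob_type=$(git cat-file -t HEAD:notes.txt)
old_content=$(git cat-file -p HEAD~1:notes.txt)
new_content=$(git cat-file -p HEAD:notes.txt)

echo "object type: $blob_type"
echo "lines in newest blob: $(printf '%s\n' "$new_content" | wc -l)"
```

Because every commit reaches full file contents through its tree, making a later commit the new root does not lose any file data that commit needs.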