Scenario: I have a directory structure that changes from time to time, and I'd like to keep backups of every state it has been in. To do this, I set it up as a git repository and have a cron job run git commit -m 'croncommit' once a day. This works fine and lets me review any past state of the directory structure in the history.

But the git repository grows even if the directory structure doesn't. If a huge file was in there only briefly, it still stays in the repository forever. That is correct from git's point of view, of course, but since this is a mere backup facility for me, I would like to keep only the more recent states, say, those from the last month.

I'm looking for a way to remove states (commits) older than a specific age (e.g. one month) from a given repository. I think this can be done by collapsing all commits older than that age into one.

But I can't find the correct command and syntax for this task.

How can I do this?

Alfe

1 Answer

Use the --since option of git log to find the new starting point of your history, and create a new parentless commit with git commit-tree that reuses that commit's tree state. Afterward, rebase any children onto the new root and move your branch ref to the new HEAD.

#! /usr/bin/env perl

use strict;
use warnings;

my $MAX_AGE = 30;
my $BRANCH  = "master";

# assumes linear history
my($new_start,$rebase) = `git log --reverse --since="$MAX_AGE days ago" --format=%H`;
die "$0: failed to determine new root commit"
  unless defined($new_start) && $? == 0;

chomp $new_start;

my $new_base = `echo Forget old commits | git commit-tree "$new_start^{tree}"`;
die "$0: failed to orphan $new_start" unless $? == 0;
chomp $new_base;

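# don't assume multiple commits more recent than $MAX_AGE
if (defined $rebase) {
  system("git rebase --onto $new_base $new_start HEAD") == 0
    or die "$0: git rebase failed";
}
else {
  # lone recent commit: detach onto the new root so the branch can be moved
  system("git checkout -q $new_base") == 0
    or die "$0: failed to check out $new_base";
}

system("git branch -f $BRANCH HEAD") == 0
  or die "$0: failed to move $BRANCH";

# the steps above leave HEAD detached, so return to the branch
system("git checkout -q $BRANCH") == 0
  or die "$0: failed to check out $BRANCH";

system("git reflog expire --expire=now --all && git gc --prune=now") == 0
  or die "$0: cleanup failed";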

For example:

$ git lol --name-status
* 186d2e5 (HEAD, master) C
| A     new-data
* 66b4a19 B
| D     huge-file
* 5e89273 A
  A     huge-file

$ git lol --since='30 days ago'
* 186d2e5 (HEAD, master) C
* 66b4a19 B

$ ../forget-old 
First, rewinding head to replay your work on top of it...
Applying: C
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (5/5), done.
Total 5 (delta 1), reused 0 (delta 0)

$ git lol --name-status
* b882852 (HEAD, master) C
| A     new-data
* 63bb958 Forget old commits

Note that git lol is a nonstandard but highly useful alias equivalent to

git log --graph --decorate --pretty=oneline --abbrev-commit
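
If you want the same alias, one way to define it is sketched below. The --global flag is a choice made here: it writes to your per-user configuration, so drop it to keep the alias local to a single repository.

```shell
# Define "git lol" as shorthand for the log command shown above.
git config --global alias.lol "log --graph --decorate --pretty=oneline --abbrev-commit"
```

After that, git lol accepts the usual git log options, such as --name-status or --since.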

ADDITION by OP: Here's a bash version of the Perl script above:

#!/bin/bash -xe

MAX_AGE=${MAX_AGE:-30}
BRANCH=${BRANCH:-master}

# assumes linear history
{
  read -r new_start
  read -r rebase || true  # a second commit may not exist; don't trip -e
} < <(git log --reverse --since="$MAX_AGE days ago" --format=%H)
[ -n "$new_start" ]  # assertion

read new_base < <(
  echo "Forget old commits" | git commit-tree "$new_start^{tree}"
)

# don't assume multiple commits more recent than $MAX_AGE
if [ -n "$rebase" ]; then
  git rebase --onto "$new_base" "$new_start" HEAD
else
  git checkout -q "$new_base"  # lone commit: detach onto the new root
fi

git branch -f "$BRANCH" HEAD

git reflog expire --expire=now --all
git gc --prune=now

git checkout "$BRANCH"  # avoid ending on "no branch"
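
To try the technique without touching a real backup, the same steps can be replayed in a throwaway repository. The following is only a sketch under stated assumptions: the scratch directory, the file names, and the demo identity are all made up, and commit A is back-dated 60 days so that it falls outside the 30-day window. Unlike the scripts above, it rebases the checked-out branch directly instead of passing HEAD, so the branch moves along and no separate git branch -f is needed.

```shell
#!/bin/sh
# Sketch only: the scratch directory, file names, and identity are invented.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.invalid"

# Commit A is back-dated 60 days (git's internal "<epoch> <tz>" date format).
old="$(( $(date +%s) - 60 * 24 * 3600 )) +0000"
echo big > huge-file
git add huge-file
GIT_AUTHOR_DATE="$old" GIT_COMMITTER_DATE="$old" git commit -q -m A

# Commits B and C are created "now", so they fall inside the 30-day window.
git rm -q huge-file
echo keep > notes
git add notes
git commit -q -m B
echo data > new-data
git add new-data
git commit -q -m C

# The oldest commit inside the window becomes the new starting point.
new_start=$(git log --reverse --since="30 days ago" --format=%H | head -n 1)

# Create a parentless commit that reuses that commit's tree.
new_base=$(echo "Forget old commits" | git commit-tree "$new_start^{tree}")

# Replay the newer commits onto the new root; this moves the current branch.
git rebase -q --onto "$new_base" "$new_start"

git reflog expire --expire=now --all
git gc -q --prune=now

count=$(git rev-list --count HEAD)
echo "commits left: $count"
```

With three commits going in, the rewritten history should consist of the new root plus the replayed commit C, and huge-file should no longer be reachable from HEAD.
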
Alfe
Greg Bacon
  • That's what I was looking for, thanks a lot. I took the liberty of rewriting your Perl script as a Bash script (and added it to your answer for completeness). I hope you don't mind :) – Alfe Mar 11 '16 at 11:11
  • One question remains: Which case do you check when you test for `$rebase` being defined? When can that be undefined (or empty in the bash version)? – Alfe Mar 11 '16 at 11:12
  • @Alfe You're welcome. Cheers! The check is for the unlikely case that only a single commit on your branch has a committer date (what `git log --since=...` uses) within the last 30 days. – Greg Bacon Mar 11 '16 at 11:12
  • One more thing: I found that I ended up on "no branch" after your script had run, so `git status` etc. weren't helpful anymore. I added a `git checkout "$BRANCH"` to my bash version because of this. Please correct me if this is not what was intended. – Alfe Mar 11 '16 at 11:15
  • Oops. I mistakenly thought updating the branch would take you out of a detached head state. Please update the answer with your changes. – Greg Bacon Mar 11 '16 at 11:18