I needed to do a similar rewrite on an unnecessarily large repository while the repo was offline. The approach I took was an automated 'interactive' rebase driven by GIT_SEQUENCE_EDITOR, which is covered in this answer by @james-foucar & @pfalcon.
For this to work well, I found it better to first remove the merges from the section of the history being rewritten. In my case, this was done with many git rebase --onto invocations, which are covered amply in other questions on StackOverflow.
I created a small script, generate-similiar-commit-squashes.sh, to generate the pick & squash commands so that consecutive similar commits would be squashed. I used author-date-and-shortlog to match similar commits, but you only need author (my gist has a comment about how to make it match only on author).
$ generate-similiar-commit-squashes.sh > /tmp/git-rebase-todo-list
The output looks like
...
pick aaff1c556004539a54a7a33ce2fb859af0c4238c foo@example.com-2015-01-01-Update-head.html
squash aa190ea2323ece42f1cd212041bf61b94d751d5c foo@example.com-2015-01-01-Update-head.html
pick aab8c98981a8d824d2bc0d5278d59bc1a22cc7b0 foo2@example.com-2015-01-28-Update-_config.yml
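For readers without the gist at hand, a generator along these lines can be sketched as follows. This is a minimal sketch, not the actual generate-similiar-commit-squashes.sh: the function name make_todo is hypothetical, and it assumes a linear history (merges already removed) fed oldest-first as "<sha> <key>" lines, where the key is whatever should be compared between neighbouring commits.

```shell
#!/bin/sh
# make_todo (hypothetical name): read "<sha> <key>" lines on stdin, oldest
# commit first, and print a rebase todo list. A commit whose key equals the
# key of the line just before it becomes "squash"; anything else is "pick".
make_todo() {
  awk '{
    sha = $1
    key = $0
    sub(/^[^ ]+ /, "", key)          # key = everything after the sha
    verb = (NR > 1 && key == prev) ? "squash" : "pick"
    print verb, sha, key
    prev = key
  }'
}

# Assumed usage, with $root being the commit later passed to git rebase.
# Matching on author email only:
#   git log --reverse --format='%H %ae' "$root"..HEAD | make_todo
# Matching on author, date and sanitized subject, similar to the output above:
#   git log --reverse --date=short --format='%H %ae-%ad-%f' "$root"..HEAD | make_todo
```

Because only adjacent lines are compared, a commit by the same author that comes after someone else's commit starts a new pick instead of being folded into the earlier group.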
The repository was also full of self-reverts with the same style 'Update xyz' commit messages. When squashed, they resulted in empty commits.
The commits I was merging had identical commit messages. git rebase -i offers a revised commit message with all squashed commit messages appended, which would have been repetitive. To address that, I used a small perl script from this answer to remove duplicate lines from the commit message offered by git rebase. It is easier to keep the script in a file, because it is invoked from inside a shell variable and an inline script would need nested quoting.
$ echo 'print if ! $x{$_}++' > /tmp/strip-seen-lines.pl
Now for the final step:
$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' \
GIT_SEQUENCE_EDITOR='cat /tmp/git-rebase-todo-list >' \
git rebase --keep-empty -i $(git rev-list --max-parents=0 HEAD)
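The trailing '>' in GIT_SEQUENCE_EDITOR can look odd at first. It appears to work because git hands the editor string to the shell with the path of the todo file appended as an argument, so the redirection ends up pointing at that file. The sketch below imitates that outside of git, under that assumption; run_editor and the temporary files are stand-ins for illustration, not anything git provides.

```shell
#!/bin/sh
# run_editor (hypothetical helper): imitate a shell-interpreted editor
# command being invoked with the file to edit appended as "$@".
run_editor() {
  editor=$1
  file=$2
  sh -c "$editor \"\$@\"" "$editor" "$file"
}

prepared=$(mktemp)   # stands in for /tmp/git-rebase-todo-list
todo=$(mktemp)       # stands in for the todo file git would create

printf 'pick aaa first\nsquash bbb first\n' > "$prepared"
printf 'pick aaa first\npick bbb first\n' > "$todo"

# Effectively runs: cat "$prepared" > "$todo"
run_editor "cat $prepared >" "$todo"
cat "$todo"
```

After the call, the stand-in todo file holds the prepared list, which is the effect the GIT_SEQUENCE_EDITOR setting above relies on.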
Despite using --keep-empty, git complained a few times through this process about empty commits. It would dump me out to the console with an incomplete git rebase. To skip the empty commit and resume processing, the following two commands were needed (rather frequently in my case).
$ git reset HEAD^
$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' git rebase --continue
Again despite --keep-empty, I found I had no empty commits in the final git history, so the resets above had removed them all. I assume something is wrong with my git, version 2.14.1. Processing ~10000 commits like this took just over 10 minutes on a crappy laptop.