Git Squash by author - All author commits into a single commit

Question

I am trying to squash many commits into a single one. The problem is that I need to do that by author (name or email).

The case:

Let's say I have a branch called feature-a, and in this branch I have many commits from many authors. How can I squash all commits by one author (selected by email, for example) into a single commit? I want to do that so I can merge all of an author's commits into master.

Any help here?

Thanks in advance

That sounds as if it could lead to a lot of unnecessary effort. That is, if commits by a particular author are not consecutive, then squashing them is going to require a lot of manual intervention. You would be better off having your many authors each work on their own branch. — larsks, Jun 20 '15 at 00:57

John Vandenberg · Answer 1 · 2017-09-27T05:12:03.433

I needed to do a similar rewrite on an unnecessarily large repository while the repo was offline. The approach I took was an automated 'interactive' rebase using GIT_SEQUENCE_EDITOR, which is covered in this answer by @james-foucar & @pfalcon.

For this to work well, I found it better to first remove the merges from the section of the history being rewritten. For my own case, this was done using lots of git rebase --onto which is covered amply in other questions on StackOverflow.

I created a small script generate-similiar-commit-squashes.sh to generate the pick & squash commands so that consecutive similar commits would be squashed. I used author-date-and-shortlog to match similar commits, but you only need author (my gist has a comment about how to make it match only on author).

$ generate-similiar-commit-squashes.sh > /tmp/git-rebase-todo-list

The output looks like

...
pick aaff1c556004539a54a7a33ce2fb859af0c4238c foo@example.com-2015-01-01-Update-head.html
squash aa190ea2323ece42f1cd212041bf61b94d751d5c foo@example.com-2015-01-01-Update-head.html
pick aab8c98981a8d824d2bc0d5278d59bc1a22cc7b0 foo2@example.com-2015-01-28-Update-_config.yml
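For matching on author only, the idea can be sketched as below. This is not the script from the gist: `author_squash_todo` is a hypothetical helper name, and it assumes a linear history (merges already removed) fed in oldest first as `<hash> <author-email>` lines.

```shell
# Hypothetical helper (not the gist's script): reads "<hash> <author-email>"
# lines, oldest first, and emits a rebase todo list in which every commit whose
# author email equals the previous commit's email becomes a "squash".
author_squash_todo() {
    awk '{
        if (NR > 1 && $2 == prev) print "squash", $1
        else                      print "pick", $1
        prev = $2
    }'
}

# Usage inside a repository (assumes linear history):
#   git log --reverse --format='%H %ae' | author_squash_todo > /tmp/git-rebase-todo-list
```

Only consecutive commits by the same author are merged this way; commits by one author that are interleaved with someone else's stay separate.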

The repository was also full of self-reverts with the same style 'Update xyz' commit messages. When squashed, they resulted in empty commits.

The commits I was merging had identical commit messages. git rebase -i offers a revised commit message with all squashed commit messages appended, which would have been repetitive. To address that, I used a small perl script from this answer to remove duplicate lines from the commit message offered by git rebase. It is easier to keep the script in a file, because it is referenced from a shell variable (GIT_EDITOR).

$ echo 'print if ! $x{$_}++' > /tmp/strip-seen-lines.pl

Now for the final step:

$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' \
  GIT_SEQUENCE_EDITOR='cat /tmp/git-rebase-todo-list >' \
  git rebase --keep-empty -i $(git rev-list --max-parents=0 HEAD)
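The trailing `>` in GIT_SEQUENCE_EDITOR works because git hands the editor command to the shell with the todo file path appended as an argument, so the redirect overwrites the todo list. The sketch below simulates that call outside of git; `run_sequence_editor` and the temporary file names are hypothetical, and the `sh -c` form is my understanding of how git launches editors rather than something stated in the answer.

```shell
# Simulate the editor launch: the command string is run by the shell with the
# todo file path appended via "$@".
run_sequence_editor() {
    # $1: editor command string, $2: path of the todo file git would pass
    sh -c "$1 \"\$@\"" "$1" "$2"
}

workdir=$(mktemp -d)
# What git would have written as the default todo list.
printf 'pick 1111111 original\npick 2222222 original\n' > "$workdir/git-rebase-todo"
# The pre-generated list that should replace it.
printf 'pick 1111111 generated\nsquash 2222222 generated\n' > "$workdir/generated-list"

run_sequence_editor "cat $workdir/generated-list >" "$workdir/git-rebase-todo"
cat "$workdir/git-rebase-todo"
```

After the call, the todo file holds the generated list, which is what the rebase then executes.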

Despite using --keep-empty, git complained a few times through this process about empty commits. It would dump me out to the console with an incomplete git rebase. To skip the empty commit and resume processing, the following two commands were needed (rather frequently in my case).

$ git reset HEAD^
$ GIT_EDITOR='perl -i -n -f /tmp/strip-seen-lines.pl ' git rebase --continue

Again, despite --keep-empty, I found I had no empty commits in the final git history, so the resets above had removed them all. I assume something is wrong with my git, version 2.14.1. Processing ~10000 commits like this took just over 10 minutes on a crappy laptop.

Interesting approach, more complete than my answer. +1. Not sure about your `--keep-empty` problem though. — VonC, Sep 25 '17 at 11:19
Also worth noting any time rewriting the history, if you do not want the rewriting process to attribute yourself (the rewriter) as the committer for every commit, you'll probably need to finish the rewrite by resetting the committer. The simplest method is [setting committer=author](https://stackoverflow.com/a/32944640/5037965), but a more elaborate approach will be needed if the original committer information was actually important and needs to be retained. — John Vandenberg, Sep 27 '17 at 05:07
I've added to https://gist.github.com/jayvdb/9b41677f00065dbd94cc02446fc5ba34 a script `generate-multiple-new-file-squashes.sh` to merge commits consisting only of consecutive adds (i.e. the committer used the GitHub/GitLab/BitBucket Web UI "upload file") — John Vandenberg, Sep 27 '17 at 05:10
use `git rebase -i --root` if you need to access the first commit — syedelec, Feb 13 '19 at 01:25

score 2 · Answer 2 · answered Jun 20 '15 at 01:05

Be careful rewriting history

The end result you want might be possible if you create branches for each author, cherry-pick the commits from each author into the right branch, then squash those changes. However, I don't think that will work if these commits meaningfully depend on each other.

If you have a series of commits:

            Author1                Author2                Author1
version1 ---commit---> version2 ---commit---> version3 ---commit--->...

If you were to extract the changes from Author2 and apply them to version1, there's a good chance the result won't make sense (for example, if Author2 modifies code that Author1 created).

score 1 · Answer 3 · edited Jun 20 '20 at 09:12

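With Kenkron's caveats in mind, you could do:

SORTED_GIT_LOGS=$(git log --reverse --pretty="format:%ae %H" master..feature_a | sort -s -k1,1 | cut -d' ' -f2); \
for LOG in $SORTED_GIT_LOGS; do \
    git cherry-pick $LOG; \
done

The git log --reverse --pretty="format:%ae %H" master..feature_a | sort -s -k1,1 part lists the feature_a commits (not the ones already on master, because of the master..feature_a syntax) oldest first, then groups them by author email. The stable sort (-s) keeps each author's commits in chronological order, and using the email (%ae) instead of the name (%an) prevents author names containing spaces from breaking the cut.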

You would still need to do an interactive rebase to squash the (now ordered by author) commits on master.
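When ordering commits by author, it helps if each author's commits stay in chronological order, since cherry-picking them out of order invites conflicts. A minimal sketch of that grouping step, assuming input lines of the form `<author-email> <hash>` listed oldest first (`group_by_author` is a hypothetical helper name):

```shell
# Hypothetical helper: group "<author-email> <hash>" lines by email while
# preserving the incoming (chronological) order within each author.
group_by_author() {
    # -s makes the sort stable; -k1,1 compares the email field only.
    sort -s -k1,1 | cut -d' ' -f2
}

printf 'bob@example.com h1\nalice@example.com h2\nbob@example.com h3\nalice@example.com h4\n' \
    | group_by_author
```

The hashes come out grouped per author (h2, h4, then h1, h3), each group in its original order.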

answered Jun 20 '15 at 04:56

VonC

Example of interactive rebase: http://denniskubes.com/2012/08/22/honey-i-squashed-the-commits/ – VonC Jun 20 '15 at 05:02
If done by author, a `git reset` could work too: http://makandracards.com/makandra/527-squash-several-git-commits-into-a-single-commit – VonC Jun 20 '15 at 05:02
