How to automagically remove commits that cancel themselves out?

Question

I have a development branch with many commits. These commits include "real" changes (e.g. add a feature) as well as temporary changes (e.g. add test code in one commit, then remove it in a later commit).

The real changes from this branch are being gradually added into the master. After each addition, I rebase the development branch on the new master. With each cycle, the difference between the development branch and the master gets smaller. However, the rebased development branch still contains all the original (now rebased) commits, and some of these commits now have zero net effect.

Simple example:

DEV_BRANCH
    Commit ...
    Commit D: Remove all test code from f.cpp
    Commit ...
    Commit C: Add new feature to f.cpp
    Commit ...
    Commit B: Add more test code to f.cpp
    Commit A: Add some test code to f.cpp
    Commit ...
MASTER
    Commit X: Add new feature to f.cpp
    Commit ...

At this point, the change to f.cpp from commit C is already in the master as commit X, and commits A, B and D combined make no net change to f.cpp. In other words, a diff between the branch and the master shows nothing for f.cpp.

(In reality, commits A, B, C, and D may include changes to other files as well.)

Is there any way to automagically simplify commits in the development branch either during rebase or otherwise?

In the above simple example, is it possible to automatically remove commit C (whose change is already in the master, so it is now "empty") and also commits A, B, and D (which make no change when combined)?

In a more complicated scenario, where the commits that touch f.cpp modify other files as well, is it possible to automatically remove the changes to f.cpp from the commits in the development branch (to simplify them), but keep those commits if they include changes to other files?

Rephrased, if applying all changes to a file from all commits in the development branch results in a file identical to the one in the master, is it possible to prune changes to this file from the development branch commits? (I do realize this may lead to side-effects if changes to other files need to be done in sync with those to the file that has no net effect. However, this is not a problem in my scenario because I do not need to cherry-pick any intermediate state from the development branch.)

@quetzalcoatl Yes, I do realize that in general this is a difficult problem. However, my situation seems constrained: I do not need any intermediate state in the dev branch per se. I considered creating a new branch from the master and copying all changed files from my original dev branch over. However, this creates one new lump commit and loses all granularity (which is useful to have until I am done picking all real changes). — Petr Vepřek, Jul 06 '16 at 09:47
Now thinking about that... after commit-feature-onto-master and rebase-devel-on-master, you can detect which files have actual changes on devel just by doing a `merge --squash --no-commit` between devel and master. That will give the list of files that have effective changes; call it **chg**. Now undo the merge, and walk the devel history from HEAD down to master with `filter-branch` (with `--prune-empty`), editing each commit to remove any files that **are not in chg**. That would leave commits with only proper changes, but also false positives: files that have both reverted changes and proper changes. — quetzalcoatl, Jul 06 '16 at 09:59
That part does some cleaning, is easy, does not have exponential complexity, and is most probably implementable as a bash script... but solving the false positives seems just as hard as the original problem. So it is helpful in the optimistic case, but not a real solution to the general case. — quetzalcoatl, Jul 06 '16 at 10:02
This may be a direction: use `git commit` with the `--fixup=` or `--squash=` flag, and then use `--autosquash` during `rebase`. You can read more here: * https://robots.thoughtbot.com/autosquashing-git-commits * https://coderwall.com/p/hh-4ea/git-rebase-autosquash * http://fle.github.io/git-tip-keep-your-branch-clean-with-fixup-and-autosquash.html — Chananel P, Jul 06 '16 at 19:45
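quetzalcoatl's idea can be sketched as a pair of shell functions. This is a sketch under stated assumptions, not a tested tool: it assumes dev has just been rebased onto `base`, that path names contain no characters Git would quote in `--name-only` output, and the names `net_zero_files`, `prune_net_zero`, `PRUNE_BASE` and `PRUNE_PATHS` are made up for illustration. Instead of "removing files not in **chg**", it does the equivalent: every file that the branch touches but that ends up identical to `base` is reset to `base`'s version in every branch commit, and `--prune-empty` then drops the commits left with nothing. Files with both reverted and real changes (the false positives above) are left alone.

```bash
#!/bin/bash
# Hypothetical helpers sketching the "prune files with no net change" idea.

# Print every path that some commit in base..branch touches but whose
# content is identical at the tips of base and branch (net change: none).
# Assumes plain path names (git quotes unusual ones in --name-only output).
net_zero_files() {
    local base="$1" branch="$2" path
    git log --format= --name-only "${base}..${branch}" | sort -u |
    while IFS= read -r path
    do
        [ -n "${path}" ] || continue
        # Exit status 0 means there is no difference for this path.
        if git diff --quiet "${base}" "${branch}" -- "${path}"
        then
            printf '%s\n' "${path}"
        fi
    done
}

# Rewrite base..branch so every net-zero path keeps base's version in every
# commit; --prune-empty then drops commits whose tree equals their parent's.
# filter-branch keeps the old branch tip under refs/original/ as a backup.
prune_net_zero() {
    local base="$1" branch="$2" paths
    paths="$(net_zero_files "${base}" "${branch}")"
    if [ -z "${paths}" ]
    then
        echo "No net-zero files between ${base} and ${branch}"
        return 0
    fi
    # The index filter is evaluated by filter-branch itself, so the data it
    # needs is handed over through the environment.
    PRUNE_BASE="$(git rev-parse "${base}")" PRUNE_PATHS="${paths}" \
    FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch -f --prune-empty --index-filter '
        printf "%s\n" "$PRUNE_PATHS" | while IFS= read -r p
        do
            # Drop the path from this commit, then restore the copy from
            # the base commit (nothing is restored if base lacks the path).
            git rm -q --cached --ignore-unmatch -- "$p"
            git ls-tree -r "$PRUNE_BASE" "$p" | git update-index --index-info
        done
    ' -- "${base}..${branch}"
}
```

For example, `prune_net_zero master dev_branch` would rewrite dev_branch in place; the old tip stays reachable as `refs/original/refs/heads/dev_branch` until you delete that ref. As the question notes, intermediate commits may no longer be consistent with each other after this.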
Not sure about automagic, but next time you do a rebase, use the `-i` flag and just remove the commits you don't want in the history. — Mad Physicist, Jul 06 '16 at 20:49

score 2 · Answer 1 · edited May 23 '17 at 11:51

I think that there are a couple of fairly simple solutions for your particular use case that take into account how the duplicates are generated in the first place.

Solution 1: For a small number of commits on diverging branches

Let's say you have the following set of commits in dev shown in git rebase --interactive mode:

pick bf45b13 Add feature X
pick b1f790f Cleanup feature X
pick 40b299a Add feature Y

Now you decide to cherry-pick 40b299a into master.

The problem is that when you next rebase dev, its copy of 40b299a will be empty, because that change is already in master. I would recommend doing git rebase -i master for future rebases. You can then delete the lines that you cherry-picked from the selection and be left with only the commits that you want.
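Before deleting lines from the todo list by hand, it can help to see which dev commits Git already considers applied to master. `git cherry` compares patch IDs for exactly this purpose; the wrapper name `already_upstream` below is hypothetical, a minimal sketch rather than part of the original answer.

```bash
#!/bin/bash
# Hypothetical helper: list the commits on <branch> that already have an
# equivalent change (same patch ID) in <upstream>. git cherry marks those
# with "-" and commits that are still missing upstream with "+".
already_upstream() {
    local upstream="$1" branch="$2"
    # Keep only the "-" lines and strip the marker: "<sha> <subject>".
    git cherry -v "${upstream}" "${branch}" | sed -n 's/^- //p'
}
```

After cherry-picking 40b299a into master, `already_upstream master dev` should list that commit, which is the line to drop from the next interactive rebase.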

Since you want to do this "automagically", you can make your own git script to do a simultaneous cherry-pick and rebase, eliminating the correct commit from dev while adding it to master.

If you call the following script something like git-transfer, and add it to your PATH somewhere, you can invoke it as git transfer dev master commit [...]:

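#!/bin/bash
# git-transfer: cherry-pick commits from one branch onto another, then
# rebase the source branch onto the target with those commits dropped.

usage() {
    echo "Usage: git transfer from-branch to-branch commits [...]" >&2
    if [ -n "${1}" ]
    then
        echo >&2
        for line; do echo "${line}" >&2; done
    fi
    exit 1
}

# Both branch arguments must name existing local branches.
git show-ref --verify --quiet "refs/heads/${1}" || usage 'from-branch must be a valid branch name'
FROM="${1}"; shift
git show-ref --verify --quiet "refs/heads/${1}" || usage 'to-branch must be a valid branch name'
TO="${1}"; shift
[ $# -gt 0 ] || usage 'at least one commit or commit range is required'

# Resolve every argument up front, before any history is rewritten.
# --no-walk keeps a single commit from dragging in its ancestors, while a
# range such as A..B is still expanded in full (oldest first via --reverse).
COMMITS=()
for ARG in "$@"
do
    REVS="$(git rev-list --no-walk --reverse "${ARG}")" || usage "invalid commit or range: ${ARG}"
    COMMITS+=(${REVS})
done

ORIGINAL="$(git rev-parse --abbrev-ref HEAD)"

if [ "${ORIGINAL}" == "${TO}" ]
then
    echo "Already on branch ${TO}"
else
    echo "Switching from ${ORIGINAL} to ${TO}"
    git checkout "${TO}" || exit
fi

for COMMIT in "${COMMITS[@]}"
do
    echo "Moving ${COMMIT} to ${TO}"
    git cherry-pick "${COMMIT}" || exit
done

# Delete the transferred commits from the rebase todo list in one rebase.
# GIT_SEQUENCE_EDITOR only edits the todo list (EDITOR would also be used
# for commit messages). `sed -i` as written is GNU sed syntax.
SED_SCRIPT=""
for COMMIT in "${COMMITS[@]}"
do
    SED_SCRIPT="${SED_SCRIPT}/^pick $(git rev-parse --short "${COMMIT}")/d;"
done
echo "Removing transferred commits from ${FROM}"
GIT_SEQUENCE_EDITOR="sed -i -e '${SED_SCRIPT}'" git rebase -i "${TO}" "${FROM}" || exit

# The rebase leaves ${FROM} checked out; return to the starting branch.
if [ "${ORIGINAL}" != "${FROM}" ]
then
    echo "Switching back to ${ORIGINAL} from ${FROM}"
    git checkout "${ORIGINAL}"
fi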

The script accepts the name of the "from" branch that will be rebased, the name of the "to" branch that will be cherry-picked into, and a list of commits, commit ranges, etc. It is based heavily on the ideas proposed in this question and answer (and on this answer for the loop).

This solution is suitable for small numbers of commits (one at a time most likely) as well as for branches that have diverged significantly.

Solution 2: For larger numbers of commits on a single timeline

In the use case you described, you periodically rebase dev onto master, so dev is effectively ahead of master all the time. Next time, instead of picking individual commits onto master, do the following:

1. Start an interactive rebase.
2. Move the commits you want to the beginning of the list.
3. Complete the rebase.
4. Move master to the last commit you want.

This approach will save you from having any duplication at all and is probably the simplest thing you can do. Note that while dev will constantly be rebased, master will only fast-forward with this approach.
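The last step can be made safe with a fast-forward-only merge, so master can never accidentally be rewritten. A minimal sketch; the function name `advance_master` is hypothetical, and it assumes the reordering rebase has already put the wanted commits at the bottom of dev.

```bash
#!/bin/bash
# Hypothetical helper for step 4: fast-forward <target> to <last_wanted>.
# --ff-only makes git merge refuse anything that is not a fast-forward,
# so the existing history on <target> is never rewritten.
advance_master() {
    local target="$1" last_wanted="$2"
    git checkout -q "${target}" && git merge -q --ff-only "${last_wanted}"
}
```

For instance, if only the top two commits of dev are still unwanted, `advance_master master dev~2` moves master up to just below them, and dev stays ahead of master as before.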

I like your idea of reordering commits. Unfortunately, because of many intermingled commits in my branch this does not work for me. — Petr Vepřek, Jul 08 '16 at 09:08
@PetrVepřek I am not sure I understand why, though. You are basically selecting some of those commits anyway. Does it matter whether you do the rebase as part of the selection or not? — Mad Physicist, Jul 08 '16 at 12:37
@PetrVepřek: `because of many intermingled commits in my branch` - you know that during a rebase you can not only reorder and merge (fixup,squash,etc) commits but also edit and split them, right? splitting commits doesn't mean just one-file-here one-file-there, you can also select specific chunks/lines or even inject new text or commits in between. That's sometimes a significant amount of (manual) work, but it often helps cleaning up and/or reordering old "uncareful" commits. — quetzalcoatl, Jul 09 '16 at 19:03
@quetzalcoatl: Good point. For me, this option becomes too tedious and not worth it. — Petr Vepřek, Jul 10 '16 at 11:45

score 1 · Answer 2 · answered Jul 06 '16 at 20:47

I see people have, in the comments, addressed the complexity issues.

It's not possible in general since the problem is too hard. If you define the problem down to "empty" (no changes from immediately preceding) commits, it's easy, and is in fact already in Git: rebase already does it unless you ask for --keep-empty.

(For that matter, git filter-branch, which is sort of like rebase on steroids, can also do it, although its default is the opposite: it keeps such commits unless given --prune-empty. If you are using a commit filter, you may use git_commit_non_empty_tree, which simply compares the current tree to the previous before actually invoking git commit-tree. In this case, git filter-branch is probably heavier-weight than you would want, though.)
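For reference, the `--prune-empty` route mentioned above can be wrapped like this. It is a sketch, not a recommendation over rebase; the function name `drop_empty_commits` is hypothetical, and it only removes commits that are already empty, not commits that cancel each other out.

```bash
#!/bin/bash
# Hypothetical helper: drop every commit in <base>..<branch> whose tree is
# identical to its parent's. With no other filter, filter-branch only
# re-creates the commits, and --prune-empty skips the empty ones.
# Equivalent: --commit-filter 'git_commit_non_empty_tree "$@"'.
drop_empty_commits() {
    local base="$1" branch="$2"
    # Newer Git versions print a warning and pause unless this is set.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch -f --prune-empty -- "${base}..${branch}"
}
```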

In a comment, you wrote:

I considered creating a new branch from the master and copying all changed files from my original dev branch over. However, this creates one new lump commit and loses all granularity (which is useful to have until I am done picking all real changes.)

Git is designed for this: just "detach" HEAD (git checkout --detach or check out by raw SHA-1 ID) and make whatever temporary commits you like. Once you switch back to a named branch, the temporary commits will remain only as long as they are protected by the HEAD reflog (default 30 days for unreachable-from-tip commits). (Well, if they're loose objects, as they will be, they also get a 14-day grace period, but 14 is less than 30.)

You don't even have to make a commit though: just git diff (or git diff-tree) the tree at the tip of master vs the tree at the tip of dev_branch. In other words, both forms of the complete source tree are already in Git, as two individual commits. They have whatever history (of other commits) in between them:

          o-------o    <-- master
         /
...--o--o
         \
          o--o--o--o   <-- dev_branch

but that cannot stop you from comparing them directly. The detached HEAD trick is useful for doing this:

          o-------o    <-- master
         /
...--o--o------*       <-- detached HEAD
         \
          o--o--o--o   <-- dev_branch

where * is some proposed commit (actually committed!), which you can then discard if you don't like it by using git reset or git checkout to move HEAD back to some other commit, or back to pointing to a branch name.

(In fact, this is how git rebase does its work: it detaches HEAD at the point on which the new branch will grow, adds commits until the new branch is done, then erases the arrow going from the branch name to its original tip-most commit and makes the arrow point to the new tip-most commit.)
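The "proposed commit on a detached HEAD" can also be made without touching the working tree first, using plumbing. A sketch under assumptions: the function name `propose_lump_commit` is hypothetical, and it parents the commit on master (matching the "new branch from the master" idea in the comment), whereas the diagram puts `*` on the fork point; pass `"$(git merge-base master dev_branch)"` as the base to match the diagram instead.

```bash
#!/bin/bash
# Hypothetical helper: record branch's exact tip tree as one commit whose
# parent is base, move a detached HEAD onto it, and print its ID. No branch
# is changed; the commit stays protected by the HEAD reflog as described.
propose_lump_commit() {
    local base="$1" branch="$2" commit
    commit="$(git commit-tree -p "${base}" \
        -m "Proposed: net changes of ${branch} on top of ${base}" \
        "${branch}^{tree}")" || return
    git checkout -q --detach "${commit}" || return
    printf '%s\n' "${commit}"
}
```

`git show HEAD` then displays the whole net change of dev_branch relative to master as one diff, and `git checkout dev_branch` discards the experiment.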

Yes, I think the truly empty commits are (by default) not kept during rebase. Just in case (also inspired by http://stackoverflow.com/a/5324916/2325279) I ran a commit filter. It took a long time but did not simplify anything. — Petr Vepřek, Jul 08 '16 at 09:04
