2

I am attempting to clean up the git history in a repo I'm working on. Given a git history that looks like this:

        . -- .
       /       \
... - S   ...   T - ... H
       \       /
        . -- . 

That is:

  • There is some arbitrary DAG behind point S.
  • There is some arbitrary DAG after point T.
  • There is some arbitrary DAG between point S and T.
  • The graph before S is disjoint from the graph after T (i.e. removing T or S will disconnect the removed node's predecessors from H).

I would like to rewrite the history between point S and T (e.g. squash or linearize), such that there is some new git history that ends with point T'.


... - S -- ... -- T'

The critical constraint is that the contents of the repo at point T and T' are exactly the same, even though the git commit is different and the way we got from S to T' might have changed.
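
One way to check this constraint is to compare the tree objects of the two commits, since the tree hash depends only on the repo contents and not on the history. The following is a minimal sketch in a throwaway repo; the tag names S, T and T_prime and the file name are placeholders for illustration, not part of the real repo.

```shell
#!/bin/sh
set -eu

# Throwaway repo with a local identity so commits work anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# S, followed by two commits ending at T.
echo "state01" > state && git add state && git commit -q -m "S"
git tag S
echo "state02" > state && git add state && git commit -q -m "step 1"
echo "state03" > state && git add state && git commit -q -m "step 2"
git tag T

# Squash everything after S into one commit, T_prime.
git checkout -q --detach T
git reset -q --soft S
git commit -q -m "squashed S..T"
git tag T_prime

# Same tree means identical contents; the commit ids still differ.
tree_T=$(git rev-parse 'T^{tree}')
tree_T_prime=$(git rev-parse 'T_prime^{tree}')
commit_T=$(git rev-parse T)
commit_T_prime=$(git rev-parse T_prime)

if [ "$tree_T" = "$tree_T_prime" ] && [ "$commit_T" != "$commit_T_prime" ]; then
    echo "same content, different commit"
fi
```

`git diff --quiet T T_prime` gives the same answer through its exit status.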

This much I can do. What I would like to do after this (and I haven't had luck doing so yet) is to transplant the exact structure of the DAG inclusively between T and H to get:


... - S -- ... -- T' - ... H'

Of course the commit hashes will change, but what's important is that the graph structure, authors, and other metadata between T' and H' are the same.

I would have thought I could do this with a cherry-pick:

git cherry-pick T^..H

but this seems to result in merge conflicts. I looked for answers in this SO post: "How to cherry-pick a range of commits and merge them into another branch?", but either I'm invoking rebase --onto incorrectly or those answers don't solve my problem.

To make this more concrete, I have an MWE. The following code constructs the example:

    mkdir -p "$HOME/tmp/tmprepo"
    rm -rf   "$HOME/tmp/tmprepo"
    mkdir -p "$HOME/tmp/tmprepo"


    cd "$HOME"/tmp/tmprepo
    git init 

    git checkout -b main
    echo "state01" > state && git add state && git commit -m "Initial commit"
    echo "state02" > state && git add state && git commit -m "Modify state"

    git checkout -b branch1
    echo "state03" > state && git add state && git commit -m "Modify state"
    echo "state04" > state && git add state && git commit -m "Modify state"
    echo "state05" > state && git add state && git commit -m "Modify state"

    git checkout main
    git checkout -b branch2
    echo "state06" > state && git add state && git commit -m "Modify state"
    echo "state07" > state && git add state && git commit -m "Modify state"
    echo "state08" > state && git add state && git commit -m "Modify state"

    git checkout main
    git merge branch2 --no-ff -m "merge commit" 
    git merge branch1 -s ours --commit --no-edit --no-ff -m "merge commit" 

    git checkout -b branch3
    echo "state09" > state && git add state && git commit -m "Modify state - WANT TO SQUASH"
    git tag "Point1"
    echo "state10" > state && git add state && git commit -m "Modify state - WANT TO SQUASH"
    git checkout main
    git merge --no-ff -m "merge commit - WANT TO SQUASH" branch3
    git tag "Point2"

    git checkout main
    git checkout -b branch4
    echo "state11" > state && git add state && git commit -m "Modify state"
    echo "state12" > state && git add state && git commit -m "Modify state"
    echo "state13" > state && git add state && git commit -m "Modify state"

    git checkout main
    git checkout -b branch5
    echo "state14" > state && git add state && git commit -m "Modify state"
    echo "state15" > state && git add state && git commit -m "Modify state"
    echo "state16" > state && git add state && git commit -m "Modify state"

    git checkout main
    git checkout -b branch6
    echo "state17" > state && git add state && git commit -m "Modify state"
    echo "state18" > state && git add state && git commit -m "Modify state"
    echo "state19" > state && git add state && git commit -m "Modify state"

    git checkout branch5
    git merge branch6 -s ours --commit --no-edit --no-ff -m "merge commit" 

    git checkout main
    git merge branch5 -s ours --commit --no-edit --no-ff -m "merge commit" 
    git merge branch4 -s ours --commit --no-edit --no-ff -m "merge commit" 

This creates this git history:

[Image: graph of the git history created by the script above]

As an example I want to squash the commits between Point1 and Point2, and then apply the rest of the history after Point2.

I can do the squash like this:

    # Squash all information between point1 and point2
    git checkout Point1
    git reset --hard Point2
    git reset --soft Point1^
    git commit -am "all changes between point1 and point2"
    git tag "Point2_prime"

which gives us this:

[Image: graph of the git history after the squash]

But I can't figure out how to get the rest of the history on top of it. This is what I've tried so far:

    # The state is now guaranteed to be the same as Point2, but the history has
    # been modified to our liking. Now we need to replay all the other commits
    # on top of this.

    # Based on answers in this SO post:
    # https://stackoverflow.com/questions/1994463/how-to-cherry-pick-a-range-of-commits-and-merge-them-into-another-branch

    COMMIT_A=$(git rev-list -n 1 Point2)
    COMMIT_B=$(git rev-list -n 1 main)
    echo "COMMIT_A = $COMMIT_A"
    echo "COMMIT_B = $COMMIT_B"

    # I've tried the following, but they do not seem to work.

    # Try with cherry pick
    git cherry-pick "${COMMIT_A}..${COMMIT_B}" 

    # Try with rebase onto
    git rebase "$COMMIT_A" "$COMMIT_B"~0 --onto HEAD

I would think because the state of the new commit is exactly the same as the state at Point2, there would be a way to do this non-interactively without merge errors. Is this possible?

Erotemic

2 Answers

4

The simplest way is to use the git replace + git filter-repo trick, documented here:

Parent rewriting

To replace $commit_A with $commit_B (e.g. make all commits which had $commit_A as a parent instead have $commit_B for that parent), and rewrite history to make it permanent:

git replace $commit_A $commit_B
git filter-repo --force

In your case:

git replace Point2 <sha of "all changes between point1 and point2">
git filter-repo --force

If you don't have git-filter-repo installed, the older git filter-branch command also "persists" replacement objects :

# run a phony filter-branch command, you just want to have the
# "rewrite replaced commits" effect:
git filter-branch --tag-name-filter cat main

# you can instruct filter-branch to ignore commits before Point1:
git filter-branch --tag-name-filter cat ^Point1 main

# to have git filter-branch try to rewrite all branches:
git filter-branch --tag-name-filter cat -f ^Point1 --branches

[edit] you can speed up the "create Point2_prime commit" process :

# the following command creates a 'Point2_prime' commit, i.e. one that:
#  * has the same content as 'Point2'
#  * has 'Point1' as a parent
# and creates the replacement rule 'Point2' -> 'that commit'
git replace --graft Point2 Point1

# you can now run :
git filter-repo --force

From the docs (git help replace):

--graft <commit> [<parent>…​]

Create a graft commit. A new commit is created with the same content as <commit> except that its parents will be [<parent>…​] instead of <commit>'s parents. A replacement ref is then created to replace <commit> with the newly created commit.
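
To illustrate what the graft does before any history is rewritten, here is a minimal sketch in a throwaway repo. The tag names S and T and the file name are placeholders for illustration. It assumes only that `git replace --graft` behaves as the documentation quoted above describes: the id of T does not change, but Git reads the replacement commit (with the new parent) whenever T is looked up, unless `--no-replace-objects` is given.

```shell
#!/bin/sh
set -eu

# Throwaway repo with a local identity so commits work anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

echo "state01" > state && git add state && git commit -q -m "S"
git tag S
echo "state02" > state && git add state && git commit -q -m "middle"
echo "state03" > state && git add state && git commit -q -m "T"
git tag T
echo "state04" > state && git add state && git commit -q -m "after T"

sha_S=$(git rev-parse S)
sha_T_before=$(git rev-parse T)

# Make T appear to have S as its only parent, skipping "middle".
git replace --graft T S

sha_T_after=$(git rev-parse T)

# Parent of T as seen through the replacement, and in the real object.
parent_seen=$(git log -1 --format=%P T)
parent_real=$(git --no-replace-objects log -1 --format=%P T)

# Contents of T are untouched by the graft.
tree_seen=$(git rev-parse 'T^{tree}')
tree_real=$(git --no-replace-objects rev-parse 'T^{tree}')

# History length with and without the replacement applied.
count_seen=$(git rev-list --count HEAD)
count_real=$(git --no-replace-objects rev-list --count HEAD)

echo "commits visible with graft: $count_seen, without: $count_real"
```

Until a rewriting tool such as filter-repo or filter-branch is run, the descendants of T keep their original ids; only the view through `refs/replace/` changes.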

LeGEC
  • Wow, that seems to work! I updated my post to add the tag Point2_prime to make the command easier to express: `git replace Point2 Point2_prime`. To cement my understanding, the filter-repo is necessary because "replace" doesn't change the commit-sha-id of Point2 and its descendants? So, any remote wouldn't see a changed head? (If so, I didn't realize you can modify a commit without modifying its sha-id) – Erotemic Jun 07 '22 at 20:30
  • @Erotemic Right. "replace" is kind of a suggestion that needs to be coordinated: "Hey everyone, let's all do this, please." (i.e. Let's all fetch the `refs/replace`.) `filter-repo` locks it in and says, "Hey everyone, go get this new repo. Or, go reset your branch(es) appropriately." The "replace" could optionally be ignored by other users, but the new re-written repo cannot be. – TTT Jun 07 '22 at 21:33
  • Git uses the commit timestamp when it walks parallel histories. Therefore, a warning: after this procedure, Git's commit walking abilities are at risk should you ever find the need to make a branch before `Point2_prime` and merge it into a commit after `Point2_prime`, because the commit time stamp on `Point2_prime` is younger than its successor commit. Such inversions tend to produce surprising and possibly wrong results. But as long as all commit walks cannot bypass `Point2_prime`, you are safe. – j6t Jun 07 '22 at 21:50
  • @j6t Good to know. My use case is that I want to do a lot of these operations to clean up long chains of commits, and handling it in smaller chunks makes it more manageable. If I do a final filter-repo at the end such that everything is rewritten, there shouldn't be any issue (assuming there are no references to the old tree), correct? – Erotemic Jun 08 '22 at 13:47
  • My assumption (because I did not check) is that `filter-repo` does *not* change commit timestamps of the commits that it has to rewrite. If that were the case, then there would not be an issue in the first place. – j6t Jun 08 '22 at 14:07
0

You could use the rebase command's --rebase-merges option. This will (attempt to) preserve the graph by recreating the merge commits. Note the "attempt": rebase cannot automatically re-resolve conflicts, as stated in the documentation:

Any resolved merge conflicts or manual amendments in these merge commits will have to be resolved/re-applied manually.

So, once you've created the squashed T', you can simply run this command:

git rebase T H --onto T' --rebase-merges

If the original merges were clean (no manually resolved conflicts and no non-default strategies such as -s ours), this should work without issues. However, if you have more than just a few such merges to redo by hand, then you'll probably be far better off using git-filter-repo as described in LeGEC's answer.
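
As a minimal sketch of the clean case, the script below builds a throwaway repo where the history after T contains one ordinary merge, squashes the commits before T, and replays the rest with --rebase-merges. The tag names T, H and T_prime, the branch name side, and the file names are placeholders for illustration; each commit touches a different file so that no conflicts can arise. It is not a fix for merges made with -s ours.

```shell
#!/bin/sh
set -eu
export GIT_EDITOR=true   # guard: never wait for an editor in a script

# Throwaway repo with a local identity so commits work anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example Author"
git config user.email "author@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "S"
echo a > a.txt && git add a.txt && git commit -q -m "to squash 1"
echo b > b.txt && git add b.txt && git commit -q -m "to squash 2"
git tag T

# History after T: a side branch merged back with a real merge commit.
git checkout -q -b side
echo c > c.txt && git add c.txt && git commit -q -m "side work"
git checkout -q main
echo d > d.txt && git add d.txt && git commit -q -m "main work"
git merge -q --no-ff -m "merge side" side
git tag H

# Build T_prime: the two commits before T squashed into one.
git checkout -q --detach T
git reset -q --soft 'T~2'
git commit -q -m "squashed"
git tag T_prime

# Replay T..main on top of T_prime, recreating the merge commit.
git rebase -q --rebase-merges --onto T_prime T main

merges=$(git rev-list --merges --count T_prime..main)
commits=$(git rev-list --count T_prime..main)
tree_old=$(git rev-parse 'H^{tree}')
tree_new=$(git rev-parse 'main^{tree}')

echo "merge commits after T_prime: $merges"
```

Comparing `H^{tree}` with `main^{tree}` afterwards is a quick way to confirm that the replayed history ends in the same contents as the original.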

TTT
  • This does not work. If you run the above MWE and attempt `git rebase Point2 main --onto HEAD --rebase-merges`, it results in: `Auto-merging state`, `CONFLICT (content): Merge conflict in state`, `Could not apply 480640b... merge-commit # merge commit` – Erotemic Jun 07 '22 at 20:13
  • Is there any modification to this command that would make it work without merge conflicts? Something like this seems like a cleaner solution than using `git replace`. Why is it generating the merge conflicts in the first place? My intuition is that this would have worked, but my observations contradict that. – Erotemic Jun 07 '22 at 20:33
  • @Erotemic I'm not sure why there are conflicts. As soon as I wrote the word "guaranteed" I knew I was going to regret it. :D – TTT Jun 07 '22 at 20:50
  • @Erotemic ah.. it's because of the `-s ours`. Rebase merges option doesn't know how to resolve conflicts... – TTT Jun 07 '22 at 21:02
  • Hmm, I'm glad I put that in the MWE. That's important because in my real use case there will be lots of different merge strategies in the arbitrary DAG after `T`. It's strange that git wouldn't "know" how to resolve them given it could see the future history. But until that is implemented, I think rebase just won't work here. I suppose I'll try to massage the `replace` option to get what I need. My only issue with that so far is that it blows away the original commits. – Erotemic Jun 07 '22 at 21:08
  • @Erotemic regarding blowing away the original commits, the `filter-repo` option `--partial` may be of interest to you. – TTT Jun 07 '22 at 22:03