Establishing a "clean" git history for explicit merges

Question

I have a git repository that uses GitFlow (i.e., it has master, develop, release-*, and feature-* branches). The collaborators have not been using explicit merges (i.e., git merge --no-ff), however, so git log --first-parent, for example, does not provide a simple roll-up of the merge history to date.

Moving forward, the collaborators will be using explicit merges. Before they do, however, I'd like to make sure that the history is "clean", so that no prior history is displayed when calling git log --first-parent. But, obviously, I want to maintain the actual commit history when calling an unfiltered git log.

My inclination is to do the following:

$ git checkout develop
$ git checkout --orphan CleanSlate
$ git rm . -r -f
$ git commit --allow-empty -m "Establish a clean slate for the develop branch"
$ git merge --no-ff --allow-unrelated-histories develop -m "Introduce all legacy files"
$ git checkout develop
$ git merge CleanSlate

Basically, the idea is that we'll:

1. Establish a fresh (--orphan) branch with no prior history
2. (Optional) Remove all files from the working tree so that we're not recommitting them
3. Establish an initial commit so that we have something to merge into
4. Perform an explicit merge (i.e., --no-ff) from the develop branch, acknowledging the unrelated histories
5. Fast-forward develop to the explicit merge we just performed so that it represents the history
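Before applying this to production, the steps can be rehearsed in a throwaway repository. The sketch below is only that, a rehearsal under assumptions that are not in the question: a scratch repository created with mktemp -d, three made-up commits B, C and D on develop, a dummy identity supplied through Git's standard GIT_AUTHOR_*/GIT_COMMITTER_* environment variables, and -q flags added to keep the output short. The two log commands at the end show the difference the question is after.

```shell
set -e
# Hypothetical scratch repository: develop holds commits B--C--D.
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for c in B C D; do echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"; done
git branch -M develop

# The recipe from the question (only -q added).
git checkout -q develop
git checkout -q --orphan CleanSlate
git rm -q -r -f .
git commit -q --allow-empty -m "Establish a clean slate for the develop branch"
git merge -q --no-ff --allow-unrelated-histories develop -m "Introduce all legacy files"
git checkout -q develop
git merge -q CleanSlate

git log --first-parent --oneline develop   # 2 lines: the merge and the empty commit
git log --oneline develop                  # 5 lines: the full history is still there
```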

My Question(s): Are there consequences to this approach that I should be aware of before applying it to a production environment? Are there alternative or simpler approaches that are preferable for accomplishing this type of scenario?

(In testing, this seems to achieve my objective with no adverse impact on existing branches or workflow. But, with git, I'm always wary of what I don't know I don't know.)

score 3 · Answer 1 · answered Dec 28 '19 at 01:26

I think I understand the idea here.

Let's draw what actually happens, step by step. For the purpose of the initial drawing, let's say that branch develop ends at ordinary commit D:

...--B--C--D   <-- develop

The first command is not really relevant to the result; the second puts us on an unborn ("orphan") branch, and the third removes every tracked file from the index and the work-tree:

$ git checkout develop
$ git checkout --orphan CleanSlate
$ git rm . -r -f

so that the fourth command creates an empty commit E with no parent:

$ git commit --allow-empty -m "Establish a clean slate for the develop branch"

which gives us this graph:

          E   <-- CleanSlate (HEAD)

...--B--C--D   <-- develop

Now:

$ git merge --no-ff --allow-unrelated-histories develop -m "Introduce all legacy files"

The merge command makes a new merge commit. Logically, F would be the next letter, but I've given in to temptation and called it M here:

          E--M   <-- CleanSlate (HEAD)
            /
...--B--C--D   <-- develop

Importantly, the first parent of M is the empty commit E. The second parent of M is commit D. So a future git log --first-parent that walks back to M will reach E and then stop.

The last two commands attach HEAD to develop and move develop to point to M:

$ git checkout develop
$ git merge CleanSlate

giving:

          E--M   <-- develop (HEAD), CleanSlate
            /
...--B--C--D

(You can now delete the name CleanSlate safely.)
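The parent order in the drawings, and the claim that CleanSlate can go afterwards, can both be checked by asking Git directly. This is a minimal sketch assuming the same kind of hypothetical scratch repository as any throwaway demo (mktemp directory, dummy identity, made-up commits B, C, D); develop^1 and develop^2 are ordinary revision syntax for the first and second parent.

```shell
set -e
# Hypothetical scratch repository: develop holds commits B--C--D.
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for c in B C D; do echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"; done
git branch -M develop
d=$(git rev-parse develop)            # old tip, commit D

# Rebuild E and M as in the walk-through, then fast-forward develop.
git checkout -q --orphan CleanSlate
git rm -q -r -f .
git commit -q --allow-empty -m "Establish a clean slate for the develop branch"
e=$(git rev-parse HEAD)               # the parentless commit E
git merge -q --no-ff --allow-unrelated-histories develop -m "Introduce all legacy files"
git checkout -q develop
git merge -q CleanSlate

# develop now names M: its first parent is E, its second parent is D.
echo "first parent:  $(git rev-parse 'develop^1') (E = $e)"
echo "second parent: $(git rev-parse 'develop^2') (D = $d)"

# CleanSlate also names M, which develop contains, so -d deletes it.
git branch -d CleanSlate
```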

There is a shorter set of commands to do this

Consider this recipe (untested, but I eyeballed it again before posting and it looks right):

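# hash of the empty tree; -w also writes it into the object database
et=$(git hash-object -w -t tree /dev/null)
# E: a root commit (no -p) whose snapshot is the empty tree
e=$(git commit-tree -m "dummy empty commit at which --first-parent stops" "$et")
# M: first parent E, second parent develop (D), same snapshot as D
m=$(git commit-tree -p "$e" -p develop -m "begin strict no-ff merges" "develop^{tree}")
git checkout -B develop "$m"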

Using the two -p (parent of commit) arguments, we choose the parent hashes for merge commit M, in the order we like: the first -p is the parent traced by git log --first-parent and the second -p is the second parent that makes M a merge commit.
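The commit-tree recipe can be tried end to end in the same way. This sketch assumes a hypothetical scratch repository (mktemp directory, dummy identity, made-up commits B, C, D) and uses the recipe with -w added, so the empty tree object is actually stored rather than only hashed, and with the variables and develop^{tree} quoted for the shell. The checks at the end confirm that parent order follows the order of the -p options.

```shell
set -e
# Hypothetical scratch repository: develop holds commits B--C--D.
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for c in B C D; do echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"; done
git branch -M develop
d=$(git rev-parse develop)            # remember the old tip D

# The recipe: empty tree, root commit E, merge commit M, move develop.
et=$(git hash-object -w -t tree /dev/null)
e=$(git commit-tree -m "dummy empty commit at which --first-parent stops" "$et")
m=$(git commit-tree -p "$e" -p develop -m "begin strict no-ff merges" "develop^{tree}")
git checkout -q -B develop "$m"

# First -p became develop^1 (E), second -p became develop^2 (D).
git rev-parse 'develop^1' 'develop^2'
git log --first-parent --oneline develop   # 2 lines: M, then E
```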

The actual trees (or snapshots) stored in the two new commits are the one in $et (the empty tree) for E and develop^{tree} (the snapshot for commit D) for M. You can easily make commit E share D's tree instead, if you prefer, by passing develop^{tree} to the first git commit-tree as well.
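The variant in which E shares D's tree only changes the tree argument of the first git commit-tree. A minimal sketch, again assuming a hypothetical scratch repository with made-up commits B, C, D and a dummy identity:

```shell
set -e
# Hypothetical scratch repository: develop holds commits B--C--D.
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for c in B C D; do echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"; done
git branch -M develop
d=$(git rev-parse develop)            # old tip D

# E gets D's snapshot instead of the empty tree; E is still a root commit.
e=$(git commit-tree -m "dummy commit at which --first-parent stops" "develop^{tree}")
m=$(git commit-tree -p "$e" -p develop -m "begin strict no-ff merges" "develop^{tree}")
git checkout -q -B develop "$m"

# E and M record the same snapshot, so there is nothing to diff...
git diff --quiet "$e" "$m" && echo "E and M have identical trees"
# ...but --first-parent still stops at E, because E has no parent.
git log --first-parent --oneline develop   # 2 lines
```

The trade-off, as far as I can tell: with the empty tree, a diff of M against its first parent (git diff E M) lists every legacy file as added; with D's tree, that diff is empty. Neither choice stores extra file data, since E simply reuses D's existing tree object.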

The final git checkout -B develop makes Git switch to commit M and point the name develop at it. Because moving develop from D to M is a fast-forward, you could instead use:

git checkout develop; git merge --ff-only $m

but git checkout -B does the same job with a single, if more obscure, command. Note: commits E and M have no name protecting them until develop has been moved to M, so you must complete the last step within 14 days of starting (the default gc.pruneExpire grace period) to make sure that Git's garbage collector does not remove them.
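The --ff-only route can be sketched as follows. It adds one precaution that is not part of the answer: a temporary tag (the name clean-slate-pending is hypothetical) makes E and M reachable as soon as they exist, so the garbage-collection deadline does not matter while the switch-over is pending. The scratch repository, dummy identity and commits B, C, D are assumptions for the demo.

```shell
set -e
# Hypothetical scratch repository: develop holds commits B--C--D.
cd "$(mktemp -d)" && git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
for c in B C D; do echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"; done
git branch -M develop
d=$(git rev-parse develop)            # old tip D

et=$(git hash-object -w -t tree /dev/null)
e=$(git commit-tree -m "dummy empty commit at which --first-parent stops" "$et")
m=$(git commit-tree -p "$e" -p develop -m "begin strict no-ff merges" "develop^{tree}")
# Temporary name so that E and M are reachable, hence safe from gc.
git tag clean-slate-pending "$m"

# Two-command alternative to "git checkout -B develop $m".
git checkout -q develop
git merge -q --ff-only "$m"

# develop now protects E and M, so the temporary tag can go.
git tag -d clean-slate-pending
```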

The result is the same either way. Most of Git is about the commits, and the graph they form. Most of the rest is about using names (branch and/or tag names and/or other names) to get started when walking the graph.

The more I look into `commit-tree`, the more I find it to be a fantastic tool. — Romain Valeri, Dec 28 '19 at 06:21