git: why would a merge commit be required for replacements?

Question

Our git repo was imported from a different VCS (Perforce) but the standard import was unable to recover branch relations (i.e. that branch 1.1 was derived from branch 1.0). To fix this we added grafts: git replace --graft <commit> <parent> to record the missing relation from the creation of each branch to its parent commit.

So we have a git repository that has some replacements in .git/refs/replace. (These should probably have been removed by rebasing somehow, but perhaps it is too late for that now.)

These replacements must be pulled manually using:

git pull origin 'refs/replace/*:refs/replace/*'

A colleague was unaware of this but was somehow able to push a change to the repo such that:

git pull origin 'refs/replace/*:refs/replace/*'

results in merge conflicts like:

Simple merge did not work, trying automatic merge.
ERROR: /some/file: Not handling case c6aa12b3f446c57921a68c5fc73dae9e086c2bdb ->  -> bb909246180daf894ccdb59cc4a4ff398ac62bad
fatal: merge program failed
Automated merge did not work.
Should not be doing an octopus.
Merge with strategy octopus failed

I don't understand what is being merged here. If I use:

git pull origin 'refs/replace/*:refs/replace/*' -s ours

The merge succeeds and I end up (after committing) with a merge message like:

commit 67d7f7cc40826e8a84beed3af9999d39e411e65d
Merge: 718dbe3 c9333d1 b870b22 d3f10f7 77d0835 de79e03 d7f0e7c 97ca1f8
Author: xxxxx
Date:   Tue Jul 11 11:33:55 2017 +0100
Merge commits 'refs/replace/00029d7b3e531215f6ce5afb32862b49d652e896', 'refs/replace/03d715c9890e5cec95ac62d1c9ecc54cb78b9f62', '

git claims several files are changed, though these are not files changed recently. I suspect they are the difference between two old branches.

After the merge/pull we have replacements in .git/refs/replace.

My colleague's report claims the changes pushed include grafts replicating 2 of the missing branch relations which he needed. These do not appear in .git/refs/replace at all.

If they are somehow mixed in with the ones pushed earlier, there are some other questions:

How was he able to push without resolving the merge conflict first?

Also, if you clone from the repository without pulling the replacements, the relations he added are still pulled. There are two tell-tale commits with commit messages like

Former-commit-id: bc735afc1d8bb842733cb94767afb8b42599eb6a

but there is no .git/refs/replace directory describing the replacement.

How could my colleague have pushed his grafts as permanent changes to the repo, such that they are pulled automatically when the earlier ones are not, and such that there is no .git/refs/replace directory at all?

Can someone enlighten me as to what might be happening here?

Also is there something I can do to make the other branch relations permanent without rewriting history? My colleague appears to have done this but I can't understand how.

Resolved

To summarise the answers:

In Git, the opposite of a push is a fetch, not a pull.

My colleague's push wouldn't have had a merge issue, as there wasn't one. It was an artifact of my using 'pull' instead of 'fetch'.

Thanks for the help.

score 1 · Answer 1 · answered Jul 11 '17 at 14:48

I think 99% of the confusion comes from the fact that you're using pull, instead of fetch, to replicate the replacement refs. I expect this is trying to octopus-merge all of the replacement refs into your current branch, which is of course rather nonsensical.

The way to make permanent changes to the lineage of commits is to do a history rewrite, and in the long run you should probably go ahead and do it rather than rely so heavily on the replacement mechanism. Obviously that would require coordination of the team, but it's a one-time cost and then the repo will work much more seamlessly from then on.
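A minimal sketch of that one-time rewrite in a throwaway repository, assuming the `git filter-branch` that ships with Git (its documentation notes that it honours refs in `refs/replace/`, so a rewrite with no filters turns a graft into a real parent link). The branch `b1.1` and the commit messages are hypothetical stand-ins for the imported Perforce branches:

```shell
#!/bin/sh
# Sketch: make a graft permanent with a no-op git filter-branch.
# Everything happens in a throwaway repository; names are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo

# History of "1.0" as the importer produced it.
git commit -q --allow-empty -m "1.0 base"
base=$(git rev-parse HEAD)

# "1.1" arrives as an unrelated root commit (the lost branch relation).
git checkout -q --orphan b1.1
git commit -q --allow-empty -m "1.1 start"
start=$(git rev-parse HEAD)
git commit -q --allow-empty -m "1.1 work"

# Record the missing relation as a replacement, as in the question.
git replace --graft "$start" "$base"

# Rewrite b1.1 with no filters. filter-branch reads history through the
# replacement, so the rewritten commits carry the grafted parent for real.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f -- b1.1 >/dev/null

# The replacement ref is no longer needed once the graft is baked in.
git replace -d "$start" >/dev/null

# With no refs/replace entries left, b1.1 still reaches "1.0 base".
git log --oneline b1.1
```

On a shared repository, each rewritten branch would then need a force-push and every existing clone a reset onto the new history, which is the team coordination mentioned above.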

You were correct but I didn't understand why without help from Torek's answer. — Bruce Adams, Jul 11 '17 at 17:07

score 1 · Accepted Answer · answered Jul 11 '17 at 15:15

The answer to the title question ("why would a merge commit be required for replacements?") is: It wouldn't.

Here is an important rule for using Git: never use git pull. :-)

(Once you're well-versed in Git, you can relax this rule, to use git pull only when you know exactly what's about to happen.)

In this particular case, you should run:

git fetch origin 'refs/replace/*:refs/replace/*'

and stop at that point.

If you plan to keep using replacements, you may wish to add this to the set of fetch = lines for each remote. (Note that once you start using replacements, you tend to get stuck with them, so this may be a reasonable thing to do. The other option, as in Mark Adelsberger's answer, is to wire the replacements in, e.g., using a no-op git filter-branch.) This "add a fetch setting" must be done manually after each git clone, since a normal clone does not fetch replacement name-space names. (This also ties into your mirror clone question from earlier: a mirror clone slavishly fetches all refs/* references, which includes the replacement name-space.)
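A small sketch of "fetch and stop" with two throwaway repositories: `upstream` (a hypothetical stand-in for origin) carries one graft, a plain clone does not receive it, and a single fetch of the replacement name-space makes it take effect without any merge:

```shell
#!/bin/sh
# Sketch: a plain clone skips refs/replace/; one fetch brings the graft in,
# and it takes effect with no merge. Throwaway repositories only.
set -eu
top=$(mktemp -d)
cd "$top"

# Hypothetical "upstream" with one graft, like the imported repository.
git init -q upstream
cd upstream
git config user.email demo@example.com
git config user.name Demo
git commit -q --allow-empty -m "1.0 base"
base=$(git rev-parse HEAD)
git checkout -q --orphan b1.1
git commit -q --allow-empty -m "1.1 start"
git replace --graft HEAD "$base"
cd ..

git clone -q upstream clone
cd clone
echo "after clone:"
git for-each-ref refs/replace/        # prints nothing

# Fetch the replacement refs and stop: no pull, no merge.
git fetch -q origin 'refs/replace/*:refs/replace/*'
echo "after fetch:"
git for-each-ref refs/replace/        # now lists the graft
git log --oneline origin/b1.1         # now also shows "1.0 base"
```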

Description

The git pull command is meant as a convenience short-cut. It first runs git fetch, which is the actual operation to obtain commits and other items from other repositories. Then it runs another Git command.

In most cases, after obtaining items (such as commits) from another Git repository, you must take some action to use those items yourself. Obtaining them just plops them into your repository, giving them some sort of reference-name—usually a remote-tracking branch name, such as refs/remotes/origin/somebranch.

The action to take is usually git rebase, and sometimes git merge. The git pull command will run git merge for you, i.e., take the wrong action, unless you tell it to run git rebase for you instead. That may, of course, still be the wrong action—and in this case, it is.

The references you are bringing over are in the refs/replace/ name-space, not the refs/heads/ name-space (which holds branch names) and not the refs/tags/ name-space (which holds tag names). The refs/replace/ name-space holds replacement items.

Replacements and tags share a curious feature, as compared to the normal use of branch names: no action is necessary to use them. With branch names, when you obtain refs/heads/master from Monique's repository, you rename it to refs/remotes/monique/master, so that you can keep your master (a branch, in refs/heads/) separate from her master, which you keep only as a remote-tracking branch. You will then take some second action, merge or rebase, to incorporate her work into yours.

With tag names, however, you might take Monique's refs/tags/v2.3 and call it your refs/tags/v2.3. Now you both share the tag v2.3, and there is no second action required: your Git will look up the tag name v2.3 in your own refs/tags/ name-space for you.

The same holds with replacements. A replacement object, in Git, is represented by a refs/replace/ name-space name with a very peculiar pattern: instead of refs/replace/master or some such, we find names like refs/replace/b06d3643105c8758ed019125a4399cb7efdcce2c. That name maps to the replacement object itself. That is, that big long hairy name maps to another big long ugly Git hash ID, which Git can use to access a different Git object.

For memorability, I'll use blah instead of b06d3643105c8758ed019125a4399cb7efdcce2c below. So refs/replace/blah maps to another ID, which we might call bazinga.

When your Git is about to do something with an internal object whose hash ID is blah, it notices that there's a refs/replace/blah. Instead of using the normal blah object, then, it looks up the bazinga object, and uses that one instead. (With --no-replace-objects Git skips this "check for refs/replace/" step.)
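To make blah and bazinga concrete, here is a sketch in a throwaway repository (the commit messages and the branch name `other` are made up). It creates a graft, prints the `refs/replace/` mapping, and compares a normal read of the original commit with one made under `--no-replace-objects`:

```shell
#!/bin/sh
# Sketch: what "refs/replace/blah -> bazinga" looks like, and how
# --no-replace-objects bypasses the lookup. Throwaway repository only.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
git commit -q --allow-empty -m "old root"
parent=$(git rev-parse HEAD)
git checkout -q --orphan other
git commit -q --allow-empty -m "orphan"
blah=$(git rev-parse HEAD)

# The ref is named after the original ID ("blah") and points at a new
# commit ("bazinga") that is a copy of blah with the grafted parent.
git replace --graft "$blah" "$parent"
bazinga=$(git rev-parse "refs/replace/$blah")
echo "refs/replace/$blah -> $bazinga"

# A normal read of blah silently returns bazinga's content (one parent).
git cat-file -p "$blah" | grep '^parent' || echo "(no parent)"
# With the lookup disabled, the original object has no parent line.
git --no-replace-objects cat-file -p "$blah" | grep '^parent' || echo "(no parent)"
```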

That's how replacements work, and as a consequence, when you use them, you should just fetch them and stop.

If you view your .git/config file, you will see some lines of the form:

[remote "origin"]
    url = ...
    fetch = +refs/heads/*:refs/remotes/origin/*

Adding a line (keeping the original fetch = in place) of the form:

    fetch = +refs/replace/*:refs/replace/*

will make each git fetch origin pick up any new replacement objects automatically. For more about this, see my answer to What is the difference between these `git fetch` syntaxes?
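The same line can be added without editing the file by hand, which is convenient since it has to be repeated after every clone. A minimal sketch, assuming the remote is called `origin`; the URL is a placeholder:

```shell
#!/bin/sh
# Sketch: append the replacement refspec with git config.
# Throwaway repository; the remote URL is a placeholder and never contacted.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git remote add origin https://example.com/placeholder.git

# --add keeps the existing fetch = line and appends a second one.
git config --add remote.origin.fetch '+refs/replace/*:refs/replace/*'

git config --get-all remote.origin.fetch
# +refs/heads/*:refs/remotes/origin/*
# +refs/replace/*:refs/replace/*
```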

As you say, if I use git fetch instead of git pull "it just works". A subsequent git merge says "already up-to-date". So what is (or isn't) git pull doing before it calls merge? Is it some kind of race because the replacements have not yet been applied? My naive expectation was that if nothing needs to be done, it will do nothing. Obviously git pull thinks something needs to be done. — Bruce Adams, Jul 11 '17 at 16:30
It's not a race at all: it's because `git pull` specifically merges with whatever hash IDs you specifically pulled, even if it doesn't make sense to do so. A separate `git merge` step does something different: it looks at the current branch's *upstream* setting, and uses that as the hash ID for the merge. (This has to do with the history of Git, where `git pull` actually predates the idea of remote-tracking branches in the first place. This turned out to be a mistake—or at least insufficient experience—but `git pull` still behaves in a backward-compatible way.) — torek, Jul 11 '17 at 16:47
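This can be observed in FETCH_HEAD, the file in which git fetch records what it fetched and from which git pull takes the commits to merge. A sketch, assuming two hypothetical grafted branches in a throwaway upstream: after running only the fetch half of the pull, both replacement refs are listed and neither is marked not-for-merge, so the merge half would be an octopus of unrelated commits:

```shell
#!/bin/sh
# Sketch: why "git pull origin 'refs/replace/*:refs/replace/*'" octopus-merges.
# Refs named on the fetch command line are marked for merge in FETCH_HEAD.
set -eu
top=$(mktemp -d)
cd "$top"
git init -q upstream
cd upstream
git config user.email demo@example.com
git config user.name Demo
git commit -q --allow-empty -m "1.0 base"
base=$(git rev-parse HEAD)
for b in b1.1 b1.2; do
    # Each hypothetical branch starts as an unrelated root, then is grafted.
    git checkout -q --orphan "$b"
    git commit -q --allow-empty -m "$b start"
    git replace --graft HEAD "$base"
done
cd ..
git clone -q upstream clone
cd clone

# The fetch half of the pull, on its own.
git fetch -q origin 'refs/replace/*:refs/replace/*'

# One line per replacement ref, none tagged "not-for-merge": a pull would
# hand all of them to the merge step at once, hence the octopus.
cat .git/FETCH_HEAD
```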
