How exactly does git pull --allow-unrelated-histories work?

Question

Right, so I've searched through some other SO threads, and also checked out this: https://git-scm.com/docs/git-merge

I understand that --allow-unrelated-histories allows two projects to join together, however, what I don't understand is how exactly it works.

Does it just work like this? https://i.stack.imgur.com/ikscx.jpg

The git site above shows this diagram:

      A---B---C topic
     /         \
D---E---F---G---H master

However, to me that makes it look like they don't have unrelated histories, because the topic branch split off from 'E'. Even if the topic branch split off from master at 'D', they would still share the 'D' branch.

Would anyone be able to explain (preferably with visuals) how exactly allow-unrelated-histories works? I am trying to git pull, but one of my team members edited the branch I am pulling from and now I have to use --allow-unrelated-histories.

Thanks!

score 20 · Answer 1 · answered Feb 23 '19 at 03:14

You're right that your diagram (from git-scm.com/docs/git-merge) shows a merge that has a common ancestor commit. It's worth noting that this is a shared commit; the term branch is kind of tricky, in Git. (See What exactly do we mean by "branch"?)

Anyway, I think it helps if you forget that git pull even exists. All git pull does is run two different Git commands for you. Until you are well experienced with Git, you're better off using separate git fetch and git merge commands. (Note that git pull --rebase switches the second command to git rebase, but we won't get into details here.)

There are several issues with using git pull to run the other two commands. One of them is that git pull uses a weird pull-only syntax, different from all other Git commands, including the git merge that git pull runs. That is, instead of git pull origin xyz, you'll run git merge origin/xyz (after a git fetch). To see what that is, you'd run git log origin/xyz, or git show origin/xyz, etc. These are always spelled origin/xyz, with the slash, except when using git pull, so don't use git pull. :-) Let's break it into the two separate commands.

The first command git pull runs is git fetch, which you can run any time you like: git fetch calls up some other Git, asks it what commits it has for you, under what names (typically branch and tag names). It collects those commits (and their files of course), and for each of their branch names, creates or updates your remote-tracking names. So that's where origin/master comes from, for instance: git fetch sees that they have a master, that their master is commit badf00d or whatever, and creates or updates your origin/master to remember: origin's master was badf00d the last time I checked.

The second command that git pull runs for you is where all the interesting action is. This second command should not be run at just any time, on just any branch: whichever second command you have Git run, you must first be on the right branch, the one you want to merge into or the one you want to rebase. I find using the separate commands helps here, because it's clearer that git merge is going to affect the current branch, even though you'll name something like origin/master.
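To make the two steps concrete, here is a minimal sketch that runs them by hand in a throwaway directory. The repository names theirs and mine, the file name, and the demo identity are all made up for illustration; the only point is that git fetch moves origin/master and git merge then moves the current branch.

```shell
#!/bin/sh
# Sketch: run by hand the two steps that `git pull origin master` would run.
# Repository names, file names and the demo identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# "theirs" plays the role of the other Git that origin points to.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
echo one > theirs/file.txt
git -C theirs add file.txt
git -C theirs commit -q -m "first commit"

# Our clone starts out identical to theirs.
git clone -q theirs mine

# They add a commit after we cloned.
echo two >> theirs/file.txt
git -C theirs commit -q -am "second commit"

cd mine
# Step 1: fetch only updates the remote-tracking name origin/master.
git fetch -q origin
before=$(git rev-parse master)
tracked=$(git rev-parse origin/master)

# Step 2: merge affects the current branch (master here).
git merge -q origin/master
after=$(git rev-parse master)

echo "master before merge: $before"
echo "origin/master:       $tracked"
echo "master after merge:  $after"
```

After the fetch, master is unchanged and only origin/master has moved; it is the merge that brings master up to date.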

Now that we know --allow-unrelated-histories is really an option to git merge, let's dive into git merge, and see what it does. First we'll look at what it does with a common starting point, then again at what it does without one.

Merge is about combining changes since a common starting point

Consider the diagram you quoted above, which I'll redraw just a little bit:

     A--B--C   <-- topic
    /
D--E--F--G   <-- master (HEAD)

This indicates that whoever has been working on topic, they started by checking out commit E. Probably, at that time, commit E was the last commit on master:

D--E   <-- master, topic

Since then, someone added two commits on master, which are F and G, and someone—probably someone else—added three commits on topic, which are now the A-B-C chain (with A's parent commit being E).

Each commit represents a complete snapshot of all source files. So commit E has all the files in it (well, all the files it had when you, or whoever, made commit E), saved in that form, forever. Any changes you, or whoever, made to any files from that saved state and then committed in, say, commit A, cause those files to be in their new state in A. Any unchanged files in A exactly match the files in E.

For simplicity, we'll assume there are two people acting here, "you" and "they", and you made the changes on master, eventually resulting in commit G. They then made A through C. So you and they both started with whatever is saved forever in commit E. You ended up with what you have saved forever in G. So Git can find out what you changed by a simple git diff, to compare commit E to commit G. Likewise, they ended up at C, so Git can find out what they changed by a similar simple git diff, comparing E vs C:

git diff --find-renames hash-of-E hash-of-G: what you changed
git diff --find-renames hash-of-E hash-of-C: what they changed

Git then checks out the files from commit E, i.e., what you both started with, combines your changes with their changes, applies the combined changes to those files, and builds a new commit from the result. That determines what files/contents go into commit H:

     A--B--C   <-- topic
    /       \
D--E--F--G---H   <-- master (HEAD)

New commit H's first parent is G, which was the tip of master and is the branch you have checked out. Its second parent is C, the one you told git merge to merge.
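If you want to watch this happen, the following sketch rebuilds the D-through-G and A-through-C history in a scratch repository, asks Git for the merge base, shows the two diffs, and then merges. The file names and the add_commit helper are invented for the demo; git merge-base, git diff --find-renames and git merge are the real commands.

```shell
#!/bin/sh
# Sketch: rebuild the example history, then inspect the merge base and the merge.
# File names and the add_commit helper are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

# add_commit <file> <message>: append a line to <file> and commit it.
add_commit() {
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

add_commit base.txt D
add_commit base.txt E
e_hash=$(git rev-parse HEAD)
git branch topic                 # topic starts at commit E

add_commit mine.txt F            # your work on master
add_commit mine.txt G
g_hash=$(git rev-parse HEAD)

git checkout -q topic
add_commit theirs.txt A          # their work on topic
add_commit theirs.txt B
add_commit theirs.txt C
git checkout -q master

# The common starting point: this should be commit E.
base=$(git merge-base master topic)

echo "what you changed:"
git --no-pager diff --find-renames --stat "$base" master
echo "what they changed:"
git --no-pager diff --find-renames --stat "$base" topic

# Make merge commit H and look at its two parents.
git merge -q --no-edit topic
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
echo "first parent:  $first_parent"
echo "second parent: $second_parent"
```

The first parent is the old tip of master (G) and the second is the tip of topic (C), matching the diagram.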

Note that when Git does all this change-combining, it has an easy job on all the files that are exactly the same in both branch-tips, because no matter what was the case in the merge base, the two tips match, so both files are the same and either one works fine. It also has an easy job if you changed file X and they didn't, and where they changed file Y and you didn't, because again, it can just take your or their version of those files. It's only where you both touched the same file, in different ways, that Git has to work hard.

Unrelated histories

Unrelated histories occur when there's no common connection between the two sets of commits:

A--B--C   <-- master (HEAD)

J--K--L   <-- theirs

Your commits start at C and work backwards, ending at A. No commits come before A: A has no parents.

Their commits start, in this case, at L (I skipped a lot of letters to leave room to insert our merge). L's parent is K, and K's parent is J, but J has no parents either. So there's no common starting point at all.
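You can check for this situation yourself. The sketch below builds two unrelated chains inside one scratch repository (using git checkout --orphan as a shortcut to get a second root commit; in real life the second chain usually arrives via git fetch) and then asks Git for the root commits and for a merge base. Branch and file names are made up.

```shell
#!/bin/sh
# Sketch: two histories in one repository that share no commits.
# Branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

echo a > a.txt
git add a.txt
git commit -q -m A               # root commit of master

# Start a second, parentless history on a new branch.
git checkout -q --orphan theirs
git rm -rfq .                    # clear the files carried over from master
echo j > j.txt
git add j.txt
git commit -q -m J               # root commit of theirs
git checkout -q master

# Count the commits with no parents that are reachable from either branch.
roots=$(git rev-list --count --max-parents=0 master theirs)

# git merge-base exits non-zero when it finds no common ancestor.
if git merge-base master theirs >/dev/null; then
  related=yes
else
  related=no
fi

echo "root commits: $roots"
echo "related:      $related"
```

Two root commits and a failing git merge-base confirm that these branches have no common starting point.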

If you tell Git to merge these, Git just pretends there is one. The pretend starting point has no files. Git runs:

git diff empty-tree hash-of-C: what you changed
git diff empty-tree hash-of-L: what they changed

Of course, what you changed, from this diff, is that you added every file (that's in your commit C). What they changed is that they added every file (that's in their commit L).
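You can reproduce that kind of diff by hand. Git does not literally run this command during a merge, so treat it as an illustration only: git hash-object -t tree /dev/null computes the ID of the empty tree, and diffing that against a commit lists every file as added. The file names here are made up.

```shell
#!/bin/sh
# Sketch: diff a commit against the empty tree to see "everything was added".
# File names are hypothetical; this only illustrates the idea.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo

echo hello > readme.txt
echo world > notes.txt
git add readme.txt notes.txt
git commit -q -m "root commit"

# The ID of a tree with no files in it.
empty_tree=$(git hash-object -t tree /dev/null)

# Compared with the empty tree, every file in the commit has status A (added).
changes=$(git --no-pager diff --name-status "$empty_tree" HEAD)
echo "$changes"
```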

If the files have different names, they are different files and there is no problem: Git takes yours, or theirs. If they have the same names, but the exact same contents, there's no conflict here either: Git can just take yours (or theirs). The problems occur for all files where yours and theirs have the same name, but different contents. As far as Git is concerned, you whipped yours up from scratch, and so did they, so everything conflicts. You must pick the winning contents or construct a new file from the "everything conflicts" inputs.

Once you have resolved any of these conflicts and run git merge --continue to make Git finish, Git makes a merge commit as usual:

A--B--C--D   <-- master (HEAD)
        /
J--K---L   <-- theirs

The new commit has two parents, C and L, and saves, forever, the snapshot that you built by fixing the conflicts that Git reported, and otherwise whatever files were exactly the same in C and L or that were only in C, or only in L.
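Here is a sketch of the whole sequence in two scratch repositories, assuming a Git recent enough to have both --allow-unrelated-histories and git merge --continue (roughly 2.12 or later). The repository names, file names and contents are invented: README exists in both with different contents, LICENSE exists in both with the same contents, and each side has one file of its own.

```shell
#!/bin/sh
# Sketch: merge two unrelated histories and resolve the one real conflict.
# Repository names, file names and contents are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Two repositories with independent root commits.
for name in mine theirs; do
  git init -q "$name"
  git -C "$name" symbolic-ref HEAD refs/heads/master
  echo "readme from $name" > "$name/README"   # same name, different contents
  echo "shared text" > "$name/LICENSE"        # same name, same contents
  echo "only in $name" > "$name/$name.txt"    # name unique to this side
  git -C "$name" add .
  git -C "$name" commit -q -m "root of $name"
done

cd mine
# Bring their commits into our repository as a local branch named "theirs".
git fetch -q ../theirs master:theirs

# Without the flag, git merge refuses to proceed.
if git merge --no-edit theirs 2>/dev/null; then
  refused=no
else
  refused=yes
fi

# With the flag, the merge runs and stops on the conflicting file.
git merge --no-edit --allow-unrelated-histories theirs || true
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# Resolve by picking our version, then let Git finish the merge.
git checkout --ours README
git add README
GIT_EDITOR=true git merge --continue

echo "second parent: $(git rev-parse HEAD^2)"
ls
```

Only README needs attention; LICENSE, mine.txt and theirs.txt go through without conflict, and the resulting commit has the tip of theirs as its second parent.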

("Forever" is a bit too strong: the saved files last only as long as the commit itself. However, the default is for each commit to live forever. If you make the commit go away, the files do too.)

score 2 · Answer 2 · answered Feb 23 '19 at 02:43

Merging across unrelated histories works like this: You imagine there's a common ancestor before the root of each history, with no content at all. That is the merge base.

Mark Adelsberger
