Git refusing to pull due to "Unrelated Histories", both remote and local repo contain exact same files and changes

Question

I have been looking for more information on this problem for a few hours now. I am cleaning up a few older git repos stored on my PC and making sure they're fully committed and pushed upstream to GitLab before I delete the local copy. Most of the repos, when I attempt to pull, give the error "fatal: refusing to merge unrelated histories". Searching for this error on Google brings up a few StackOverflow posts suggesting I use "--allow-unrelated-histories" to fix the problem, but that doesn't help me understand why it is occurring in the first place.

I clone one of the smaller repos from GitLab and do a file by file comparison of all of the working files. They are identical. I do the same with another small repo and get the same result. I decide to check the log. The local and cloned copies contain the exact same set of commits, and local has an empty staging area.

This is when I notice that the local and cloned repos have different author information for each commit. Considering everything else is the same, including the commit time down to the second, I can only assume this is the problem. I don't understand why the local and upstream copies of the repo have differing author information. I haven't actively rewritten my local history to my knowledge, and GitLab doing that itself seems destructive.

tl;dr: Git refuses to merge unrelated histories. Research and compare local and remote repos, both are identical with the curious exception that the author information is different on each commit between local and remote. Not sure why or what the proper fix is.

Double check your .gitconfig and make sure your username/email match your gitlab git credentials. Besides that, the easy fix might be to simply blow away your local repos and clone the remote ones if all that is different is author names — rickjerrity, Jun 15 '18 at 17:08
@rickjerrity I've checked and my name matches, but my email does not. This is consistent with the last few local commits, before which I had a nickname as my .gitconfig name. Would the author info get changed to what you have registered with GitLab/GitHub when you push upstream? Also I could wipe local but I'm trying to get a grasp on what I did wrong here first. — Ryn, Jun 15 '18 at 17:24

score 1 · Accepted Answer · answered Jun 16 '18 at 06:34

TL;DR

Given what you're doing—combining repositories in which someone has re-copied all the commits to new hash IDs—this is normal. It's also essentially unrecoverable, which is why using git filter-branch to rewrite all of history is somewhat problematic.

Long

"Unrelated histories" means just that: there are two histories (two collections of commits in a Git commit graph) that are not linked with each other. The key to this is understanding how the Git commit graph works.

The history, in a Git repository, is (are?) the commits. Each commit has a hash ID; this is, in a very real sense, the "true name" of the commit. What's actually in the commit itself is rather small. Here is a commit from the Git repository for Git itself:

$ git cat-file -p HEAD | sed 's/@/ /'
tree 4ec41fbdfd4e9569fceb3e25d4c1945f1944af0e
parent e66e8f9be8af90671b85a758b62610bd1162de2d
author Junio C Hamano <gitster pobox.com> 1528116101 +0900
committer Junio C Hamano <gitster pobox.com> 1528116101 +0900

Git 2.18-rc1

Signed-off-by: Junio C Hamano <gitster pobox.com>

The hash ID of this commit is 3e5524907b43337e82a24afbc822078daf7a868f. No matter who has any Git-repository-for-Git commit, if they have this commit, they have that big ugly hash ID, and no other hash ID. If they have this hash ID, what it represents is this commit, and no other commit. But look at the second line of the commit contents, which says parent another-big-ugly-hash. This hash ID identifies another commit in a Git repository for Git; my copy of this Git repository has this commit in it, too. That parent commit records parent hash IDs of its own (two, because it's a merge commit), and those commits have hash IDs for their parents, and so on.
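To make the "true name" idea concrete, here is a minimal Python sketch of how a commit's hash ID is derived under Git's default SHA-1 object format: Git hashes a short header (the word commit, the byte length of the commit text, and a NUL byte) followed by the commit text itself. The helper names commit_hash and make_commit, and the sample author names, emails, and timestamp, are made up for illustration; the tree ID used is Git's well-known empty tree.

```python
import hashlib


def commit_hash(body: bytes) -> str:
    """Hash commit text the way Git's SHA-1 object format does."""
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


def make_commit(author: str) -> bytes:
    # Hypothetical root commit (no parent line) pointing at the empty tree.
    return (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        f"author {author} 1528116101 +0000\n"
        f"committer {author} 1528116101 +0000\n"
        "\n"
        "Initial commit\n"
    ).encode()


original = commit_hash(make_commit("Nickname <old@example.com>"))
rewritten = commit_hash(make_commit("Real Name <new@example.com>"))

print(original)
print(rewritten)
# Same tree, same message, same timestamp, yet a different hash ID:
print(original != rewritten)
```

Because the author line is part of the hashed text, two commits that differ only in author information cannot share a hash ID, which matches what you observed when comparing the two logs.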

If we draw these as a graph, with arrows coming out of each commit pointing to its parent, we get something like this—well, let's use a tiny, three-commit repository here:

A  <-B  <-C

Git needs to know the last hash ID; this is where branch names come in:

A  <-B  <-C   <--master

Git uses the last hash ID, found by the branch name, to find each tip commit. That commit has a parent ID, which Git uses to find another commit, which has a parent ID, which Git uses again, and so on. The action stops when Git reaches a commit like our commit A, which has no parent ID, because it's the end of the graph. These commits are called root commits.
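The backwards walk described above can be sketched in a few lines of Python. This is only a toy model: the parents and branches dictionaries stand in for Git's object database and refs, and the function names history and roots are invented for this example.

```python
# Toy commit graph: each commit maps to the list of its parents.
parents = {
    "A": [],     # root commit: no parent
    "B": ["A"],
    "C": ["B"],
}
branches = {"master": "C"}  # a branch name just remembers the tip commit


def history(tip):
    """Yield every commit reachable from tip by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        yield commit
        stack.extend(parents[commit])


def roots(tip):
    """Return the commits in tip's history that have no parent."""
    return sorted(c for c in history(tip) if not parents[c])


print(list(history(branches["master"])))  # ['C', 'B', 'A']
print(roots(branches["master"]))          # ['A']
```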

When we add more commits, and link all these up, we get something a bit more complicated, such as this:

o--o--o---o--o   <-- master
    \    /
     o--o

We don't need the internal arrows because we know they always point backwards: child commits know their parents, but parent commits don't know their children.

In a big repository, we get a really big graph. But sometimes, depending on how we build our graph (especially if we use git remote add <name> <url> followed by git fetch), we can get repositories with more than one root commit. For instance, within our tiny three-commit repository, we might bring in another repository with, say, four commits:

A--B--C   <-- master

D--E--F--G   <-- other/master

These commits are the history, but now there are two disconnected histories! Starting from C, we work back to A, and stop. Starting from G, we work back to D, and stop. (Remember, these easy-to-read single letters stand in for actual hash IDs, which appear random.)
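Roughly speaking, "unrelated" means that the two tips share no commit at all in their ancestry, so there is no merge base. The following Python sketch checks that condition on the seven-commit example; the parents dictionary and the function names ancestors and related are assumptions made for illustration, not Git's actual implementation.

```python
# Two disconnected chains: A--B--C and D--E--F--G.
parents = {
    "A": [], "B": ["A"], "C": ["B"],
    "D": [], "E": ["D"], "F": ["E"], "G": ["F"],
}


def ancestors(tip):
    """Return the set of commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def related(tip1, tip2):
    """Two histories are related if they share at least one commit."""
    return bool(ancestors(tip1) & ancestors(tip2))


print(related("C", "G"))  # False: master and other/master are unrelated
print(related("C", "B"))  # True: B is in C's own history
```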

If you ask Git to merge these, what Git does is temporarily make up a fake pretend commit that has no files in it, and use that as the common ancestor:

*--A--B--C   <-- master
 \
  D--E--F--G   <-- other/master

Now the histories join up, at the fake ancestor temporarily pretended-into-existence for the purpose of merging. Git can now diff the empty tree of commit * against the source tree in commit C; all the files in commit C are newly added. Git can also diff the empty tree against the source tree in commit G, and again, all the files there are newly added.

If these unrelated histories are of commits that mostly contain the same files, the result can be a giant set of "add/add conflicts", because the two tip commits add mostly the same paths: every path whose contents differ between the two sides conflicts, while a path whose contents are identical on both sides merges cleanly. You can choose to do this, and resolve all the conflicts manually, and then commit. Git drops the fake temporary root commit (actually it never even put it in: the empty tree is present in all Git repositories, so it just uses that directly) and you get:

A--B--C----H   <-- master
          /
D--E--F--G   <-- other/master

and now commit H relates the two histories, by joining the otherwise disjoint sub-graphs.
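As a rough illustration of a merge whose common ancestor is the empty tree, the Python sketch below treats each tip as a mapping from path to file contents. Every file is "added" on its side; a path added on both sides with different contents is an add/add conflict, while a path with identical contents on both sides merges cleanly. The function name merge_from_empty_base and the sample files are hypothetical, and this is a simplified model rather than Git's real merge machinery.

```python
def merge_from_empty_base(ours, theirs):
    """Merge two path->content maps whose common base has no files."""
    merged, conflicts = {}, []
    for path in sorted(set(ours) | set(theirs)):
        if path in ours and path in theirs and ours[path] != theirs[path]:
            conflicts.append(path)  # both sides added it, contents differ
        else:
            # Added on one side only, or identical on both sides.
            merged[path] = ours.get(path, theirs.get(path))
    return merged, conflicts


# Hypothetical snapshots for tip commits C and G.
tip_c = {"README": "hello\n", "main.py": "print(1)\n"}
tip_g = {"README": "hello\n", "main.py": "print(2)\n", "LICENSE": "MIT\n"}

merged, conflicts = merge_from_empty_base(tip_c, tip_g)
print(sorted(merged))  # ['LICENSE', 'README']
print(conflicts)       # ['main.py']
```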

You wrote: "Research and compare local and remote repos, both are identical with the curious exception that the author information is different on each commit between local and remote. Not sure why or what the proper fix is."

If the trees are all identical, this suggests someone ran git filter-branch specifically so as to modify author information. What filter-branch does is to copy commits, to new commits, after applying some set of filters. If you choose a filter that rewrites the author name in some or all commits, the new copies are different commits (they have different author lines), so they have different hashes. If this changes the root commit in the repository, then even if no other commit changes, all other copied commits have to record their new (different) parent hash.

For instance, in our small three-commit repository, copying A but changing the author results in a new hash which we can call A':

A--B--C   <-- master

A'

When we next copy B, keeping everything the same (even the author), we still have to put the ID of A' into the copy, so that the copy will point back to A':

A'-B'

Copying C likewise forces a change, to the parent line if nothing else, giving us:

A--B--C   <-- master

A'-B'-C'  [just built]

The last thing filter-branch does is move all the labels to point to the new copies:

A--B--C   <-- refs/original/refs/heads/master (to be deleted)

A'-B'-C'  <-- master

Once you remove the refs/original/ leftovers to forget the original commits, you're left with a repository in which all the commits have different authors, and therefore different hash IDs, and are therefore different commits.
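The cascade from A to A', B', and C' can be imitated with a toy hash in Python. In this sketch commit_id is a simplified stand-in for Git's real hashing (it covers only the parent ID, the author, and the message), and the author names are placeholders; the point is only that each ID depends on the parent's ID, so a change at the root propagates to every descendant.

```python
import hashlib


def commit_id(parent_id, author, message):
    """Toy commit ID that depends on the parent ID, author, and message."""
    lines = [f"parent {parent_id}"] if parent_id else []  # roots have no parent line
    lines += [f"author {author}", "", message]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


def build_chain(commits):
    """Return the IDs of a linear history, oldest commit first."""
    ids, parent = [], ""
    for author, message in commits:
        parent = commit_id(parent, author, message)
        ids.append(parent)
    return ids


old = [("nick", "A"), ("nick", "B"), ("Real Name", "C")]
# Rewrite only the author of the root commit; B and C are otherwise untouched.
new = [("Real Name", "A"), ("nick", "B"), ("Real Name", "C")]

old_ids = build_chain(old)
new_ids = build_chain(new)
for before, after in zip(old_ids, new_ids):
    status = "changed" if before != after else "same"
    print(before[:7], "->", after[:7], status)
```

All three lines report a change, even though only the first commit's author was edited.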

A repository is a collection of commits indexed by hash ID

Again, the commits are the history. Their hash IDs are what Git cares about. Copy the repository (via cloning) and you copy the commits, using their hash IDs. Copy the repository to new (different) commits via git filter-branch or similar, and you end up with a new, different repository, with different—possibly even totally-unrelated—history. (The histories will be related if both repositories retain their root commit unchanged.)

Those with the old repository must, in general, abandon their repository in favor of the new one, or decide to ignore the new one entirely. Only use git filter-branch like this if you know and accept the consequences.

I appreciate the answer, you definitely cleared some things up; especially regarding the root commit and how changing it completely destroys the established history. As it appears that 'git filter-branch' or similar was used to destructively alter the history, I'm going to go ahead and quickly check the remaining repos for consistency with their hosted counterparts before I wipe the local copies and reclone what I need. Thanks for the learning experience! — Ryn, Jun 17 '18 at 03:51
