TL;DR
Given what you're doing (combining repositories in which someone has re-copied all the commits to new hash IDs), this is normal. It's also essentially unrecoverable, which is why using git filter-branch to rewrite all of history is somewhat problematic.
Long
"Unrelated histories" means just that: there are two histories (two collections of commits in a Git commit graph) that are not linked with each other. The key to this is understanding how the Git commit graph works.
The history, in a Git repository, is (are?) the commits. Each commit has a hash ID; this is, in a very real sense, the "true name" of the commit. What's actually in the commit itself is rather small. Here is a commit from the Git repository for Git itself:
$ git cat-file -p HEAD | sed 's/@/ /'
tree 4ec41fbdfd4e9569fceb3e25d4c1945f1944af0e
parent e66e8f9be8af90671b85a758b62610bd1162de2d
author Junio C Hamano <gitster pobox.com> 1528116101 +0900
committer Junio C Hamano <gitster pobox.com> 1528116101 +0900

Git 2.18-rc1

Signed-off-by: Junio C Hamano <gitster pobox.com>
The hash ID of this commit is 3e5524907b43337e82a24afbc822078daf7a868f. No matter who has any Git-repository-for-Git commit, if they have this commit, they have that big ugly hash ID, and no other hash ID. If they have this hash ID, what it represents is this commit, and no other commit. But look at the second line of the commit contents, which says parent another-big-ugly-hash. This hash ID identifies another commit in the Git repository for Git; my copy of this Git repository has that commit in it, too. That parent commit records parent hash IDs of its own (two, in fact, because it's a merge commit), and those commits have hash IDs for their parents, and so on.
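To see that the hash ID really is determined by what is in the commit, here is a small sketch in a throwaway repository. The directory, author name, email address, and commit messages are all made up for the demonstration; the only claim is that re-hashing a commit's text gives back that commit's own ID, and that its parent line names the previous commit.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q .
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# The ID Git uses for the newest commit.
actual=$(git rev-parse HEAD)

# Re-hash the commit's text as a commit object (nothing is written).
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

# The "parent" line inside the commit holds the previous commit's ID.
parent_line=$(git cat-file -p HEAD | sed -n 's/^parent //p')

echo "actual:     $actual"
echo "recomputed: $recomputed"
echo "parent:     $parent_line"
echo "HEAD~1:     $(git rev-parse HEAD~1)"
```

The first two lines printed should match each other, and so should the last two. Change any byte of the commit text (the author, say) and the recomputed hash changes with it.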
If we draw these as a graph, with arrows coming out of each commit pointing to its parent, we get something like this—well, let's use a tiny, three-commit repository here:
A <-B <-C
Git needs to know the last hash ID; this is where branch names come in:
A <-B <-C <--master
Git uses the last hash ID, found by the branch name, to find each tip commit. That commit has a parent ID, which Git uses to find another commit, which has a parent ID, which Git uses again, and so on. The action stops when Git reaches a commit like our commit A, which has no parent ID, because it's the end of the graph. These commits are called root commits.
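The walk from branch name to root can be watched directly. This is only a sketch: the three empty commits named A, B and C stand in for the letters in the drawing, and the author identity is invented.

```shell
# Throwaway three-commit repository; names are hypothetical.
repo=$(mktemp -d) && cd "$repo" && git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"
for msg in A B C; do git commit -q --allow-empty -m "$msg"; done

# Start at the tip found via the branch name and follow parents backwards.
walk=$(git log --format=%s master | xargs)

# A root commit is one with no parents at all.
root=$(git rev-list --max-parents=0 master)
root_subject=$(git log -1 --format=%s "$root")

echo "walk: $walk"
echo "root: $root_subject"
```

Here git log does the parent-following for us, and git rev-list --max-parents=0 lists the commits where the walk has to stop.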
When we add more commits, and link all these up, we get something a bit more complicated, such as this:
o--o--o---o--o   <-- master
    \    /
     o--o
We don't need the internal arrows because we know they always point backwards: child commits know their parents, but parent commits don't know their children.
In a big repository, we get a really big graph. But sometimes, depending on how we build our graph (especially if we use git remote add <remote> and git fetch), we can get repositories with more than one root commit. For instance, within our tiny three-commit repository, we might bring in another repository with, say, four commits:
A--B--C <-- master
D--E--F--G <-- other/master
These commits are the history, but now there are two disconnected histories! Starting from C, we work back to A, and stop. Starting from G, we work back to D, and stop. (Remember, these easy-to-read single letters stand in for actual hash IDs, which appear random.)
If you ask Git to merge these, what Git does is temporarily make up a fake pretend commit that has no files in it, and use that as the common ancestor:
*--A--B--C   <-- master
 \
  D--E--F--G   <-- other/master
Now the histories join up, at the fake ancestor temporarily pretended-into-existence for the purpose of merging. Git can now diff the empty tree of commit * against the source tree in commit C; all the files in commit C are newly added. Git can also diff the empty tree against the source tree in commit G, and again, all the files there are newly added.
If these unrelated histories are of commits that mostly contain the same files, the result is a giant set of "add/add conflicts", because the two tip commits add mostly the same files. You can choose to do this, and resolve all the conflicts manually, and then commit. Git drops the fake temporary root commit (actually it never even put it in—the empty tree is present in all Git repositories, so it just uses that directly) and you get:
A--B--C----H   <-- master
          /
D--E--F--G   <-- other/master
and now commit H relates the two histories, by joining the otherwise disjoint sub-graphs.
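Here is a sketch of that whole sequence in a scratch repository. To keep it short, the second history is built on an orphan branch named other rather than fetched from a remote, and the two histories use different file names so that the merge has no add/add conflicts; those choices, the file names, and the author identity are all assumptions for the demonstration. The --allow-unrelated-histories option (Git 2.9 and later) is what tells git merge to go ahead.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

# First history: A--B--C on master.
for name in A B C; do
  echo "$name" > "master-$name.txt"
  git add . && git commit -q -m "$name"
done

# Second history: an orphan branch begins with a brand-new root commit.
git checkout -q --orphan other
git rm -q -rf .
for name in D E F G; do
  echo "$name" > "other-$name.txt"
  git add . && git commit -q -m "$name"
done
git checkout -q master

# Two disjoint sub-graphs: there is no merge base to find.
git merge-base master other || echo "no common ancestor"

# Merge anyway; this creates commit H with both tips as parents.
git merge -q --allow-unrelated-histories -m "H" other

# master now reaches both root commits, A and D.
roots=$(git rev-list --max-parents=0 master | wc -l)
echo "root commits reachable from master: $roots"
```

After the merge, git merge-base master other succeeds and names the tip of other, because commit H has joined the two sub-graphs.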
> Research and compare local and remote repos, both are identical with the curious exception that the author information is different on each commit between local and remote. Not sure why or what the proper fix is.
If the trees are all identical, this suggests someone ran git filter-branch specifically so as to modify author information. What filter-branch does is to copy commits, to new commits, after applying some set of filters. If you choose a filter that rewrites the author name in some or all commits, the new copies are different commits (they have different author lines), so they have different hashes. If this changes the root commit in the repository, then even if no other commit changes, all other copied commits have to record their new (different) parent hash.
For instance, in our small three-commit repository, copying A but changing the author results in a new hash which we can call A':
A--B--C <-- master
A'
When we next copy B, keeping everything the same (even the author), we still have to put the ID of A' into the copy, so that the copy will point back to A':
A'-B'
Copying C likewise forces a change, to the parent line if nothing else, giving us:
A--B--C <-- master
A'-B'-C' [just built]
The last thing filter-branch does is move all the labels to point to the new copies:
A--B--C <-- refs/original/refs/heads/master (to be deleted)
A'-B'-C' <-- master
Once you remove the refs/original/ leftovers to forget the original commits, you're left with a repository in which all the commits have different authors, and therefore different hash IDs, and are therefore different commits.
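The copying can be imitated by hand with git commit-tree, which makes the knock-on effect easy to see. This is a sketch of the idea, not what filter-branch literally runs: the author names, the fixed timestamp, and the two-commit history are all invented, and the timestamps are pinned so that the copies differ only where we deliberately change something.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME="Old Author" GIT_AUTHOR_EMAIL="old@example.com"
export GIT_COMMITTER_NAME="Old Author" GIT_COMMITTER_EMAIL="old@example.com"
# Pin both dates (raw "<seconds> <offset>" form) so copies are reproducible.
export GIT_AUTHOR_DATE="1528116101 +0000" GIT_COMMITTER_DATE="1528116101 +0000"
for msg in A B; do git commit -q --allow-empty -m "$msg"; done

A=$(git rev-parse master~1)
B=$(git rev-parse master)
tree_A=$(git rev-parse "$A^{tree}")
tree_B=$(git rev-parse "$B^{tree}")

# Copy A, changing only the author name: this is A'.
A2=$(GIT_AUTHOR_NAME="New Author" git commit-tree "$tree_A" -m A)

# Copy B with the ORIGINAL author, but with A' as its parent: this is B'.
B2=$(git commit-tree "$tree_B" -p "$A2" -m B)

# Control: copy B with nothing changed at all, parent still A.
B_same=$(git commit-tree "$tree_B" -p "$A" -m B)

[ "$B_same" = "$B" ] && echo "an unchanged copy of B is just B again"
[ "$A2" != "$A" ]    && echo "A' has a different ID from A"
[ "$B2" != "$B" ]    && echo "B' has a different ID from B, though only its parent line changed"
```

The control case is the interesting one: a byte-for-byte identical copy gets the same hash ID and so is the same commit, while B' is a new commit purely because its parent line now names A'.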
A repository is a collection of commits indexed by hash ID
Again, the commits are the history. Their hash IDs are what Git cares about. Copy the repository (via cloning) and you copy the commits, using their hash IDs. Copy the repository to new (different) commits via git filter-branch or similar, and you end up with a new, different repository, with different (possibly even totally unrelated) history. (The histories will be related if both repositories retain their root commit unchanged.)
Those with the old repository must, in general, abandon their repository in favor of the new one, or decide to ignore the new one entirely. Only use git filter-branch like this if you know and accept the consequences.