I tried using git remote set-url origin URL
...
There's nothing fundamentally wrong with this command, but it can lead to surprising effects, especially to someone new to Git.
followed by a git pull origin master --allow-unrelated-histories
.
Again, there's nothing fundamentally wrong with this, but you're now diving in at the deep end of Git, with this --allow-unrelated-histories
option.
This led to a few merge conflicts, all of which were resolved by accepting the incoming changes.
(The surprise here is that you did not get massive add/add conflicts. Or maybe you did, and went ahead with the "accept incoming changes" option by taking the entire new file.)
But upon using git status
it says the local master branch is ahead of origin/master by 3 commits, as well as own branch (C) being ahead of origin/C by 3 commits when it doesn't even have a corresponding branch in the remote repo yet.
This is the result I would expect (well, the exact number ahead depends on several other things and does not match your drawings). The explanation for this is ... a bit complex.
Git is about commits
Those new to Git often think Git is about files or branches, but it's not: it's about commits. The role of files is that they are contained in commits, and the role of branches is to help you, and Git, find commits, but ultimately, it's all about the commits. A Git repository consists of two key-value databases, one indexed by hash IDs and one by names, plus a bunch of auxiliary data; the (normally) bigger database holds the commits and other Git internal objects. Hence it's important to memorize what a commit is and does for you, and how you name any particular commit:
Each commit is numbered. These aren't simple counting numbers: we don't have commit #1 followed by #2 and #3 and so on. But each commit has a unique hash ID, which is a very large number written out as a big ugly hexadecimal string.
The hash ID of a commit, or any other internal Git object, looks random, but is actually a cryptographic checksum of the object's content. This cryptographic checksum trick has several consequences, but right now we'll consider just one of them: the content is literally impossible to change. If you take one of these out of the big database, make some changes to it, and write that back, what you get is a new object, with a different hash ID. So you can copy an old commit to a new (and different) one, but you can't change an existing one: the existing one still resides in the database, with its old content.
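To make the checksum idea concrete, here is a small Python sketch of how the ID of a blob (a file's content) is computed in a repository that uses SHA-1, which has long been Git's default. The function name `git_blob_id` is made up for this example. Commits are hashed the same way, just with a `commit` header instead of `blob`.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Hash file content the way Git does for a blob: header, NUL byte, content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

original = git_blob_id(b"hello\n")
modified = git_blob_id(b"hello!\n")

print(original)              # same value as: echo hello | git hash-object --stdin
print(original != modified)  # True: changed content always gets a new ID
```

Since the ID is derived from the content, "changing" an object can only ever produce a different object under a different ID.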
A commit has two parts: it has data, containing a snapshot of all the files that Git knew about when you (or whoever) made the commit, and it has metadata, which holds information about the commit itself, such as who made it, when, and why (the log message). Crucially for Git itself, the metadata include the hash ID of the previous commit. Git calls this the parent of the commit. (Merge commits contain two or more parent hashes, as we'll see in a moment, and at least one commit—the very first one ever—has no parent.)
Branch and other names
Having to memorize big ugly hash IDs is a terrible job, but fortunately, we don't have to do that: we have a computer and it can do that for us. This is what branch names, tag names, and other names do. Each one—each name—holds one hash ID.
You might wonder what good it is to remember just one hash ID. Shouldn't a branch name hold the hash IDs of all the commits in the branch? But it doesn't. Instead, it holds the hash ID of the last commit in the branch.
Remember that each commit itself holds the hash ID of the earlier commit. So if we have a nice simple chain of commits, starting with the very first one in the repository, we can draw that:
A <-B <-C <--branch
Here, the branch name branch
holds the hash ID of the last commit, whose big ugly hash ID we're representing with the letter C
. Commit C
itself holds the hash ID of the previous commit B
, and B
holds the hash ID of the first commit A
. Since A
is the first commit—Git calls it a root commit—the chain ends here.
If we add a new name, such as develop
, we'll make it point to existing commit C
, like this:
A--B--C <-- branch, develop
All three commits are now on both branches.
Now we need some way to remember which name we're using, so we'll have Git attach the special name HEAD
—written in all capitals like this—to one of the branch names. No matter which of these two names we use, we'll be working with commit C
, for the moment. Commit C
will be the current commit. Let's make develop
the current name, though:
A--B--C <-- branch, develop (HEAD)
Now let's make a new commit. Not minding how this works, yet, we just know that new commit D
gets a new, unique, big ugly hash ID, and its parent is the current commit C
. So let's draw in commit D
, on a line by itself:
A--B--C
       \
        D
Git writes D
's new hash ID to one branch name: the current branch. So now we can draw in the branch names:
A--B--C <-- branch
       \
        D <-- develop (HEAD)
The current commit is D
, and the current branch is develop
, and the name develop
selects the last commit on this branch. The three earlier commits are still on both branches; new commit D
is only on develop
.
If we now run git checkout branch
, to select the old branch, we get:
A--B--C <-- branch (HEAD)
       \
        D <-- develop
No commit has changed, and the branch names still point to the same commits as a moment ago, but now branch
is the current branch and C
is the current commit. So if we make a new commit E
now, we get:
        E <-- branch (HEAD)
       /
A--B--C
       \
        D <-- develop
Our two branches have diverged. Commits A-B-C
are still on both branches, and each branch now has one commit unique to that one branch. At this point, it makes sense to run git merge
.
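The branch-name mechanics in this section can be sketched with a toy model in Python. This is not how Git stores anything; `ToyRepo` and its methods are invented here purely to show that a branch name holds one commit ID, and that a new commit moves only the current branch.

```python
class ToyRepo:
    """Toy model: commits map to parent lists, branch names map to one commit."""

    def __init__(self):
        self.parents = {}    # commit ID -> list of parent commit IDs
        self.branches = {}   # branch name -> ID of the last (tip) commit
        self.head = None     # name of the current branch

    def commit(self, new_id):
        """Make a commit whose parent is the current commit; advance the current branch."""
        tip = self.branches.get(self.head)
        self.parents[new_id] = [tip] if tip is not None else []
        self.branches[self.head] = new_id

    def branch(self, name):
        """Create a new name pointing at the current commit."""
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        """Attach HEAD to another branch name; no commit changes."""
        self.head = name

repo = ToyRepo()
repo.head = "branch"
for letter in "ABC":
    repo.commit(letter)
repo.branch("develop")
repo.checkout("develop")
repo.commit("D")        # only develop moves
repo.checkout("branch")
repo.commit("E")        # only branch moves

print(repo.branches)                         # {'branch': 'E', 'develop': 'D'}
print(repo.parents["D"], repo.parents["E"])  # ['C'] ['C']
```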
A typical real merge
Merging is mostly about combining work since a common starting point. Let's start with a slightly different setup, so that I can use the letter M
for the merge commit:
          I--J <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L <-- branch2
and run git merge branch2
. Git uses HEAD
to find the current branch and the current commit, which is commit J
; it uses the name branch2
to find the other commit, L
; and now it's ready to start the merge operation.
Now, remember that each commit holds a snapshot, not differences. To find out what we did, Git has to compare two commits. What commits should we compare? We could compare I
vs J
, and H
vs I
, to get our changes in two steps. But it might be simpler just to compare H
vs J
directly, to get them in one step.
Likewise, to find out what they did, Git has to compare two commits, too. What if Git just compares H
to L
? That will, in one fell swoop, figure out everything they did.
Why did we pick commit H
? Well, it's on both branches. So is commit G
, of course, but H
probably seems better, because it's later along. Technically, what Git does to find H
is to use a lowest common ancestor algorithm on the commit graph—the set of vertices-and-edges we've been drawing here—to find commit H
, but mostly we don't really have to care about this detail because it's visually obvious which commit to use. (Real commit graphs tend to be a lot messier, and you can use the git merge-base
command to have the computer do the job, if you want to find merge bases. But git merge
does it all for you.)
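For the curious, here is a brute-force Python sketch of the merge-base idea on the graph drawn above. It treats G as if it were a root commit (the `...` part is left out), and the function names are invented for this example. Git's real algorithm is considerably smarter about large graphs; this only shows what is being computed.

```python
def ancestors(parents, start):
    """Return every commit reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def merge_bases(parents, ours, theirs):
    """Common ancestors that are not themselves ancestors of another common ancestor."""
    common = ancestors(parents, ours) & ancestors(parents, theirs)
    return {
        c for c in common
        if not any(c != other and c in ancestors(parents, other) for other in common)
    }

# The example graph: I--J on branch1, K--L on branch2, both growing from H.
parents = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}
print(merge_bases(parents, "J", "L"))   # {'H'}
```

Commit G is a common ancestor too, but it is discarded because H, which is "later along", already covers it.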
Anyway, Git will:
- find the merge base, in this case commit
H
;
- figure out what we changed: base vs
HEAD
, i.e., commit J
;
- figure out what they changed: base vs the commit we specified with
branch2
, i.e., commit L
;
- combine these changes, applying the combined changes to the snapshot from the merge base; and
- make a new commit on its own.
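The steps above can be sketched in Python, with one big simplification: this toy version works on whole files, where Git also combines changes line by line within a file. The snapshot dictionaries and the name `three_way_merge` are assumptions made up for illustration.

```python
def three_way_merge(base, ours, theirs):
    """Merge snapshots (dicts of path -> content); a missing path means no such file."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither touched it)
            result = o
        elif o == b:      # only they changed it
            result = t
        elif t == b:      # only we changed it
            result = o
        else:             # both changed it, differently
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts

base   = {"README": "v1", "main.py": "v1"}                    # commit H
ours   = {"README": "v2", "main.py": "v1"}                    # commit J: we edited README
theirs = {"README": "v1", "main.py": "v1", "util.py": "v1"}   # commit L: they added util.py

print(three_way_merge(base, ours, theirs))
# ({'README': 'v2', 'main.py': 'v1', 'util.py': 'v1'}, [])
```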
The new commit is a commit, like any other commit, so it has a snapshot and parents. The snapshot for this new commit is the one Git builds by combining the changes and applying them to the merge base commit's snapshot. The first parent of the new commit is the current commit, as it would be for any ordinary non-merge commit. But to mark the new commit as a merge commit, it has a second parent too. That second parent is the commit we named in our git merge
command, i.e., commit L
.
When Git writes out the new commit, it writes the new commit's hash ID into the current branch name, as usual. So the new commit M
becomes the last commit on the current branch, branch1
. This means we get:
          I--J
         /    \
...--G--H      M <-- branch1 (HEAD)
         \    /
          K--L <-- branch2
with merge commit M
holding a snapshot that has our changes—H
-vs-J
—plus their H
-vs-L
changes.
Of course, this is not quite what you're doing. But to get to what you're doing, we need to look at what happens when you connect two Git repositories to each other.
Connecting multiple Git repositories together
Preliminary note: git pull
means: run git fetch
, then run a second Git command such as git merge
. We're just interested in the fetch
step here: we already covered merge, albeit lightly.
We can have multiple Git repositories. We can, at any time, have any two of them talk to each other. We just have one call up the other, using either git fetch
or git push
and a URL. The one that gets called has to have a service that allows us to call it, if it's on another machine; for that reason, the "other Git" is often at GitHub, or Bitbucket, or on a GitLab server, or whatever. Except for when you can't get your Git to log in to the other server correctly, these details aren't all that important. You just run git fetch
or git push
on your computer, where you're already logged in and have full control.
Anyway, so, your Git calls up another Git. Your Git and their Git have a little conversation. The conversation is a bit different for fetch vs push, but in general, your Git starts out by asking their Git what branch and other names they have, and what commit hash IDs those names represent. This is where the magic of those cryptographic checksums comes in.
Because commit hash IDs are cryptographic checksums, and all Gits compute them the same way, every Git everywhere agrees that for whatever particular unique content some commit has, that commit gets that hash ID. So your Git and their Git can tell if you have the same commits, or not.
With git fetch
, your Git can see what commits they have, that you don't. Your Git then asks for those commits, by their hash IDs. When a Git offers a commit and the other Git accepts, the sending Git—in this case, theirs—must offer that commit's parent commits, too. So your git fetch
gets every commit they have that you don't. These all go into your repository.
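As a rough picture of that exchange, here is a Python sketch in which each repository is just a dictionary from commit ID to parent IDs. The real protocol is a negotiation over the network and is far more elaborate; the function `fetch` and the letter-named commits here are stand-ins for illustration only.

```python
def fetch(mine, theirs, their_tip):
    """Copy into `mine` every commit reachable from their_tip that is missing."""
    received, stack = [], [their_tip]
    while stack:
        commit = stack.pop()
        if commit in mine:
            continue   # already have it, so (in a complete repository) its ancestors too
        mine[commit] = theirs[commit]
        received.append(commit)
        stack.extend(theirs[commit])   # the sender must offer the parents as well
    return received

mine   = {"G": [], "H": ["G"]}
theirs = {"G": [], "H": ["G"], "I": ["H"], "J": ["I"]}

print(fetch(mine, theirs, "J"))   # ['J', 'I']
print(sorted(mine))               # ['G', 'H', 'I', 'J']
```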
If their commits build on yours, the way our commit D
built on our A-B-C
chain, or the way M
built on our J
-and-L
, then we just add these to our repository. For instance, if you have:
...--G--H
and they offer J
which connects to I
which connects to H
, you get:
...--G--H <-- your-branch
         \
          I--J
Here, though, things get a little bit tricky. What if you have ...--G--H
as above, and they offer J
which connects to I
which connects to G
? Then you get:
...--G--H <-- your-branch
      \
       I--J
If your Git wrote J
's hash ID into the name your-branch
, you would lose your commit H
. It would still be in your Git, but how would you find it? If you memorized H
's hash ID, you could find it that way, but, ugh. And if your-branch
is your current branch, there would be even more problems, which we won't get into here.
So, what your Git does with a git fetch
is to create or update a remote-tracking name. Suppose that they also call this branch your-branch
. Then you get:
...--G--H <-- your-branch
      \
       I--J <-- origin/your-branch
in your own repository. You can now see that you are one commit ahead of them (commit H
), and they are two commits ahead of you (commits I and J
).
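The ahead and behind counts are just set differences over the commit graph. Here is a minimal Python sketch for the diagram above, again treating G as a root and using invented helper names. As far as I know, `git rev-list --left-right --count your-branch...origin/your-branch` reports the same two numbers in a real repository.

```python
def reachable(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

def ahead_behind(parents, local_tip, remote_tip):
    """Count commits only on the local side, and commits only on the remote side."""
    local, remote = reachable(parents, local_tip), reachable(parents, remote_tip)
    return len(local - remote), len(remote - local)

# your-branch points to H; origin/your-branch points to J, which builds on G via I.
parents = {"G": [], "H": ["G"], "I": ["G"], "J": ["I"]}
print(ahead_behind(parents, "H", "J"))   # (1, 2)
```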
The name origin
here is from the remote, the short name that we use to have Git remember the URL. Git adds the remote name as a prefix, and a slash, to their branch name, so that your Git remembers their your-branch
as your origin/your-branch
.¹
You might, now, choose to combine work, using git merge
for instance. In fact, it's extremely sensible to run git fetch
, then look at what git fetch
fetched, and then choose to run another Git command. That's exactly what git pull
does ... well, except for the fact that it prevents you from looking: you commit to running git merge
immediately after git fetch
regardless of what came in. That's not really fatal, since you can undo a merge, but I for one don't like it. I like to keep my git fetch
separate from any subsequent command, so that I can look first. But that's a matter of taste.
¹ I'm skipping over a lot of fine detail to keep this answer short. OK, short-ish. Short-er than it would be otherwise? Less long?
Now we are ready to look at what you are doing
Here, you ran:
git clone some-repository
which got you two commits² and two names:
A--B <-- master (HEAD), origin/master
The name origin
held the URL for the repository you just cloned. Both your master
—which your git clone
created to reflect their master
—and their master
, which your Git remembers as origin/master
, select the second commit B
.
Next, you ran git remote set-url origin new-url
. This left everything else unchanged, but set the URL associated with the short name origin
. Then you ran git fetch origin
.³
This time, your Git called up a completely different Git, and they offered commit D
, found by the name master
. Your Git wanted that one—you didn't have it yet—and they offered C
, which your Git also took, but C
has no parent. It's a root commit! So your Git was satisfied that it got everything, and your Git put these commits into your repository and updated your origin/master
:
A--B <-- master
C--D <-- origin/master
Now it's time to combine work. Your git pull
ran, in effect, git merge origin/master
. There's a bit of a problem, though: there is no shared commit. To get around this problem, you added --allow-unrelated-histories
.
What this option does is pretend that behind each root commit, there is a totally-empty commit. So the "history" becomes, temporarily for the one git merge
command only:
  A--B
 /
ε
 \
  C--D
where ε
has no files at all. That means the difference between the merge base and commit B
is that all files are added, and the difference between the merge base and commit D
is that all files are added. Any file that has the same name is considered the "same file", and you get an "add/add conflict".
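Here is a tiny Python sketch of why the empty merge base produces add/add conflicts. The file names and contents are invented, and the comparison is on whole files only. Same-named files with identical content merge cleanly; same-named files with different content conflict.

```python
def add_add_conflicts(ours, theirs):
    """With an empty merge base, every file counts as 'added' on both sides."""
    return sorted(
        path for path in set(ours) & set(theirs)
        if ours[path] != theirs[path]
    )

# Hypothetical snapshots for commits B and D.
commit_b = {"README.md": "old project", "app.py": "print('hi')", ".gitignore": "*.pyc"}
commit_d = {"README.md": "new project", "LICENSE": "MIT", ".gitignore": "*.pyc"}

print(add_add_conflicts(commit_b, commit_d))   # ['README.md']
```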
You resolved these conflicts manually and made a new merge commit E
. Let's go back to the original diagram and draw in merge commit E
:
A--B--E <-- master
     /
C---D <-- origin/master
The snapshot in merge commit E
is the result of the combining, with any fixing-up that you did (and any other changes you might want to sneak in as an "evil merge", if you like). Its parents are commits B
and D
.
Count, now, how many commits are on your master
that are not on origin/master
: that's 3. So you are three commits ahead of them.
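You can do that count mechanically. The Python sketch below encodes the graph as drawn, using the same letter names I guessed at above, and subtracts everything reachable from origin/master from everything reachable from master.

```python
def reachable(parents, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

# A and C are both root commits; merge commit E has two parents, B and D.
parents = {"A": [], "B": ["A"], "C": [], "D": ["C"], "E": ["B", "D"]}
tips = {"master": "E", "origin/master": "D"}

ahead = reachable(parents, tips["master"]) - reachable(parents, tips["origin/master"])
print(sorted(ahead))   # ['A', 'B', 'E']
print(len(ahead))      # 3
```

The merge commit counts, and so does every commit from the original clone, because none of them exist in the new remote.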
² You said one commit, but based on what you saw as output, there must be two. I'm also guessing about the branch names. Perhaps they had master
and branch-A
with two commits: root commit A
on both branches and at the tip of master
, and commit B
on branch-A
, with parent A
. With more information, I could draw this more precisely, but with luck this will suffice.
³ You ran git pull
, rather than git fetch
, but we need to watch each step, so we'll break this up into fetch and the merge.
If you ran git push
now...
If you were to run git push origin master
, your Git would offer to origin
commit E
, since E
is the last commit on your master
. They obviously do not have it, so they would ask for you to send it. Then your Git would be obligated to offer B
and D
. They already have D
but would ask for B
, so your Git would offer A
, and they would take that too. Your Git would be done sending them commits.
The last part of a git push
consists of a polite request or command (or more than one request or command), asking their Git to set one or more of their names to the last-commits that your Git sent. So your Git could ask them to set their master
to commit E
. When using a polite request (rather than git push --force
, which sends the command), they first check to make sure that obeying the request won't lose them any commits.⁴ In this case, that test passes: E
comes after D
, so their master
will advance, and their repository's branch names will now match yours and they will have all the same commits that you have.
If this is the result you want, all is well. If not, you'll need to make some different commit(s), and/or perhaps force the other Git repository to forget some of its commits.
⁴ Remember the example above with git fetch
, where setting your branch name would lose your commit H
. Technically, what they check is that the commit to which their branch name resolves is an ancestor of the commit you're suggesting they should remember instead. If this is-ancestor test passes, they permit the push. If not, they reject it as a non-fast-forward.
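That is-ancestor test is easy to sketch in Python. The helper names are made up, and a real server can apply further rules of its own, so treat this as a picture of the default check only. In a real repository, `git merge-base --is-ancestor` answers the same question.

```python
def is_ancestor(parents, maybe_ancestor, tip):
    """True if maybe_ancestor is reachable from tip (a commit is its own ancestor here)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit == maybe_ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def push_allowed(parents, their_current, proposed, force=False):
    """Accept a polite request only if it is a fast-forward; force skips the check."""
    return force or is_ancestor(parents, their_current, proposed)

# After the merge: their master is D, and you ask them to move it to E.
merged = {"A": [], "B": ["A"], "C": [], "D": ["C"], "E": ["B", "D"]}
print(push_allowed(merged, "D", "E"))     # True: D is an ancestor of E

# The earlier fetch example, reversed: the name holds H, the proposed commit is J.
diverged = {"G": [], "H": ["G"], "I": ["G"], "J": ["I"]}
print(push_allowed(diverged, "H", "J"))   # False: moving to J would lose H
```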