Moving branch from one remote repo to another

Question

How do I move a branch (C) from one remote repo (A) to another remote repo (B)? The master branch of each repo contains 2 initial commits: a README commit made when creating the repo on GitHub, and another made when creating an Angular project, so the directory structure is pretty much the same.

       C1--C2--C3--C4                 C1--C2--C3--C4
      /                  TO          /
A1--A2                         B1--B2

I tried using git remote set-url origin URL followed by a git pull origin master --allow-unrelated-histories. This led to a few merge conflicts, all of which were resolved by accepting the incoming changes. But git status now says that the local master branch is ahead of origin/master by 3 commits, and that my own branch (C) is ahead of origin/C by 3 commits, even though it doesn't have a corresponding branch in the remote repo yet.

I used git reset --hard HEAD~1 on the master branch to revert to the commit made before merging, so I can merge it properly this time.

What is causing this problem, and how can it be rectified? Also please tell me if you need more info, I'm new to git.

score 1 · Answer 1 · answered Sep 30 '20 at 13:15

The simplest answer is that you should force-push the old branch into the new repository so that you get the exact same branch (content and history) into the new repo. That is what I would do, for sure.
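A minimal sketch of what such a push could look like, using two local bare repositories as stand-ins for A and B. The remote name target, the directory names, and the empty commits are assumptions made for illustration; with real hosting you would use B's URL instead. Since B has no branch C yet in this sketch, a plain push is enough, and --force would only matter if B already had a different C. Note that the pushed branch brings its whole ancestry (here A1 and A2) along with it.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stand-ins for the two hosted repositories (hypothetical names).
git init -q --bare repoA.git
git init -q --bare repoB.git

# A local repository whose origin is A, with master and a feature branch C.
git init -q local
cd local
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/repoA.git"
git commit -q --allow-empty -m "A1"
git commit -q --allow-empty -m "A2"
git checkout -q -b C
git commit -q --allow-empty -m "C1"
git push -q origin master C

# Add B as a second remote (origin stays untouched) and push the branch.
git remote add target "$work/repoB.git"
git push -q target C

# B's branch C now names the same commit as the local C.
local_tip=$(git rev-parse C)
remote_tip=$(git --git-dir="$work/repoB.git" rev-parse refs/heads/C)
echo "local C: $local_tip"
echo "B's C:   $remote_tip"
```

Keeping origin pointed at A and adding a second remote avoids the stale origin/* names that changing the URL in place leaves behind.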

eftshift0

score 1 · Accepted Answer · answered Sep 30 '20 at 23:08

I tried using git remote set-url origin URL ...

There's nothing fundamentally wrong with this command, but it can lead to surprising effects, especially to someone new to Git.

followed by a git pull origin master --allow-unrelated-histories.

Again, there's nothing fundamentally wrong with this, but you're now diving in at the deep end of Git, with this --allow-unrelated-histories option.

This led to a few merge conflicts, all of which were resolved by accepting the incoming changes.

(The surprise here is that you did not get massive add/add conflicts. Or maybe you did, and went ahead with the "accept incoming changes" option by taking the entire new file.)

But git status now says that the local master branch is ahead of origin/master by 3 commits, and that my own branch (C) is ahead of origin/C by 3 commits, even though it doesn't have a corresponding branch in the remote repo yet.

This is the result I would expect (well, the exact number ahead depends on several other things and does not match your drawings). The explanation for this is ... a bit complex.

Git is about commits

Those new to Git often think Git is about files or branches, but it's not: it's about commits. The role of files is that they are contained in commits, and the role of branches is to help you, and Git, find commits, but ultimately, it's all about the commits. A Git repository consists of two key-value databases, one indexed by hash IDs and one by names, plus a bunch of auxiliary data; the (normally) bigger database holds the commits and other Git internal objects. Hence it's important to memorize what a commit is and does for you, and how you name any particular commit:

Each commit is numbered. These aren't simple counting numbers: we don't have commit #1 followed by #2 and #3 and so on. But each commit has a unique hash ID, which is a very large number, normally written out as a big ugly hexadecimal string.
The hash ID of a commit, or any other internal Git object, looks random, but is actually a cryptographic checksum of the object's content. This cryptographic checksum trick has several consequences, but right now we'll consider just one of them: the content is literally impossible to change. If you take one of these out of the big database, make some changes to it, and write that back, what you get is a new object, with a different hash ID. So you can copy an old commit to a new (and different) one, but you can't change an existing one: the existing one still resides in the database, with its old content.
A commit has two parts: it has data, containing a snapshot of all the files that Git knew about when you (or whoever) made the commit, and it has metadata, which holds information about the commit itself, such as who made it, when, and why (the log message). Crucially for Git itself, the metadata include the hash ID of the previous commit. Git calls this the parent of the commit. (Merge commits contain two or more parent hashes, as we'll see in a moment, and at least one commit—the very first one ever—has no parent.)
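To see these two parts for yourself, you can ask Git to print a raw commit object. The sketch below builds a throwaway repository (the file name and commit messages are made up) and uses git cat-file -p to show the tree line (the snapshot) and the parent line (part of the metadata).

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q repo
cd repo
echo "hello" > README
git add README
git commit -q -m "first"
echo "more" >> README
git commit -q -am "second"

# Raw commit object: tree (snapshot), parent, author, committer, log message.
git cat-file -p HEAD

first=$(git rev-parse HEAD~1)
parent_line=$(git cat-file -p HEAD | grep '^parent ')
# The root commit records no parent line at all.
root_parents=$(git cat-file -p "$first" | grep -c '^parent ' || true)
echo "parent recorded in HEAD: $parent_line"
echo "parent lines in the root commit: $root_parents"
```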

Branch and other names

Having to memorize big ugly hash IDs is a terrible job, but fortunately, we don't have to do that: we have a computer and it can do that for us. This is what branch names, tag names, and other names do. Each one—each name—holds one hash ID.

You might wonder what good it is to remember just one hash ID. Shouldn't a branch name hold the hash ID of all the commits in the branch? But it doesn't. Instead, it holds the hash ID of the last commit in the branch.

Remember that each commit itself holds the hash ID of the earlier commit. So if we have a nice simple chain of commits, starting with the very first one in the repository, we can draw that:

A <-B <-C   <--branch

Here, the branch name branch holds the hash ID of the last commit, whose big ugly hash ID we're representing with the letter C. Commit C itself holds the hash ID of the previous commit B, and B holds the hash ID of the first commit A. Since A is the first commit—Git calls it a root commit—the chain ends here.

If we add a new name, such as develop, we'll make it point to existing commit C, like this:

A--B--C   <-- branch, develop

All three commits are now on both branches.

Now we need some way to remember which name we're using, so we'll have Git attach the special name HEAD—written in all capitals like this—to one of the branch names. No matter which of these two names we use, we'll be working with commit C, for the moment. Commit C will be the current commit. Let's make develop the current name, though:

A--B--C   <-- branch, develop (HEAD)

Now let's make a new commit. Not minding how this works, yet, we just know that new commit D gets a new, unique, big ugly hash ID, and its parent is the current commit C. So let's draw in commit D, on a line by itself:

A--B--C
       \
        D

Git writes D's new hash ID to one branch name: the current branch. So now we can draw in the branch names:

A--B--C   <-- branch
       \
        D   <-- develop (HEAD)

The current commit is D, and the current branch is develop, and the name develop selects the last commit on this branch. The three earlier commits are still on both branches; new commit D is only on develop.

If we now run git checkout branch, to select the old branch, we get:

A--B--C   <-- branch (HEAD)
       \
        D   <-- develop

No commit has changed, and the branch names still point to the same commits as a moment ago, but now branch is the current branch and C is the current commit. So if we make a new commit E now, we get:

        E   <-- branch (HEAD)
       /
A--B--C
       \
        D   <-- develop

Our two branches have diverged. Commits A-B-C are still on both branches, and each branch now has one commit unique to that one branch. At this point, it makes sense to run git merge.
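The drawings above can be reproduced in a scratch repository. This is only a sketch: the commits are empty and their messages are the letters from the diagrams, so the hash IDs will differ on every run.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/branch   # call the initial branch "branch"
for m in A B C; do
  git commit -q --allow-empty -m "$m"
done

git checkout -q -b develop                # new name, same commit C
c_tip=$(git rev-parse branch)
develop_before=$(git rev-parse develop)

git commit -q --allow-empty -m D          # only develop moves
git checkout -q branch
git commit -q --allow-empty -m E          # only branch moves

# Each name holds exactly one hash ID; the graph comes from the parent links.
git log --graph --oneline --all
d_parent=$(git rev-parse develop^)
e_parent=$(git rev-parse branch^)
```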

A typical real merge

Merging is mostly about combining work since a common starting point. Let's start with a slightly different setup, so that I can use the letter M for the merge commit:

          I--J   <-- branch1 (HEAD)
         /
...--G--H
         \
          K--L   <-- branch2

and run git merge branch2. Git uses HEAD to find the current branch and the current commit, which is commit J; it uses the name branch2 to find the other commit, L; and now it's ready to start the merge operation.

Now, remember that each commit holds a snapshot, not differences. To find out what we did, Git has to compare two commits. What commits should we compare? We could compare I vs J, and H vs I, to get our changes in two steps. But it might be simpler just to compare H vs J directly, to get them in one step.

Likewise, to find out what they did, Git has to compare two commits, too. What if Git just compares H to L? That will, in one fell swoop, figure out everything they did.

Why did we pick commit H? Well, it's on both branches. So is commit G, of course, but H probably seems better, because it's later along. Technically, what Git does to find H is to use a lowest common ancestor algorithm on the commit graph—the set of vertices-and-edges we've been drawing here—to find commit H, but mostly we don't really have to care about this detail because it's visually obvious which commit to use. (Real commit graphs tend to be a lot messier, and you can use the git merge-base command to have the computer do the job, if you want to find merge bases. But git merge does it all for you.)
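As a quick check of the merge-base idea, the sketch below rebuilds the G-H / I-J / K-L graph with empty commits (an assumption made for brevity) and asks git merge-base which commit it picks.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/branch1
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H
h=$(git rev-parse HEAD)
git branch branch2                 # branch2 starts at H as well

git commit -q --allow-empty -m I   # branch1: H--I--J
git commit -q --allow-empty -m J
git checkout -q branch2
git commit -q --allow-empty -m K   # branch2: H--K--L
git commit -q --allow-empty -m L

base=$(git merge-base branch1 branch2)
git log -1 --format='merge base: %h %s' "$base"
```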

Anyway, Git will:

find the merge base, in this case commit H;
figure out what we changed: base vs HEAD, i.e., commit J;
figure out what they changed: base vs the commit we specified with branch2, i.e., commit L;
combine these changes, applying the combined changes to the snapshot from the merge base; and
make a new commit on its own.

The new commit is a commit, like any other commit, so it has a snapshot and parents. The snapshot for this new commit is the one Git builds by combining the changes and applying them to the merge base commit's snapshot. The first parent of the new commit is the current commit, J, as it would be for any ordinary non-merge commit. But to mark the new commit as a merge commit, it has a second parent too. That second parent is the commit we named in our git merge command, i.e., commit L.

When Git writes out the new commit, it writes the new commit's hash ID into the current branch name, as usual. So the new commit M becomes the last commit on the current branch, branch1. This means we get:

          I--J
         /    \
...--G--H      M   <-- branch1 (HEAD)
         \    /
          K--L   <-- branch2

with merge commit M holding a snapshot that has our changes—H-vs-J—plus their H-vs-L changes.
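The parent order of a merge commit can be inspected with the ^1 and ^2 suffixes. In this sketch each side adds its own file (the file names are invented) so that the merge is conflict-free and the combined snapshot is easy to see.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/branch1
echo base > base.txt
git add base.txt
git commit -q -m H
git branch branch2

echo ours > ours.txt               # our change on branch1
git add ours.txt
git commit -q -m J
j=$(git rev-parse HEAD)

git checkout -q branch2
echo theirs > theirs.txt           # their change on branch2
git add theirs.txt
git commit -q -m L
l=$(git rev-parse HEAD)

git checkout -q branch1
git merge -q --no-edit branch2     # creates merge commit M on branch1

first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
git ls-tree --name-only HEAD       # snapshot holds both sides' files
```

Here HEAD^1 is the commit that was current when git merge ran, and HEAD^2 is the tip of the branch named on the command line.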

Of course, this is not quite what you're doing. But to get to what you're doing, we need to look at what happens when you connect two Git repositories to each other.

Connecting multiple Git repositories together

Preliminary note: git pull means run git fetch, then run a second Git command such as git merge. We're just interested in the fetch step here: we already covered merge, albeit lightly.

We can have multiple Git repositories. We can, at any time, have any two of them talk to each other. We just have one call up the other, using either git fetch or git push and a URL. The one that gets called has to have a service that allows us to call it, if it's on another machine; for that reason, the "other Git" is often at GitHub, or Bitbucket, or on a GitLab server, or whatever. Except for when you can't get your Git to log in to the other server correctly, these details aren't all that important. You just run git fetch or git push on your computer, where you're already logged in and have all the control.

Anyway, so, your Git calls up another Git. Your Git and their Git have a little conversation. The conversation is a bit different for fetch vs push, but in general, your Git starts out by asking their Git what branch and other names they have, and what commit hash IDs those names represent. This is where the magic of those cryptographic checksums comes in.

Because commit hash IDs are cryptographic checksums, and all Gits compute them the same way, every Git everywhere agrees that for whatever particular unique content some commit has, that commit gets that hash ID. So your Git and their Git can tell if you have the same commits, or not.
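You can watch the first part of that conversation with git ls-remote, which prints the names the other Git advertises along with their hash IDs. The sketch below points it at a local scratch repository (the directory name theirs is hypothetical) instead of a hosted one.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# "Their" repository, with one commit on master.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
git -C theirs commit -q --allow-empty -m "one"
tip=$(git -C theirs rev-parse master)

# Ask the other Git which names it has and which hash IDs they hold.
listing=$(git ls-remote "$work/theirs")
echo "$listing"

advertised=$(printf '%s\n' "$listing" | awk '$2 == "refs/heads/master" {print $1}')
```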

With git fetch, your Git can see what commits they have, that you don't. Your Git then asks for those commits, by their hash IDs. When a Git offers a commit and the other Git accepts, the sending Git—in this case, theirs—must offer that commit's parent commits, too. So your git fetch gets every commit they have that you don't. These all go into your repository.

If their commits build on yours, the way our commit D built on our A-B-C chain, or the way M built on our J-and-L, then we just add these to our repository. For instance, if you have:

...--G--H

and they offer J which connects to I which connects to H, you get:

...--G--H   <-- your-branch
         \
          I--J

Here, though, things get a little bit tricky. What if you have ...--G--H as above, and they offer J which connects to I which connects to G? Then you get:

...--G--H   <-- your-branch
      \
       I--J

If your Git wrote J's hash ID into the name your-branch, you would lose your commit H. It would still be in your Git, but how would you find it? If you memorized H's hash ID, you could find it that way, but, ugh. And if your-branch is your current branch, there would be even more problems, which we won't get into here.

So, what your Git does with a git fetch is to create or update a remote-tracking name. Suppose that they also call this branch your-branch. Then you get:

...--G--H   <-- your-branch
      \
       I--J   <-- origin/your-branch

in your own repository. You can now see that you are one commit ahead of them (commit H), and they are two commits ahead of you (commits I and J).

The name origin here is from the remote, the short name that we use to have Git remember the URL. Git adds the remote name as a prefix, and a slash, to their branch name, so that your Git remembers their your-branch as your origin/your-branch.¹

You might, now, choose to combine work, using git merge for instance. In fact, it's extremely sensible to run git fetch, then look at what git fetch fetched, and then choose to run another Git command. That's exactly what git pull does ... well, except for the fact that it prevents you from looking: you commit to running git merge immediately after git fetch regardless of what came in. That's not really fatal, since you can undo a merge, but I for one don't like it. I like to keep my git fetch separate from any subsequent command, so that I can look first. But that's a matter of taste.
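A sketch of the fetch, look, then merge sequence, reproducing the G-H versus G-I-J situation with two scratch repositories (theirs and mine are made-up directory names, and the commits are empty). The ahead and behind counts come from git rev-list --left-right --count with the three-dot syntax.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/your-branch
git -C theirs commit -q --allow-empty -m G

git clone -q theirs mine
git -C theirs commit -q --allow-empty -m I   # they add I and J
git -C theirs commit -q --allow-empty -m J
cd mine
git commit -q --allow-empty -m H             # you add H

# Step 1: fetch only. This updates origin/your-branch, not your-branch.
git fetch -q origin

# Step 2: look. Left count = commits only you have, right = only they have.
counts=$(git rev-list --left-right --count your-branch...origin/your-branch)
set -- $counts
ahead=$1
behind=$2
echo "ahead $ahead, behind $behind"
git log --oneline your-branch..origin/your-branch

# Step 3: only now decide to merge.
git merge -q --no-edit origin/your-branch
behind_after=$(git rev-list --count your-branch..origin/your-branch)
```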

¹I'm skipping over a lot of fine detail to keep this answer short. OK, short-ish. Short-er than it would be otherwise? Less long?

Now we are ready to look at what you are doing

Here, you ran:

git clone some-repository

which got you two commits² and two names:

A--B   <-- master (HEAD), origin/master

The name origin held the URL for the repository you just cloned. Both your master—which your git clone created to reflect their master—and their master, which your Git remembers as origin/master, select the second commit B.

Next, you ran git remote set-url origin new-url. This left everything else unchanged, but set the URL associated with the short name origin. Then you ran git fetch origin.³

This time, your Git called up a completely different Git, and they offered commit D, found by the name master. Your Git wanted that one—you didn't have it yet—and they offered C, which your Git also took, but C has no parent. It's a root commit! So your Git was satisfied that it got everything, and your Git put these commits into your repository and updated your origin/master:

A--B   <-- master

C--D   <-- origin/master

Now it's time to combine work. Your git pull ran, in effect, git merge origin/master. There's a bit of a problem, though: there is no shared commit. To get around this problem, you added --allow-unrelated-histories.

What this option does is pretend that behind each root commit, there is a totally-empty commit. So the "history" becomes, temporarily for the one git merge command only:

  A--B
 /
ε
 \
  C--D

where ε has no files at all. That means the difference between the merge base and commit B is that all files are added, and the difference between the merge base and commit D is that all files are added. Any file that has the same name is considered the "same file", and you get an "add/add conflict".
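The whole sequence can be tried out safely with local directories standing in for the two hosted repositories. Everything named here (old, new, mine, the README contents, the make_repo helper) is invented for the sketch; the Git commands are the ones discussed above.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Hypothetical helper: an independent repository with two commits and a README.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/master
  echo "$2" > "$1/README"
  git -C "$1" add README
  git -C "$1" commit -q -m "$1: first"
  git -C "$1" commit -q --allow-empty -m "$1: second"
}
make_repo old "from old"
make_repo new "from new"

git clone -q old mine
cd mine
git remote set-url origin "$work/new"   # same remote name, different repository
git fetch -q origin                     # origin/master now names new's commits

# No shared commit: merge-base prints nothing and exits non-zero.
base=$(git merge-base master origin/master || true)

# A plain merge refuses to proceed.
if git merge -q --no-edit origin/master 2>/dev/null; then
  plain=merged
else
  plain=refused
fi

# With the flag the merge runs, and the two README files collide.
if git merge -q --no-edit --allow-unrelated-histories origin/master >/dev/null 2>&1; then
  forced=clean
else
  forced=conflict
fi
status=$(git status --porcelain README)
echo "plain merge: $plain; forced merge: $forced; status: $status"
```

The AA status code means "added by both sides", which is the add/add conflict.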

You resolved these conflicts manually and made a new merge commit E. Let's go back to the original diagram and draw in merge commit E:

A--B--E   <-- master
     /
 C--D   <-- origin/master

The snapshot in merge commit E is the result of the combining, with any fixing-up that you did (and any other changes you might want to sneak in as an "evil merge", if you like). Its parents are commits B and D.

Count, now, how many commits are on your master that are not on origin/master: that's 3. So you are three commits ahead of them.
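Git can do this counting for you. The sketch below recreates the A--B / C--D situation with empty commits, so the unrelated merge needs no conflict resolution (an assumption that differs from your case), and then counts the commits reachable from master but not from origin/master.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q old
git -C old symbolic-ref HEAD refs/heads/master
git -C old commit -q --allow-empty -m A
git -C old commit -q --allow-empty -m B

git init -q new
git -C new symbolic-ref HEAD refs/heads/master
git -C new commit -q --allow-empty -m C
git -C new commit -q --allow-empty -m D

git clone -q old mine
cd mine
git remote set-url origin "$work/new"
git fetch -q origin
git merge -q --no-edit --allow-unrelated-histories origin/master   # commit E

# Commits on master that origin/master cannot reach: A, B and E.
git log --oneline origin/master..master
ahead=$(git rev-list --count origin/master..master)
behind=$(git rev-list --count master..origin/master)
git status -sb
```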

²You said one commit, but based on what you saw as output, there must be two. I'm also guessing about the branch names. Perhaps they had master and branch-A with two commits: root commit A, which is on both branches and is the tip of master, and commit B on branch-A, with parent A. With more information, I could draw this more precisely, but with luck this will suffice.

³You ran git pull, rather than git fetch, but we need to watch each step, so we'll break this up into fetch and the merge.

If you ran `git push` now...

If you were to run git push origin master, your Git would offer to origin commit E, since E is the last commit on your master. They obviously do not have it, so they would ask for you to send it. Then your Git would be obligated to offer B and D. They already have D but would ask for B, so your Git would offer A, and they would take that too. Your Git would be done sending them commits.

The last part of a git push consists of a polite request or command (or more than one request or command), asking their Git to set one or more of their names to the last-commits that your Git sent. So your Git could ask them to set their master to commit E. When using a polite request (rather than git push --force, which sends the command), they first check to make sure that obeying the request won't lose them any commits.⁴ In this case, that test passes: E comes after D, so their master will advance, and their repository's branch names will now match yours and they will have all the same commits that you have.

If this is the result you want, all is well. If not, you'll need to make some different commit(s), and/or perhaps force the other Git repository to forget some of its commits.

⁴Remember the example above with git fetch, where setting your branch name would lose your commit H. Technically, what they check is that the commit to which their branch name resolves is an ancestor of the commit you're suggesting they should remember instead. If this is-ancestor test passes, they permit the push. If not, they reject it as a non-fast-forward.
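You can run the same kind of ancestry test yourself with git merge-base --is-ancestor. This sketch only mirrors the check described in the footnote; it is not the receiving Git's actual code path, and the check helper, the branch name side, and commit F are made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m D
d=$(git rev-parse HEAD)
git commit -q --allow-empty -m E          # E comes after D
e=$(git rev-parse HEAD)
git checkout -q -b side "$d"
git commit -q --allow-empty -m F          # F also comes after D, but not before E
f=$(git rev-parse HEAD)

# Hypothetical helper: would moving a name from commit $1 to commit $2 lose anything?
check() {
  if git merge-base --is-ancestor "$1" "$2"; then
    echo fast-forward
  else
    echo non-fast-forward
  fi
}
from_d=$(check "$d" "$e")   # their name at D, you offer E
from_f=$(check "$f" "$e")   # their name at F, you offer E
echo "D -> E: $from_d"
echo "F -> E: $from_f"
```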

Thanks for the long explanation, I learnt quite a lot from it. Also, I've edited the question to clarify the commit history diagram, but your assumption was correct. One thing I still don't get is why does it say my "C" branch is ahead by 3 commits? The local branch was ahead of the original remote by 3 commits (C1 was on remote), but after changing the remote URL, it still says that. Is it because the origin/C pointer on my local repo still points to C1 since that was the previously pushed commit, with `set-url` just changing the value of origin? — ATK, Oct 01 '20 at 01:50
@Arun_TK: to your last question: yes, that's exactly right. The `set-url` operation didn't change the `origin/master` name, or any commits, it just changed the stored URL. A `git fetch origin` will force your Git to update all of its `origin` remote-tracking names, though you need to add `--prune` to this command to make it delete any leftover ones (I personally think this kind of pruning should be the default here but it's not). — torek, Oct 01 '20 at 02:30
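A small sketch of that leftover-name behavior, with local scratch repositories standing in for the old and new remotes (all names here, including the tracking helper, are invented). It assumes fetch.prune is not already enabled in your Git configuration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Old remote has master and C; new remote has only master.
git init -q old
git -C old symbolic-ref HEAD refs/heads/master
git -C old commit -q --allow-empty -m A1
git -C old branch C

git init -q new
git -C new symbolic-ref HEAD refs/heads/master
git -C new commit -q --allow-empty -m B1

git clone -q old mine                     # creates origin/master and origin/C
cd mine
git remote set-url origin "$work/new"

# Hypothetical helper: does the remote-tracking name origin/$1 exist?
tracking() {
  if git rev-parse -q --verify "refs/remotes/origin/$1" >/dev/null; then
    echo present
  else
    echo gone
  fi
}

git fetch -q origin                       # updates origin/master, keeps origin/C
after_fetch=$(tracking C)
git fetch -q --prune origin               # deletes names the remote no longer has
after_prune=$(tracking C)
echo "origin/C after fetch: $after_fetch; after fetch --prune: $after_prune"
```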