
I have 3 Git repositories:

  1. A central one located on the server and shared with the team

  2. An intermediate one on my own server, used only by me (cloned from the central one, 1)

  3. A local one on my desktop for my daily work (cloned from the intermediate one, 2). The local one stays up to date with the central one through the intermediate one.

My current situation: I am working in the intermediate repo on a new branch (call it the "New" branch) while at the same time working in the local repo on another task on the "master" branch.

The problem is that:

When I pull in the intermediate repo, it gets updates from the central repo.

But when I pull in the local repo, it does not get those updates from the central repo through the intermediate repo.

It seems that I need to check out the "master" branch in the intermediate repo before the local repo can get the new "master" commits from the intermediate repo.

However, that would switch the intermediate repo from the "New" branch to "master", which I do not want: the "New" branch is a bit behind, so switching means I would need to rebuild everything.

Is there any way for the local repo to get the updated commits on the "master" branch from the intermediate repo without changing the checked-out branch there?

One way I can think of is to point the local repo directly at the central one. But I do not want to do that at the moment, because there is an authentication issue between the local repo and the central one.

Any help is highly appreciated.

kzfid

2 Answers


TL;DR

You have two straightforward options here:

  • make the intermediate system provide a mirror clone, or
  • modify the way you use git fetch in repository #3.

There are additional options; you can get as fancy as you like.

Long

When I pull in the intermediate repo, it gets updates from the central repo.

Right.

But when I pull in the local repo, it does not get those updates from the central repo through the intermediate repo.

Also right—but there is a rather important wrinkle here.

Is there any way for the local repo to get the updated commits on the "master" branch from the intermediate repo without changing the checked-out branch there?

Yes. The trick is to stop using git pull. If you use git fetch, you gain much more control, while giving up a slight bit of convenience. [1]

[1] Technically, it's possible to keep using git pull here, but I think it's probably unwise.


How to understand all of the above, and then use it

First, remember that git pull is just a Git convenience command that means:

  1. run git fetch; then
  2. run a second Git command, after the fetch works.

It's the second Git command that updates the branch in the Git repository at which you run git pull, but it's the first command that actually gets new commits.
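To make the two steps concrete, here is a minimal sketch that runs them by hand in throwaway repositories. The names upstream and clone are made up for the demo, and I am assuming the second command is a fast-forward merge (it could just as well be a rebase, depending on your configuration).

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream" and "clone" are hypothetical names for this demo.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master   # force the branch name to master
git -C upstream commit -q --allow-empty -m "first"
git clone -q "$work/upstream" clone
git -C upstream commit -q --allow-empty -m "second"   # upstream moves ahead

cd clone
git fetch -q origin                    # step 1: get new commits, update origin/master
before=$(git rev-parse master)         # local master has not moved yet
git merge -q --ff-only origin/master   # step 2: the "second command" moves master
after=$(git rev-parse master)
echo "master before merge: $before"
echo "master after merge:  $after"
```

Only the merge step touches the branch you have checked out, which is why splitting the two steps gives you more control.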

The next thing to keep in mind is this:

  • Each repository is independent of the other repositories, except in terms of which ones it "talks to". Your intermediate repository (#2) talks to the central repository (#1) by calling up the central repository and asking it what new commits and/or branches #1 might have. That is, repository #2 "calls" repository #1 over the internet: #1 answers and information flows from #1 to #2. Then, repository #2 "answers the call" when repository #3 "calls it up" over the internet, and information flows from #2 to #3.

  • Each of these operations is a git fetch. You run the git fetch operation from one of the numbered repositories, and it calls up the next-lower-numbered repository and asks the next-lower repository about its branches (and tags) and any new commits it has because of those branch and tag names.

  • Each repository has its own branch names, independent of those in any other repository.

  • Because repository #3 is (currently) asking #2 only about #2's branch names, this forces you to update #2's branch names before #3 sees new commits.

  • Once we fix this issue—there are multiple ways to fix it—note that #3 still only gets information from #2, so #2 will still need to call up #1 to get #2 updated before you can update #3 from #2. That's probably an acceptable trade-off given your constraint that #3 cannot reach #1 directly due to some sort of authentication issue.

Git is really all about commits

Before we get into branch names—and whether or not you need to use them here—let's first note that Git doesn't really care that much about the branch names. It does use them, and of course humans use them—which makes them really important—but what Git cares about are the commit hash IDs. Every commit has a unique hash ID, and every Git agrees that that commit gets that particular hash ID. So any two Gits can tell whether they have the same commit, just by comparing the hash IDs.

What a branch name is and does, in Git, is a simple pairing of a name such as refs/heads/master or refs/heads/develop or similar with one (1) hash ID. These names usually have the refs/heads/ part omitted, because for a name to be a branch name, it must start with refs/heads/. It's therefore redundant to say "the branch name refs/heads/master", for instance. All we need is either "the name refs/heads/master"—which is a branch name—or "the branch name master".
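As a quick check that the short and full spellings are the same name, here is a small sketch in a scratch repository (the directory name repo is made up for the demo):

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo    # "repo" is a hypothetical name
cd repo
git symbolic-ref HEAD refs/heads/master   # force the branch name to master
git commit -q --allow-empty -m "first"

short=$(git rev-parse master)                       # short spelling
full=$(git rev-parse refs/heads/master)             # full spelling
name=$(git rev-parse --symbolic-full-name master)   # ask Git for the full name
echo "$short"
echo "$full"
echo "$name"
```

Both rev-parse calls print the same hash ID, and the last line prints refs/heads/master.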

Bare and non-bare repositories

A Git repository can be either bare or non-bare. The usual default, and the way we mostly use most Git repositories, is the non-bare style. All Git repositories, bare or not, contain commits (and their associated files) and branch names (each of which stores exactly one hash ID).

The files stored inside a commit are in a special, Git-only, frozen (read-only) and de-duplicated and compressed form. These files cannot be used by most programs on your computer. For a bare repository, this is not a problem, because a bare repository exists only to store the commits (and the branch names and other such names). We don't normally access the files stored inside the commits, in this kind of repository: we just let the repository handle git fetch and git push operations, i.e., talk to other repositories. In other words, no human actually uses a bare repository for everyday work.

In a non-bare repository, however, we'd like to let a user actually do some real work. To do real work with real files, we have to convince Git to extract the frozen committed files into a useful form. The usual command to do this is either git checkout (Git versions predating 2.23) or git switch (Git 2.23 or later—but git checkout still works, so there is no need to change your personal habits with a newer Git). This gives us usable copies of the files, so now we can read them, change them, and even make new commits. I won't go into any further detail about that here: the point is that a non-bare repository is one a human would normally use.
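If you are not sure which kind of repository you are looking at, Git can tell you. A minimal sketch, using two made-up scratch repositories:

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"

# Both directory names are hypothetical.
git init -q worktree-repo        # non-bare: has a working tree for a human
git init -q --bare storage.git   # bare: commits and names only

nonbare=$(git -C worktree-repo rev-parse --is-bare-repository)
bare=$(git -C storage.git rev-parse --is-bare-repository)
echo "worktree-repo bare? $nonbare"
echo "storage.git bare?   $bare"
```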

Mirror clones

I'm going to mention this now, because a standard mirror clone is a special case of a bare repository. If you never do any work in repository #2, your simplest solution will be to convert repository #2 to a mirror clone. The easy way to do this involves removing the repository, and re-cloning. If you take this approach be very sure you have nothing unsaved before removing the clone.

To make a bare mirror clone, use git clone --mirror, in more or less the same way as you would normally use git clone. The resulting repository is one that you cannot do any work in. It has, however, the nice feature that running git fetch in this mirror clone will immediately make it match the original source repository.

What this means is that if you replace repository #2 with a mirror clone, you can do:

ssh intermediate-machine 'cd path/to/clone && git fetch'

to update the clone at #2 to make it match the clone at #1. After this you can git fetch from #3 to #2.
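Here is a rough local simulation of that flow, with all three repositories on one machine so that it can be run anywhere. The names teammate, central.git, mirror.git and local are made up; in the real setup the git fetch in the mirror would be run over ssh as shown above.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# All repository names below are hypothetical stand-ins.
git init -q teammate
git -C teammate symbolic-ref HEAD refs/heads/master
git -C teammate commit -q --allow-empty -m "initial"
git clone -q --bare teammate central.git                # repository #1
git -C teammate remote add origin "$work/central.git"

git clone -q --mirror "$work/central.git" mirror.git    # repository #2 as a mirror
git clone -q "$work/mirror.git" local                   # repository #3

git -C teammate commit -q --allow-empty -m "update"
git -C teammate push -q origin master                   # new commit lands in #1

git -C mirror.git fetch -q       # what the ssh command would run on #2
git -C local fetch -q origin     # #3 now sees the new commit via #2
git -C local rev-parse origin/master
```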

One other minor downside, besides the obvious "can no longer do any work on #2" issue, is that to do a git push from #3, you need two steps. On clone #3, you will run git push origin somebranch as usual, but then you will need to run ssh intermediate-machine 'cd path/to/clone && git push origin somebranch' afterward. If you accidentally run ssh intermediate-machine 'cd path/to/clone && git fetch' before that second push, the fetch will reset #2's branch names to match #1, dropping the commits that #3 sent to #2. (You will still have them on #3 and can re-push, so this is not too awful.)

What if a mirror clone is not feasible?

If system #2, with repository #2, is one where you do need to work, you cannot use a mirror clone here. However, you do have another option: you can create a second clone on system #2 that is a mirror clone, and use that for transfers to repository #3.

If you don't want to do that (for whatever reason: perhaps it just gets too confusing) there are still more options, and to understand them, it's time to discuss more precisely just how git fetch works.

Remote-tracking names

Branch names are not the only kinds of names in a Git repository. You are probably already aware that Git repositories can have tag names as well. A tag is just a name whose full spelling starts with refs/tags/. Like branch names, tag names simply hold one (1) hash ID. The chief difference between a branch name and a tag name is that a branch name always holds the hash ID of the last commit on the branch. This means that the mapping from branch name to commit hash ID changes over time. We not only expect this to happen, we demand that Git update our branch names automatically: when we check out (or git switch to) some branch, then make a new commit, we have Git automatically store the new commit's hash ID into the branch name, so that the branch name holds the hash ID of the last commit.

Tag names, by contrast, will ideally never change: the hash ID stored in a tag name should remain the same forever. (Some people sometimes violate this rule, sometimes even on purpose. If you do violate this "never change a tag" rule, some Git repositories won't pick up the new hash ID, and the situation can get confusing. But we won't go into details here either.)
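A small sketch of that difference, in a scratch repository (the names repo and v1 are made up for the demo): the branch name moves when we commit, the tag name does not.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo    # hypothetical name
cd repo
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git tag v1                          # refs/tags/v1 now holds the first commit
first=$(git rev-parse master)

git commit -q --allow-empty -m "second"
branch_now=$(git rev-parse master)  # moved to the second commit automatically
tag_now=$(git rev-parse v1)         # still the first commit
echo "first commit: $first"
echo "master now:   $branch_now"
echo "v1 now:       $tag_now"
```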

The git fetch command, however, adds a third group of names into the mix. Git calls these remote-tracking branch names but I find that the word branch here is just a big distraction. These are not actually branch names—you cannot get on a remote-tracking name, the way you can get on a branch—so I just call them remote-tracking names.

The way a remote-tracking name works is remarkably simple. Your Git is going to call up another Git: for instance, your repo #2 will call up repo #1, or your repo #3 will call up repo #2. When your Git calls up that other Git, that other Git lists out its branch names and commit hash IDs. It says things like my master is commit 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2, for instance.

Your Git then checks: do I have commit 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2? If not, your Git gets that commit—and any other commits your Git might be missing—from their Git. So now your Git does have 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2.

Having obtained the commits, your Git now records the hash IDs that their branch names hold, by storing each one under your corresponding remote-tracking name. That is, if their master is 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2, and you call their Git origin, your Git now sets your origin/master to 4a0fcf9f760c9774be77f51e1e88a7499b53d2e2 too.

What this means is that your remote-tracking names remember the hash IDs that their branch names remember. Your origin/master on repo #2 corresponds to master on repo #1.

The full names of these origin/* names are actually refs/remotes/origin/*. Running git fetch in repo #2 calls up repo #1, obtains new commits if needed, then updates all of repo #2's refs/remotes/origin/* names.
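You can watch this conversation happen. In the sketch below (repository names upstream and clone are made up), git ls-remote shows what the other Git reports for its master, and after git fetch the local refs/remotes/origin/master holds the same hash ID.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream" and "clone" are hypothetical names.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "first"
git clone -q "$work/upstream" clone
git -C upstream commit -q --allow-empty -m "second"

cd clone
# Ask the other Git which hash ID its branch name master holds.
theirs=$(git ls-remote origin refs/heads/master | cut -f1)
git fetch -q origin
# Our remote-tracking name now remembers that same hash ID.
ours=$(git rev-parse refs/remotes/origin/master)
echo "their master:     $theirs"
echo "my origin/master: $ours"
```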

The git fetch command can see remote-tracking names

When you run git fetch from repo #3 to repo #2, repo #3 can see repo #2's current refs/remotes/origin/* names. But repo #3 normally gets these names, then tosses the name-and-hash-ID pairings aside. Repo #3 isn't interested in #2's remote-tracking names, but only in its branch names. What if we could get repo #3 to use #2's remote-tracking names?

And in fact, we can. To do this, we need to tell repo #3: look at #2's refs/remotes/origin/* names. When we do this, we can have repo #3 pick up new commits from repo #2's origin/* names. But we must now update some name in repo #3.
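To see that #2's remote-tracking names really are visible from #3, here is a sketch that rebuilds the three-repository situation from the question on one machine. The names teammate, central.git, intermediate and local are stand-ins I made up. After an ordinary fetch, #3's origin/master is still stale, yet git ls-remote shows that #2 is advertising the newer commit under refs/remotes/origin/master.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# All repository names below are hypothetical stand-ins.
git init -q teammate
git -C teammate symbolic-ref HEAD refs/heads/master
git -C teammate commit -q --allow-empty -m "initial"
git clone -q --bare teammate central.git              # repository #1
git -C teammate remote add origin "$work/central.git"
git clone -q "$work/central.git" intermediate         # repository #2
git clone -q "$work/intermediate" local               # repository #3
git -C intermediate checkout -q -b New                # #2 is busy on "New"

git -C teammate commit -q --allow-empty -m "update"
git -C teammate push -q origin master                 # new commit lands in #1
git -C intermediate fetch -q origin                   # #2's origin/master updated
git -C local fetch -q origin                          # #3 only sees #2's branch names

stale=$(git -C local rev-parse origin/master)
seen=$(git -C local ls-remote origin refs/remotes/origin/master | cut -f1)
echo "local origin/master:         $stale"
echo "advertised by intermediate:  $seen"
```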

We could tell repo #3 to update its origin/* names based on #2's origin/* names. If we do this, we have to tell repo #3 to stop updating its origin/* names based on #2's branch names, because those origin/* names need to be based on one set of names, not two.

If we choose to do this, we automatically update repo #3's remote-tracking names from repo #2's remote-tracking names, but give up one particular way to easily obtain new commits that are only on repo #2, i.e., are not yet pushed to central repository #1. It's still possible—in fact, using git pull can do it easily—but it can get a bit messy. So be very sure about this if you choose this alternative.

There is another option. We can declare that repo #2's refs/remotes/origin/* remote-tracking names should be copied to repo #3's refs/remotes/central/* names. That is, we make up a new set of names, central/*, and use those as if we had a way to connect directly from #3 to #1. We don't actually connect directly to #1, but we update names in #3 as if we had.

To make either of these options work automatically, you will need to update the default fetch refspec in repository #3.

Fetch refspecs

Earlier, we said that when you run git fetch, your Git calls up another Git, gets information from that Git about their branch (and other) names, and gets new commits and updates your remote-tracking names. This is all true, but it's time now to go into how this works, in much more detail.

In that first step, when your Git gets information from their Git, they simply list out their various names and hash IDs. So your Git sees that they have, for instance, a refs/heads/master with its hash ID, and a refs/remotes/origin/master with another hash ID.

By default, your Git has this fetch refspec configured into it:

+refs/heads/*:refs/remotes/origin/*

This thing, this refspec, tells your Git: when their side lists branch name master, I want you to create or update my corresponding origin/master name (and likewise for every other branch name they list). That makes your Git get their master commits (if they have new ones), then create or update your origin/master.

There are four parts to this particular refspec:

  • The leading plus sign + means force this update. It is roughly equivalent to using --force on the command line, except that it applies only to refs—branch names, tag names, and remote-tracking names—that are being updated by this particular pattern.

  • The refs/heads/* pattern matches all of their branch names, and no other names.

  • The colon : character separates their names from our names.

  • The last pattern, refs/remotes/origin/*, makes your Git turn their branch name (refs/heads/...) into your corresponding remote-tracking name (refs/remotes/origin/...) for the remote named origin.
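You can check which refspec a clone is using without opening any file. A minimal sketch in scratch repositories (upstream and clone are made-up names):

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream" and "clone" are hypothetical names.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "first"
git clone -q "$work/upstream" clone

# Print every fetch refspec configured for the remote named origin.
refspec=$(git -C clone config --get-all remote.origin.fetch)
echo "$refspec"
```

In a fresh, ordinary clone this prints the single default refspec described above.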

If we replace this line with:

+refs/remotes/origin/*:refs/remotes/origin/*

we will make repo #3 update its remotes/origin/* names from repo #2's remotes/origin/* names. This has the drawback mentioned above.

If we add a line, we can add the central/* pattern. It's now time to mention where this line is found.

git config --edit

Using git config --edit will open your preferred editor on the file .git/config. (Alternatively, you can use the git config command to update the file, but for this kind of work, it's generally easier to do in your editor.) Be sure your editor is set to plain-text mode, and to write this file as plain text, the way a .gitignore file would be written.

In this file, you will see:

[remote "origin"]
        url = ...   # some URL here
        fetch = +refs/heads/*:refs/remotes/origin/*

This is how your git fetch knows what to fetch (and the url = part provides the where to fetch from information). If we change this to read:

[remote "origin"]
        url = ...   # some URL here
        fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = +refs/remotes/origin/*:refs/remotes/central/*

then a simple git fetch from repository #3 will create or update central/* names in that repository, based on the current values of the origin/* remote-tracking names in repository #2.
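Here is a sketch of this last option end to end, again simulating all three repositories locally with made-up names (teammate, central.git, intermediate, local). Instead of opening an editor, it uses git config --add, which I am assuming is acceptable here since it appends the same second fetch line to the same file.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# All repository names below are hypothetical stand-ins.
git init -q teammate
git -C teammate symbolic-ref HEAD refs/heads/master
git -C teammate commit -q --allow-empty -m "initial"
git clone -q --bare teammate central.git              # repository #1
git -C teammate remote add origin "$work/central.git"
git clone -q "$work/central.git" intermediate         # repository #2
git clone -q "$work/intermediate" local               # repository #3
git -C intermediate checkout -q -b New                # #2 stays on "New" throughout

git -C teammate commit -q --allow-empty -m "update"
git -C teammate push -q origin master                 # new commit lands in #1
git -C intermediate fetch -q origin                   # #2's origin/master updated

# Add the second fetch line to repository #3's configuration.
git -C local config --add remote.origin.fetch \
    '+refs/remotes/origin/*:refs/remotes/central/*'
git -C local fetch -q origin

echo "central/master: $(git -C local rev-parse central/master)"
echo "origin/master:  $(git -C local rev-parse origin/master)"
echo "#2 is still on: $(git -C intermediate symbolic-ref --short HEAD)"
```

In repository #3 you would then merge or rebase onto central/master rather than origin/master, since origin/master still reflects #2's own (unmoved) master branch.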

If you cannot make repository #2 a mirror clone, this last is probably the best option.

torek
  • Thanks. I got lost in some of the concepts, but your solution absolutely pointed me in the right direction. – kzfid Sep 07 '20 at 07:53

It seems that I need to check out the "master" branch in the intermediate repo before the local repo can get the new "master" commits from the intermediate repo.

You do need a local branch on the intermediate repository which will follow the server repo branch of your choice, indeed.

But that does not mean you need to check out that branch in your intermediate repository (since you are working on "New"). As explained in "Git pull without checkout?", you can just fetch into it:

git fetch origin master:master

Then your local repo can fetch from the intermediate one, and its origin/master will reflect the updated history of the central repo.
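A sketch of that sequence, simulated locally with made-up repository names (teammate, central.git, intermediate, local). It relies on master not being the checked-out branch in the intermediate repository and on the update being a fast-forward.

```shell
set -eu
work=$(mktemp -d)   # scratch directory; remove it when you are done
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# All repository names below are hypothetical stand-ins.
git init -q teammate
git -C teammate symbolic-ref HEAD refs/heads/master
git -C teammate commit -q --allow-empty -m "initial"
git clone -q --bare teammate central.git              # central repo
git -C teammate remote add origin "$work/central.git"
git clone -q "$work/central.git" intermediate         # intermediate repo
git clone -q "$work/intermediate" local               # local repo
git -C intermediate checkout -q -b New                # intermediate works on "New"

git -C teammate commit -q --allow-empty -m "update"
git -C teammate push -q origin master                 # new commit lands in central

# Update the intermediate repo's master branch without checking it out.
git -C intermediate fetch -q origin master:master
git -C local fetch -q origin                          # local now sees the new master

echo "local origin/master:   $(git -C local rev-parse origin/master)"
echo "intermediate is on:    $(git -C intermediate symbolic-ref --short HEAD)"
```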

VonC