Does git fetch pick up deleted branches?

Question

Use case:

User A is working on branch A, forked from master. User B creates branch B, also forked from master, does some work, commits it, and then deletes branch B. Can user A ever see user B's work (e.g. with git fetch --all)? User A has never worked on branch B.

In this case branch B is not showing up for user A.

score 4 · Accepted Answer · answered Oct 19 '21 at 14:46

The short answer is "no".

The long answer is Mu: the question as asked is actually meaningless, unless one makes some leaps of interpretation (which I think most people will). The reason is that branches don't matter; you don't (quite) fetch branches. What matters are commits, so the correct question is whether git fetch will fetch these commits (and then the answer is generally "no").

I think you have a mistaken idea here too:

e.g. git fetch --all

The --all option to git fetch means all remotes, not all branches.

The rest of this answer is optional, but I suggest it is worth reading: you'll find out when the answer becomes "yes".

How Git works

We begin with the following:

A Git repository is, at its heart, a pair of databases:
- One database holds commits and other Git internal objects. These store every version of every file for all time, more or less.
- But commits (and other objects) are numbered with huge, useless-to-humans, seemingly-random numbers ("hash IDs" or "object IDs"). To make these things accessible to humans and thus useful, the other database in a Git repository translates from names to the internal numbers.

The names in a Git repository include branch names, but these are not the only kind of name. There are also tag names, funny things called remote-tracking names or remote-tracking branch names, and so on. People (rather mistakenly) sometimes call the remote-tracking names "remote branch names", but this is quite misleading.

The act of cloning a Git repository means "get me all the commits and none of the branches". (This is somewhat modifiable via various options, and does not capture all the details, but it's the right starting point for viewing cloning.) Git does not need branches. It needs only commits and some names by which to find them.
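As a minimal sketch of the "names database" idea, the following builds a throwaway repository and lists every name with the hash ID it maps to. The temporary directory, the demo identity, and the names alpha and v0 are assumptions made up for this illustration.

```shell
set -e
# Throwaway identity so that "git commit" works without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # call the initial branch "master"
git commit -q --allow-empty -m "first commit"

# Two more names for the same commit: a branch name and a tag name.
git branch alpha
git tag v0

# The names database: each name, with the object ID it translates to.
git for-each-ref --format='%(objectname) %(refname)'

master_id=$(git rev-parse master)
branch_id=$(git rev-parse alpha)
tag_id=$(git rev-parse v0)
```

All three names should resolve to the same hash ID, since each is just a label for one commit.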

When we work locally, in a Git repository we've cloned or built from scratch or whatever, we do in fact get our work done using branch names. But these branch names are created in our repository. They are not in any other repository! Because humans are humans, though, we tend to use the same names in two different clones:

- Bob has a repository. In Bob's repository, Bob created branches named alpha and beta.
- I clone Bob's repository. I don't get his branch names: I create my own branch names. But because I intend to work with Bob, I call my branches alpha and beta too.

These are the "same names", and initially they may hold the same commit ID numbers as well. But my names are mine and Bob's names are Bob's. They only meet up if and when we synchronize them.

When I first clone Bob's repository, I get, from him, all of his commits and none of his branches: I have no branches at all. But my Git does remember his branch names. My Git sticks those names into my repository under the general category of remote-tracking names. That is, instead of alpha, I get bob/alpha. Instead of beta, I get bob/beta. These are my Git's memory of Bob's branch names.

Now, since I intend to work on/with the same commit that Bob published recently, I choose one of these two names and have my Git create, for me, a branch of the same name: I now have either an alpha or a beta (but not both). Since any name holds one internal Git object ID, my alpha or beta—whichever it is I choose to create—holds the same commit hash ID as my bob/alpha or bob/beta. That's the hash ID I got from Bob when I got all the commits from Bob, and turned Bob's branch names into my remote-tracking names.
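Here is a small sketch of that clone, under some assumptions: both repositories live in a temporary directory on one machine, the remote is named bob via the -o option, and a file:// URL stands in for whatever URL I would really use to reach Bob. The directory names bob and mine are made up for the example.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Bob's repository, with branches alpha and beta.
git init -q "$tmp/bob"
git -C "$tmp/bob" symbolic-ref HEAD refs/heads/alpha
git -C "$tmp/bob" commit -q --allow-empty -m "Bob's first commit"
git -C "$tmp/bob" branch beta

# Clone it, naming the remote "bob" instead of the default "origin".
git clone -q -o bob "file://$tmp/bob" "$tmp/mine"

# My Git's memory of Bob's branch names (remote-tracking names):
git -C "$tmp/mine" for-each-ref --format='%(refname)' refs/remotes/bob

# My own branch names: the last step of "git clone" created just one,
# for the branch that Bob's HEAD names (alpha here).
local_names=$(git -C "$tmp/mine" for-each-ref --format='%(refname)' refs/heads)

# Creating my own beta from my remote-tracking name bob/beta.
git -C "$tmp/mine" branch -q beta bob/beta
mine_beta=$(git -C "$tmp/mine" rev-parse beta)
bob_beta=$(git -C "$tmp/mine" rev-parse bob/beta)
```

My beta and my bob/beta hold the same hash ID at this point, but they are separate names and can drift apart later.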

How `git fetch` works

Over time, Bob may or may not have made new commits. At some point, I decide I should have my Git, working with my clone, which has my branches (plus of course all the commits, plus my remote-tracking names), call up Bob's Git again, and have Bob's Git connect to Bob's repository.

At this point, Bob has whatever branches he has. His Git (his software, running on his repository) lists out these branch names to my Git (my software, running on my repository). These come with commit hash IDs: those big ugly random-looking numbers for the commit objects.

My Git checks to see whether I have those commits. If I do, great! If not, my Git asks Bob's Git for those commits, which causes a whole conversation to run so that my Git can find out all the new commits Bob has that I don't. My Git downloads all these commits, and now I have all the commits Bob has, again, just like when I first cloned. Finally, now that I have all of Bob's commits—plus maybe my own, on my branches—my Git updates my remote-tracking names to remember Bob's branch names and commits.

Note that this has no effect on any of my branches. I do, however, get updates to my remote-tracking names—and if Bob created a new branch name, and my Git saw it during this git fetch, my Git will create a new remote-tracking name to go with that. If I set fetch.prune or use -p, and Bob deleted some of his branch names, my Git will delete the corresponding remote-tracking names, too. So git fetch updates, for me, the remote-tracking names for the Git I called up.
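The create-then-delete case can be tried out with a sketch like this one. It assumes the same kind of throwaway setup (temporary directory, file:// URL, a remote named bob); the branch name gamma is hypothetical.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/bob"
git -C "$tmp/bob" symbolic-ref HEAD refs/heads/master
git -C "$tmp/bob" commit -q --allow-empty -m "initial commit"
git clone -q -o bob "file://$tmp/bob" "$tmp/mine"

# Helper: print my remote-tracking name for Bob's gamma, if I have one.
tracking() { git -C "$tmp/mine" for-each-ref --format='%(refname)' refs/remotes/bob/gamma; }

# Bob creates a branch and commits on it; my next fetch creates bob/gamma.
git -C "$tmp/bob" checkout -q -b gamma
git -C "$tmp/bob" commit -q --allow-empty -m "work on gamma"
git -C "$tmp/mine" fetch -q bob
after_create=$(tracking)

# Bob deletes the branch. A fetch without pruning keeps my stale bob/gamma
# (-c pins the default here in case fetch.prune is set globally)...
git -C "$tmp/bob" checkout -q master
git -C "$tmp/bob" branch -q -D gamma
git -C "$tmp/mine" -c fetch.prune=false fetch -q bob
after_plain_fetch=$(tracking)

# ...while fetch --prune removes it.
git -C "$tmp/mine" fetch -q --prune bob
after_prune=$(tracking)

echo "after create:      $after_create"
echo "after plain fetch: $after_plain_fetch"
echo "after prune:       $after_prune"
```

None of this touches my own branch names; only the bob/* names change.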

The key questions here are: What Git did I call up, and what names and commits did that Git have? I say here that I called up Bob's Git, which had Bob's branch names and all the commits Bob has, so we can answer these questions and see what remote-tracking names I have now, and what object hash IDs those names hold.

Introducing "forks" and/or "central repositories"

In the above, I've been using Bob's computer directly. When I run git fetch, I get ssh access (or whatever) to Bob's computer, logging in to it in some way so that I can run Git commands over there. That's fine in some Linux-server-type environments, like a corporate Git setup. But many places don't want to work like this, and/or want to have a single "source of truth" centralized repository, whether that's hosted in-company or on GitHub or whatever.

So now I won't have access to Bob's repository, on Bob's computer. Instead, there's a centralized repo somewhere that—at least initially—has only one branch, named master. Bob will clone that centralized repo and get origin/master and use that to create, in Bob's Git, master. Bob then uses his master to create a new branch name alpha.

When I connect to the central repository, my Git makes my clone, which has all the commits and no branch names and one remote-tracking name origin/master. I (or my Git anyway) use my origin/master to create a branch named master, which I then use to create my branch name beta.

When I run git fetch, my Git goes over to origin. Bob hasn't told the Git over on origin to create any new branch names. So I won't see any of Bob's branch names at all, because I never talk directly to Bob's Git, and I won't see any of Bob's branch names copied over to origin because he has not done that yet.

When Bob eventually runs a git push, he does:

git push -u origin alpha

This makes his Git call up the Git over at origin and offer to it—to the origin Git—any commits Bob has on alpha that origin does not already have.¹ They take those commits, and then Bob asks the origin Git to create, on origin, a new branch name, alpha. If the origin Git obeys this request—that's up to the origin Git and any control knobs someone may have installed and adjusted (basic Git doesn't have much here, but most hosting sites do)—then now the origin Git has a branch named alpha.

My Git, calling up the Git at origin, can now see alpha, and create my origin/alpha remote-tracking name (after getting those five, or whatever, new-to-my-Git commits). That's a remote-tracking name for me, and a branch name for origin, but I can only see it because Bob convinced origin to create it.
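A sketch of this sequence, assuming a bare repository in a temporary directory plays the role of the central repo (the seed, bob, and mine directory names are invented for the example):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# A central repository that starts out with only master.
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" commit -q --allow-empty -m "initial commit"
git clone -q --bare "$tmp/seed" "$tmp/central.git"
central="file://$tmp/central.git"

git clone -q "$central" "$tmp/bob"
git clone -q "$central" "$tmp/mine"

# Bob works on alpha in his own clone, without pushing.
git -C "$tmp/bob" checkout -q -b alpha
git -C "$tmp/bob" commit -q --allow-empty -m "Bob's work"
git -C "$tmp/mine" fetch -q origin
before_push=$(git -C "$tmp/mine" for-each-ref --format='%(refname)' refs/remotes/origin/alpha)

# Bob pushes: origin now has a branch named alpha, and my next fetch sees it.
git -C "$tmp/bob" push -q -u origin alpha
git -C "$tmp/mine" fetch -q origin
after_push=$(git -C "$tmp/mine" for-each-ref --format='%(refname)' refs/remotes/origin/alpha)

echo "before Bob's push: '$before_push'"
echo "after Bob's push:  '$after_push'"
```

Before the push my fetch has nothing to find, because origin itself has never heard of alpha.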

If Bob decides to make a GitHub-style fork, what he's done is make yet another clone, but this time one hosted on GitHub. Bob's clone is another separate Git repository and this clone has its own branch names. There's a special thing or two about this clone though: when GitHub creates it, GitHub does copy all the branches, so initially that clone has all the same branches as the origin clone I'll be using. Also, as Bob creates new commits and branch names on Bob's GitHub fork, Bob can make pull requests to the origin Git. (That's all stuff GitHub offers as add-ons, to make you want to use GitHub rather than doing self-hosting.)

In all these cases, until Bob somehow causes a new branch to come into existence on the origin Git, I can't see Bob's commits. I can only see the branch names that are on origin, which will become my remote-tracking names; and I can only get Bob's commits once he's given them to the origin Git somehow, and made a name on the origin Git so that I—or my Git—can find their commit hash ID numbers.

¹This phrasing covers the fact that all the commits that were on master are now on both branches. So the Git at origin has a ton of commits that are on alpha; it's just that Bob has five more commits, or however many Bob made.

Remotes

In the above process, my Git has always had exactly one remote.

When I was using the example where I went directly into Bob's computer—which let me see all of Bob's branches any time I did that—I used the name bob for this remote, so that my remote-tracking names were bob/alpha and bob/beta.

When I was using GitHub as an example, I used the name origin for the remote, so that my remote-tracking names were origin/master and, eventually (once Bob created an alpha there too) origin/alpha.

A remote is primarily a short name for a URL. The URL I might use for Bob's computer might be ssh://bob.company.com/path/to/repo.git. The URL I might use for GitHub might be ssh://git@github.com/company/repo.git.

The git clone command will, by default, make your new clone have, as its (one, single) remote, the remote name origin. This name will store the URL you gave to git clone, so that later, git fetch origin will go back to the same URL and get any new commits from them.

You can, however, have more than one remote. The only constraint here is that each one has to have a unique name. So if I do have direct access to Bob's computer, I can add that to my clone in which origin refers to the GitHub clone ... and now I can access Bob's repository directly, and hence see Bob's branches, as my bob/* remote-tracking names. So now the answer changes from no, I can't see Bob's branches to yes, I can see Bob's branches. I will have origin/master, but also bob/alpha (and bob/master too, unless he deleted his name master).

Now that I have more than one remote, running git fetch --all does something different from a plain git fetch. Before, with just the one remote named origin, git fetch --all meant fetch from all remotes, which meant fetch from origin, which is also what git fetch without --all meant: there was just the one remote, so that remote was the one we fetched from.

With two remotes, though, git fetch with no additional qualifier means fetch from some remote. Which one? The git fetch documentation here is not a model of clarity, but the answer currently is:

- if I am on branch B and B has a configured remote of R, that's the one git fetch uses;
- otherwise, git fetch falls back on the name origin.

(This might change someday.)

If I give git fetch a name like origin or bob, that's the one remote it will fetch from, and there are more options such as "remote groups" and of course --all. Using --all directs git fetch to run git fetch on all remotes, one at a time.²
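To see the two-remote case in action, here is a sketch that assumes I can reach Bob's repository directly. As before, everything lives in a temporary directory, file:// URLs stand in for real ones, and the directory names are made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" commit -q --allow-empty -m "initial commit"
git clone -q --bare "$tmp/seed" "$tmp/central.git"
central="file://$tmp/central.git"

git clone -q "$central" "$tmp/bob"
git clone -q "$central" "$tmp/mine"

# Bob has a branch that he never pushes to the central repository.
git -C "$tmp/bob" checkout -q -b alpha
git -C "$tmp/bob" commit -q --allow-empty -m "Bob's unpushed work"

# Add Bob's repository as a second remote, then fetch from every remote.
git -C "$tmp/mine" remote add bob "file://$tmp/bob"
git -C "$tmp/mine" fetch -q --all

# All of my remote-tracking names, from both remotes.
git -C "$tmp/mine" for-each-ref --format='%(refname)' refs/remotes

# The remote that a bare "git fetch" would use while I am on master.
default_remote=$(git -C "$tmp/mine" config branch.master.remote)
echo "default remote for master: $default_remote"
```

I now have bob/alpha even though origin has no alpha at all, which is the "yes" case.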

So: --all is only useful if you've defined two or more remotes. If you have set up remote access to Bob's repository, you can see Bob's branches. This of course requires that you have access to Bob's machine, or Bob's fork on GitHub, or whatever.

²By default Git fetches from the remotes sequentially. The --jobs option (or the fetch.parallel setting) can make it fetch from several remotes in parallel.

Conclusion

In the end, the real key here is commits. We get commits by their hash IDs. We find those hash IDs through names—branch names, tag names, remote-tracking names, whatever names. The git fetch command reaches out to some other Git (software+repository). By default, it uses their branch names (and their tag names, depending on --tags and other fetch options) to find commits to get, gets those commits, and then creates or updates names in our repository, but with the standard setup, the names we get in our repository for their branch names are our remote-tracking names instead.

The only names we can see are those that they offer us, and they can only offer us the names that they have. So if "their Git" is a centralized repository somewhere, and Bob creates branches in Bob's clone and makes commits there but never sends the names or commits to the centralized repository, the centralized repository never has anything to give us in the first place.

Neatly explained! But if Bob pushes to the centralized repository and later deletes the branch names, there will be "dangling" commits on the remote; `git fetch` can't fetch them right? In this case I start working on the repo after Bob has already created, pushed and deleted a branch to/from the remote — Sebi, Oct 20 '21 at 14:55
Yes: if Bob creates branch `temp` on the central repo and puts some temporary commits there, then asks the central repo to delete `temp`, the commits themselves may linger, but without a reference, `git fetch` won't *see* them and therefore won't retrieve them. This is not *guaranteed:* in particular, some site might optimize `git clone` by just sending every commit even if there's no ref for it, allowing an Evil Attacker to see nominally-deleted commits. But you won't see them in normal practice ... *unless* you fetch from the central repo during the period when `temp` exists. — torek, Oct 20 '21 at 18:38
Since there is no way to close the race between Alice (fetching from central repo while `temp` exists) and Bob (creating, then later deleting, `temp`), we should in general assume that any commit pushed *to* the central repo has been grabbed by everyone who has access to the central repo. But Alice, in our hypothetical setup here, won't have to worry about those commits as long as she doesn't *look* for `temp`, and if she has `fetch.prune` set to `true` she'll drop `origin/temp` locally as soon as she fetches from central when `temp` doesn't exist. — torek, Oct 20 '21 at 18:39
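The race described in these comments can be reproduced with a sketch like the following. It assumes a bare repository in a temporary directory as the central repo and uses file:// URLs so that clones go through the normal transfer protocol rather than a local file copy. The clone named late is a hypothetical user who arrives after temp has been deleted.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/master
git -C "$tmp/seed" commit -q --allow-empty -m "initial commit"
git clone -q --bare "$tmp/seed" "$tmp/central.git"
central="file://$tmp/central.git"

git clone -q "$central" "$tmp/bob"
git clone -q "$central" "$tmp/alice"

# Bob pushes temp to the central repo.
git -C "$tmp/bob" checkout -q -b temp
git -C "$tmp/bob" commit -q --allow-empty -m "temporary work"
temp_id=$(git -C "$tmp/bob" rev-parse temp)
git -C "$tmp/bob" push -q origin temp

# Alice fetches while temp exists, so she receives the commit.
git -C "$tmp/alice" fetch -q origin

# Bob deletes temp from the central repo; Alice later fetches with pruning.
git -C "$tmp/bob" push -q origin --delete temp
git -C "$tmp/alice" fetch -q --prune origin

# Someone who clones only after the deletion.
git clone -q "$central" "$tmp/late"

# Helper: does the repository in $1 contain the object $2?
has_commit() {
  if git -C "$1" cat-file -e "$2" 2>/dev/null; then echo yes; else echo no; fi
}
central_has=$(has_commit "$tmp/central.git" "$temp_id")
alice_has=$(has_commit "$tmp/alice" "$temp_id")
late_has=$(has_commit "$tmp/late" "$temp_id")
echo "central: $central_has, alice: $alice_has, late: $late_has"
```

In this setup the commit lingers in the central repo and in Alice's clone (until some later garbage collection removes it), while the late clone never receives it because no name leads to it.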

Orace · Answer 2 · 2021-10-19T13:34:57.517

I suppose that users A and B are on different computers (A and B) and that the master branch is stored on a server.

First

Make a list of the repositories that have ever known about branch B:

- The one used by user B on computer B.
- The one on the server, if user B pushed branch B to it.
- Any others (for example, a backup repository that user B pushed branch B to).

Second

Make sure that the branch has been deleted from all of those repositories. If not, user A can retrieve branch B from one where it still exists (e.g. the server).

Finally

Take a look at the reflog: it records the recent history of HEAD in the local repository and can help user B recover branch B after deleting it. Some Git servers also offer a similar feature (like GitHub, as explained here).
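A minimal sketch of such a recovery, in a throwaway repository (the temporary directory and commit messages are made up, and it assumes reflogs are enabled, which is the default for a non-bare repository):

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "initial commit"

# User B does some work on branch B, then deletes the branch.
git checkout -q -b B
git commit -q --allow-empty -m "work on B"
lost_id=$(git rev-parse B)
git checkout -q master
git branch -q -D B

# HEAD's reflog still records the commit that B pointed to.
git --no-pager reflog

# HEAD@{1} is where HEAD was just before the last checkout: the tip of B.
git branch B 'HEAD@{1}'
restored_id=$(git rev-parse B)
```

This only works in the repository where the commits were made and only while the reflog entry has not expired.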