The root of the problem here is that git clone's --depth option turns on --single-branch as well. To defeat that at clone time, use --no-single-branch. To defeat it afterwards, see the accepted answer to How do I "undo" a --single-branch clone?
Note that after de-single-branch-ing the clone you have, you will have to run git fetch --depth 1 again. This will retrieve the rest of the branch names from the repository you cloned (all of them become remote-tracking names; see the details below) and allow you to run git checkout on each such name to create a local branch with the same name. You can also use git remote set-branches --add to add individual names to an existing remote; again, you'll need another git fetch --depth.
Optional Reading: Details, or, why the above works
A Git repository—technically, a non-bare repository—really consists of the following three parts:
- a pair of databases, as described below;
- an index, by which Git knows what files to commit, i.e., which files to track, although there is much more to the index than just a list of files; and
- a working tree or work-tree in which you are able to use and modify your files. These files are literally yours, and are not actually in Git at all. The files inside Git, in the main database, are all read-only and in a special compressed and de-duplicated form that only Git itself can use.
When you run git clone, you have your Git copy the main database (the one holding all the commits and files and such) more or less wholesale, but have it read the other database, parse and interpret it, and write a different database to your clone.
The --depth flag affects the main database, so that you don't copy it wholesale. The --single-branch flag (which, as we noted, --depth turns on automatically) affects the secondary database. Before we go on, let's give the two databases names, so that we don't keep referring to some awkward phrase like "the party of the first part":
The thing I've been calling the "main database" is Git's object store. This is a simple key-value database in which the keys are hash IDs, and the values are Git's commits and other internal objects.[1] Usually this is the largest part of a Git repository.[2]
The second database is also a simple key-value store, with the keys being names (branch and tag names included, but also almost all of Git's other names[3]) and the values being hash IDs. Each name stores just one hash ID, as that's all that is required.
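As a rough illustration of the two databases, here is a toy Python model. It is a sketch only, not how Git stores anything on disk: the dictionaries objects and refs, the helper store, and the sample contents are all made up for this example, and real Git hashes a small header plus the content rather than the bare bytes.

```python
import hashlib

objects = {}  # toy object store: hash ID -> object content
refs = {}     # toy name store: full name -> one hash ID


def store(content: bytes) -> str:
    """Save content under a key computed from the content itself."""
    key = hashlib.sha1(content).hexdigest()
    objects[key] = content  # identical content always lands on the same key
    return key


# Two made-up "commits"; the second one mentions the first one's hash ID.
first = store(b"commit: initial snapshot")
second = store(b"commit: second snapshot, parent " + first.encode())

# Each name holds exactly one hash ID.
refs["refs/heads/master"] = second
refs["refs/tags/v1.0"] = first

# Looking up a name takes two key-value reads: name -> hash ID -> object.
tip = objects[refs["refs/heads/master"]]
print(tip.decode())
```

The point of the sketch is only that both stores are plain key-value lookups, and that a name never holds more than one hash ID.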
So, to recap, git clone will (without the --single-branch and --depth flags, anyway) call up some other Git and have it list out all of its branch and tag and other names. It will then use these names to find all the commits and other Git objects in the original repository, and have the other Git send over all of those objects. The result is a full copy of the object database.[4] You now have all of the commits from the other Git repository.
At the same time, though, your own Git takes all of their names and picks and chooses which names to take, and what to do with them. In general, your Git takes all of their branch names, whose full spellings are things like refs/heads/master, refs/heads/topic, and so on, and renames them to become your own remote-tracking names instead: refs/remotes/origin/master, refs/remotes/origin/topic, and so on. Your Git then creates its own independent name-to-hash-ID database, with no branch names in it.[5]
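The renaming step can be sketched in a few lines of Python. This is a simplified model, assuming a remote named origin; the function to_remote_tracking, the sample names, and the placeholder hash strings are hypothetical and exist only for this illustration.

```python
def to_remote_tracking(their_name, remote="origin"):
    """Map one of their branch names to our remote-tracking name, else None."""
    prefix = "refs/heads/"
    if not their_name.startswith(prefix):
        return None  # tags and other names are not renamed this way
    return "refs/remotes/" + remote + "/" + their_name[len(prefix):]


# Hypothetical names listed by the other Git (hash IDs are placeholders).
their_names = {
    "refs/heads/master": "hash-of-master-tip",
    "refs/heads/topic": "hash-of-topic-tip",
    "refs/tags/v1.0": "hash-of-tagged-commit",
}

our_names = {}
for name, hash_id in their_names.items():
    renamed = to_remote_tracking(name)
    if renamed is not None:
        our_names[renamed] = hash_id

for name in sorted(our_names):
    print(name)
```

Note that the resulting our_names has nothing under refs/heads/ at all, which is the "no branch names" state described above.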
The end result is that immediately after this step of git clone, you have all the commits and none of the branches! This situation is quickly rectified by the last step of git clone, though. Provided you didn't say --no-checkout, the last step of git clone is to run git checkout, and this step actually creates one branch. The branch name your Git creates is the one you supplied with the -b option. If you did not supply a -b option, your Git asks the other Git which branch it recommends, and if all else fails, your Git falls back on its own default initial branch name.[6]
[1] Each commit object refers to a (single) tree object, which holds the snapshot for that commit, and has metadata. Each tree object holds an array of partial file names (name components, that will be strung together as needed) and another hash ID. That hash ID identifies either another tree, or a blob object that stores some file's content. Git builds up the files' full names by reading all the sub-trees as needed, and stores the full file names in its index, and then extracts the files using the names and blob hash IDs as seen in the index. This isn't a complete description, but is why Git can't store empty directories: there's no way to put one into Git's index.
The object database can also contain annotated tag objects, each of which holds a hash ID, usually that of a commit. These are how Git provides its annotated tags.
[2] There are exceptions: old repositories that for some reason keep accumulating new names, e.g., new branch and tag names, but hardly ever get any new commits. But in general the object database is where most of the space is used, and most of the time for an initial clone.
[3] The other names include things like notes, in-progress bisection, names needed during some interactive rebases, and so on. Basically any name that will store a single hash ID goes into this database. Names that don't do that, such as the names of remotes like origin, don't go in here. Those generally go in the config file in the .git directory.
This database is currently implemented rather poorly. Sometimes the names are stored as directory-and-file-names in the file system, which means that on case-insensitive file systems such as the default ones on Windows and macOS systems, branch names become case-insensitive. Sometimes the names are stored in a plain-text file named packed-refs, which makes them all case-sensitive as Git always intended. A few special names, such as HEAD, never go into the packed-refs file at all and are instead always stored as individual files within the .git directory. There is work going on right now to provide a proper database, to solve a bunch of issues here.
[4] Technically, the result can and usually will omit any objects that cannot be found by using the names. We'll ignore this fine distinction here, though.
[5] Your Git will normally omit all of their non-branch non-tag names too. How it handles their tag names is complicated, but in a normal (not single-branch, not depth-limited) clone you normally wind up copying all their tag names.
[6] This used to be just hard-coded as master, but since Git 2.28 it is configurable through the init.defaultBranch setting.
How --single-branch affects this
With the --single-branch option, your Git doesn't use all of their names. Instead, your Git uses only the one branch name from your -b option, with the same default: if you don't supply -b, your Git asks their Git what they recommend, or falls back on yet another default. Your Git then transforms that one branch name into one remote-tracking name. It makes sure to ask their Git only for commits that are on that branch, in that other Git repository.
The end result is that you get one remote-tracking name, and some subset of all of their commits. The final git checkout step then creates one local branch name: the same name your Git used when selecting the subset of commits to obtain.
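To make the name restriction concrete, here is a small Python sketch of how a list of fetch settings selects names. It is a simplified model, not Git's matching code: the function remote_tracking_names and the branch list are hypothetical, and the refspec strings are written in the style Git keeps under remote.origin.fetch, on the assumption that a single-branch clone records one non-wildcard entry there.

```python
def remote_tracking_names(their_branches, fetch_refspecs):
    """Return the remote-tracking names the given refspecs would create."""
    result = []
    for branch in their_branches:
        full = "refs/heads/" + branch
        for spec in fetch_refspecs:
            src, dst = spec.lstrip("+").split(":")
            if src.endswith("*"):
                prefix = src[:-1]
                if full.startswith(prefix):
                    # dst also ends in "*"; fill it with the matched part
                    result.append(dst[:-1] + full[len(prefix):])
                    break
            elif full == src:
                result.append(dst)
                break
    return result


theirs = ["master", "topic", "dev"]  # hypothetical branches on the other Git

# What a single-branch clone of master is assumed to record:
single = ["+refs/heads/master:refs/remotes/origin/master"]
# After something like: git remote set-branches --add origin topic
added = single + ["+refs/heads/topic:refs/remotes/origin/topic"]
# The usual, unrestricted setting:
everything = ["+refs/heads/*:refs/remotes/origin/*"]

print(remote_tracking_names(theirs, single))
print(remote_tracking_names(theirs, added))
print(remote_tracking_names(theirs, everything))
```

With the single entry only master gets a remote-tracking name; each added entry admits one more branch, and the wildcard form admits them all. In every case the names only show up in the clone after another fetch.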
How --depth affects this
Aside from automatically turning on --single-branch (though note that you can turn this off with --no-single-branch), what --depth does is create a shallow clone. To understand shallow clones completely, we have to get into graph theory. (We won't go very far with this here, though.)
In Git, each branch name identifies exactly one commit. But a branch in Git—if we ignore the question of What exactly do we mean by "branch"? (we shouldn't ignore it, but we will here)—usually has a bunch of commits. How does this work?
The answer is that each commit in Git contains the hash ID of some earlier commit. In the usual simple case, we end up with a long string of commits, each of which points backwards to one earlier commit. The last commit in this chain is the tip of the branch, or tip commit.
Let's draw a simple chain where we use one uppercase letter to stand in for the real hash ID of each commit. Hash H will be the last one in the chain, and we'll say that this is branch br1:
... <-F <-G <-H   <-- br1
The name br1 holds the hash ID of the last commit H. That's how we can have Git fish it out of the object database (which, remember, is a simple key-value store: the hash ID is the key). But inside the body of commit H, Git has stored the hash ID of earlier commit G. So from H we can get G's ID, and have Git look up commit G in the key-value store. Meanwhile commit G has F's ID, so we can walk backwards from G to F.
This is how Git works: backwards. A name, like a branch or tag or remote-tracking name, stores one hash ID. That's the commit we want, and then, if we want all the commits, Git walks backwards from that commit to the previous commit, and then keeps walking. The name lets us get started; the commits themselves provide the rest of the path.
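The backwards walk is easy to mimic with a dictionary of parent links. The following Python sketch uses the single-letter commits from the drawing; the names parents, refs and reachable are invented for this example, and F is treated as having no parent only because the earlier history (the "...") is left out.

```python
# Each commit records the hash ID(s) of its parent(s).
parents = {
    "F": [],      # earlier history omitted in this toy graph
    "G": ["F"],
    "H": ["G"],
}
refs = {"br1": "H"}  # the name stores only the tip commit


def reachable(name):
    """Start at the commit a name points to and walk backwards."""
    seen = []
    stack = [refs[name]]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.append(commit)
        stack.extend(parents[commit])
    return seen


print(reachable("br1"))  # ['H', 'G', 'F']
```

The name supplies only the starting point; everything else comes from the parent links stored in the commits themselves.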
The path we traverse, and all the commits we collect as we walk this path, are the reachable commits on that branch.[7] When two branches diverge, they have some sequence that's common to both:
             I--J   <-- br1
            /
...--F--G--H   <-- shared
            \
             K--L   <-- br2
Here, commits up through H are on all three branches, and the last two commits on each of the br* branches are unique to their branch.
This reachability idea is at the heart of Git. It's also how --depth works. If we say --depth 1, we are telling our Git: When you obtain commits from the other Git, only go one step. If we use --depth 1 here, we get:
i--J   <-- br1
g--H   <-- shared
k--L   <-- br2
If we use --depth 2, we tell our Git: When you obtain commits from the other Git, go two steps. This time we get:
        I--J   <-- br1
       /
f--G--H   <-- shared
       \
        K--L   <-- br2
Note that if br2 had more commits unique to it, we wouldn't have the connection from br2 back to shared.
The lowercase commit letters here denote the fact that Git knows there's a parent, but that these parents are marked as "missing on purpose". More precisely, the hash IDs of the shallow graft commits are saved in a file called shallow in the .git directory. Git knows not to try to load up these commits from the object repository, and that it's not a bug that they're missing. Normally, that would be a bug.
Since they're missing-on-purpose, git log can't and won't show these commits, and it will be as if the shallow-grafted commits have no parents at all. That's misleading in a way, but also what you should expect. In most cases, it's harmless enough.
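The depth rule can be modelled as the same backwards walk with a step limit. The Python sketch below uses the three-branch graph drawn earlier; it is a toy model, not the protocol Git uses, and shallow_fetch, parents and tips are names made up for the example. It reports which commits would be present and which parents would be cut off (the lowercase letters in the drawings).

```python
from collections import deque

parents = {
    "F": [], "G": ["F"], "H": ["G"],   # shared history (earlier part omitted)
    "I": ["H"], "J": ["I"],            # unique to br1
    "K": ["H"], "L": ["K"],            # unique to br2
}
tips = {"br1": "J", "shared": "H", "br2": "L"}


def shallow_fetch(tip_commits, parents, depth):
    """Walk back at most `depth` commits from every tip."""
    steps = {}  # commit -> number of steps at which it was first reached
    queue = deque((commit, 1) for commit in tip_commits)
    while queue:
        commit, step = queue.popleft()
        if commit in steps or step > depth:
            continue
        steps[commit] = step
        for parent in parents[commit]:
            queue.append((parent, step + 1))
    present = set(steps)
    # Parents that a present commit names but that were not obtained.
    cut_off = {p for c in present for p in parents[c] if p not in present}
    return present, cut_off


for depth in (1, 2):
    present, cut_off = shallow_fetch(tips.values(), parents, depth)
    print(depth, sorted(present), sorted(cut_off))
```

At depth 1 the model keeps J, H and L and cuts off I, G and K; at depth 2 it keeps everything except F, which matches the two drawings.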
[7] This assumes the name we used was a branch name. If we used a tag name, these are the commits reachable from the tag; if we used a remote-tracking name, these are the commits reachable from the remote-tracking name. Since all names use the same system, each name provides some way to reach some set of commits.
It's the git fetch operation that gets commits
When we use git clone, we're really running the equivalent of a six-command sequence, five of which are Git commands:
1. mkdir, to create a new empty directory / folder;
2. git init, to create a new empty repository in the directory made in step 1;
3. git remote add, to add the name origin, or some other name of our choice, and a URL and a fetch configuration (that's the one we change to defeat the single-branch-ness);
4. git config, if needed, to add configuration options specified at the git clone command;
5. git fetch, to obtain commits and make remote-tracking names for the branch or branches chosen in step 3; and
6. git checkout, to create one local branch name and fill in Git's index and our working tree.
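The sequence above can be written out as data, which makes it easier to see where each clone option ends up. The Python sketch below only builds argument lists and runs nothing. It is an approximation under stated assumptions: the function clone_equivalent and its parameters are hypothetical, the branch argument stands in for whatever branch the other Git would recommend, the optional git config step is left out, and git remote add -t is used here as one way to record a single-branch fetch configuration.

```python
def clone_equivalent(url, directory, branch="main", depth=None,
                     single_branch=None, remote="origin"):
    """Return command lists roughly equivalent to one git clone run."""
    if single_branch is None:
        # --depth turns on --single-branch unless told otherwise
        single_branch = depth is not None

    in_repo = ["git", "-C", directory]

    add = in_repo + ["remote", "add"]
    if single_branch:
        add += ["-t", branch]  # restrict the fetch configuration to one branch
    add += [remote, url]

    fetch = in_repo + ["fetch"]
    if depth is not None:
        fetch += ["--depth", str(depth)]  # the depth is a fetch-time option
    fetch.append(remote)

    return [
        ["mkdir", directory],
        ["git", "init", directory],
        add,
        # the optional "git config" step is omitted from this sketch
        fetch,
        in_repo + ["checkout", branch],
    ]


# The URL is a placeholder, not a real repository.
for command in clone_equivalent("https://example.com/repo.git", "repo", depth=1):
    print(" ".join(command))
```

Passing single_branch=False together with a depth corresponds to using --no-single-branch at clone time: the remote is added without a branch restriction, while the fetch keeps its depth limit.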
The --depth option is passed to the git fetch at step 5. So if we have to adjust our origin remote configuration, to de-single-branch the clone because step 3 added the remote with one particular branch only (see the git remote documentation), we have to run a new git fetch. This new git fetch needs the same --depth option.
Conclusion
The --depth option to git clone does two things. It turns on --single-branch, which limits the set of names (and thus commits) obtained from the other Git repository, and it passes --depth on to the fetch step, which limits the depth of the commit graph obtained from the other Git repository. Using --no-single-branch at clone time inhibits the name-restricting while keeping the depth-restricting. If you need to undo the name-restricting, or if you use git remote to update the set of restricted branch names, you must run git fetch again. If you want that git fetch to have a depth restriction, you must pass --depth again.
Note that git fetch does respect existing shallow graft points, so in some cases, omitting the --depth is somewhat harmless. For instance, if you have a single-branch clone of a repository that looks like this:
...--V--W--X   <-- main
         \
          Y--Z   <-- topic
and your single-branch clone is depth 1 on main, so that commit W is marked as a shallow graft point:
w--X   <-- main
then adding topic without a --depth gets you:
w--X   <-- main
 \
  Y--Z   <-- topic
That is, main didn't get any deeper this time. But if the graph were:
...--V--W--X   <-- main
      \
       Y--Z   <-- topic
and you added topic and fetched without a new --depth, you would get:
...--V   w--X   <-- main
      \
       Y--Z   <-- topic
in your clone, which means you'd have to get commit V and everything earlier. Note that commit W remains marked-and-missing: since it's missing, your Git can't see that w would connect back to V, and your own Git will show you this as:
           X   <-- main
...--V--Y--Z   <-- topic
which isn't wrong, technically; it's just misleading.