The short answer is no. The long answer is maybe, but consider another way.
Shallow clones and shallow submodules
The long answer, which lets you get partway to what you want, starts with a technical note: you're not pulling, in Git terms. In Git, "pull" means "fetch, then merge-or-rebase" and you are not going to merge-or-rebase here. In fact, when you're "init"-ing you are generally going to make the initial clones.
Each submodule is actually its own repository.¹ Git is, sooner or later, going to do a git checkout within each of those repositories, asking it to check out, not a branch, but rather one specific commit, which is quite often not the latest commit. A submodule is, in the first place, a reference to a third-party repository, i.e., one you specifically do not and cannot control. Given that, and the nature of Git repositories and software development, the best you can do is say: "I know that my software works with one specific version of their software, and that version is <fill in the blank>." Thus, your repository lists the specific version you want from their repository.
Now we get to the heart of the problem. When you git clone a repository, or use git fetch to update an existing clone, you do so by asking for specific branch and/or tag names, rather than specific commit IDs. There is some (very limited) support for fetching specific IDs, but it must be enabled in that other repository (for instance through a server-side setting such as uploadpack.allowReachableSHA1InWant), the one we just said that you do not and cannot control. Enabling fetch-by-ID can be computationally expensive for them, whoever "they" are, the ones controlling the other repository. It is not something you can do on your side, nor demand, nor is it enabled by default. This means that in general it's just not available.
In any case, git clone only works with names: you may git clone -b branch url, for instance, to make your new clone start by checking out that specific branch, or git clone -b tag url to make your new clone start by checking out (as a detached HEAD) that specific tag. Despite this "check out a specific branch or tag", though, the clone defaults to cloning all the names offered by the remote, and making a full-depth (i.e., non-shallow) clone.
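To make the difference between the two forms concrete, here is a small sketch that runs entirely against a throwaway local repository. The repository name upstream, the branch name main, and the tag v1.0 are all made up for the demonstration; a real third-party URL would take the place of the file:// URL.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)   # scratch area; everything below lives here
cd "$work"

# Hypothetical upstream: two commits on "main", with tag v1.0 on the first.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream commit -q --allow-empty -m "first"
git -C upstream tag v1.0
git -C upstream commit -q --allow-empty -m "second"

# Clone by branch name: HEAD ends up attached to that branch.
git clone -q -b main "file://$work/upstream" by-branch
git -C by-branch symbolic-ref --short HEAD

# Clone by tag name: HEAD ends up detached at the tagged commit.
git -c advice.detachedHead=false clone -q -b v1.0 "file://$work/upstream" by-tag
git -C by-tag rev-parse HEAD

# Either way the clone is full-depth and still carries the remote's branches.
git -C by-tag branch -r
```

Note that neither command takes a raw commit ID: -b accepts only a branch or tag name.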
All of this does mean something important. First, shallow clones exist. A shallow clone is one made with a --depth argument. It can be deepened by a git fetch with another --depth. The "depth" is the number of commits kept along each line of history, counting from (and including) the commit(s) identified by the name(s) used during the clone or fetch, with some fairly complicated rules. (The details of these rules don't matter much here.)
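As a rough illustration of making and then deepening a shallow clone, the sketch below builds a five-commit throwaway repository (the name upstream is hypothetical) and clones it through a file:// URL, since Git ignores --depth when cloning from a plain local path.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical upstream with five commits on one branch.
git init -q upstream
for n in 1 2 3 4 5; do
  git -C upstream commit -q --allow-empty -m "commit $n"
done

# Shallow clone: keep only the tip commit of the default branch.
git clone -q --depth 1 "file://$work/upstream" shallow
before=$(git -C shallow rev-list --count HEAD)

# Deepen the existing shallow clone to three commits from the tip.
git -C shallow fetch -q --depth 3
after=$(git -C shallow rev-list --count HEAD)

echo "commits visible before deepening: $before, after: $after"
```

The clone stays shallow after the second fetch; it simply has a longer visible history.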
Second, because shallow clones exist, shallow submodules also exist. A shallow submodule is simply a submodule that is cloned with --depth. You can pass a --depth argument to git submodule add or git submodule update, but there is no easy or obvious way to determine what depth is needed.
Here's the problem: your submodule will be cloned, perhaps by a branch or tag name, but then your submodule will be told to check out one particular commit (by its raw hash ID). Will that commit be in the clone? What depth guarantees that it will? If you are cloning by tag name, and the tag always names the correct commit, you can use --depth 1 (and hence you can use --shallow-submodules during the initial git clone as well), but that only works as long as the commit your repository records is exactly the one that name resolves to. As noted above, the recorded commit is quite often not the latest one.
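Here is a sketch of the favorable case, where the superproject pins exactly the commit at the tip of the submodule's branch, so depth 1 is enough. The repositories lib and super are hypothetical stand-ins built on the spot. The protocol.file.allow=always setting is there only because the demo uses local file:// URLs, which recent Git versions refuse for submodules by default; it would not be needed for an https:// URL.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical third-party library with three commits.
git init -q lib
for n in 1 2 3; do
  git -C lib commit -q --allow-empty -m "lib commit $n"
done

# Hypothetical superproject that pins lib at its current tip.
git init -q super
git -C super -c protocol.file.allow=always \
    submodule --quiet add "file://$work/lib" lib
git -C super commit -q -m "add lib submodule"

# Clone the superproject and make every submodule a depth-1 clone.
git -c protocol.file.allow=always clone -q \
    --recurse-submodules --shallow-submodules "file://$work/super" copy

pinned=$(git -C copy rev-parse HEAD:lib)        # commit recorded in the superproject
actual=$(git -C copy/lib rev-parse HEAD)        # commit checked out in the submodule
depth=$(git -C copy/lib rev-list --count HEAD)  # commits present in the shallow submodule
echo "pinned=$pinned actual=$actual submodule-commits=$depth"
```

If the superproject instead pinned an older commit of lib, the depth-1 clone would not contain it, which is exactly the problem described above.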
¹ What's special about these sub-repositories is that they are:
- listed in the outer repository (in a .gitmodules file);
- generally kept in "detached HEAD" mode;
- and detached at a commit whose ID is stored in the outer repository.
The modules file lists the names and URLs for the various submodules. "Initializing" a submodule amounts to copying its settings (such as its URL) from .gitmodules to the configuration file for the containing superproject, and "updating" a submodule usually amounts to cloning or fetching. The commit at which the submodule is to be detached is recorded in the superproject's repository as a "gitlink" entry in a tree object.
Submodule support has grown rather complex in modern versions of Git though, so now there are more things you can do when doing the update step.
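The pieces described in this footnote can be inspected directly. The sketch below uses two hypothetical throwaway repositories, lib and super; protocol.file.allow=always is needed only because the submodule URL in this demo is a local file:// URL.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical library and a superproject that uses it as a submodule.
git init -q lib
git -C lib commit -q --allow-empty -m "lib commit"
git init -q super
git -C super -c protocol.file.allow=always \
    submodule --quiet add "file://$work/lib" lib
git -C super commit -q -m "add lib submodule"

# The name-to-URL mapping lives in the tracked .gitmodules file.
git -C super config --file .gitmodules --get submodule.lib.url

# The pinned commit is a "gitlink": a tree entry with mode 160000.
git -C super ls-tree HEAD lib

# A plain clone has .gitmodules, but its own config knows nothing yet.
git clone -q "file://$work/super" copy
git -C copy config --get submodule.lib.url || echo "submodule.lib.url not set yet"

# "init" copies the URL from .gitmodules into the clone's configuration.
git -C copy submodule --quiet init
git -C copy config --get submodule.lib.url
```

Nothing has been cloned into copy/lib at this point; that is the job of the later update step.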
Reference clones
There is a much better, more general solution for many cases. Instead of fussing with shallow clones, you can point Git at a reference clone. The reference clone is any clone of the repository you're trying to clone.² Ideally, it's a recent and reasonably up-to-date clone of the repository you are cloning, but any clone will do.
What Git does with a reference clone is a bit complicated (see the documentation for details), but the short version is that when cloning some repository, instead of getting all the objects over the network from some distant server (which may be slow and/or rate-limited), your Git will ask the distant server what objects and such it needs, then look at your local³ reference clone to see if it already has those objects. If so, it will "borrow" them from the reference clone.
This lets you obtain a full, complete, up-to-date clone while using very little network and storage resources, since you will no longer need to bring (most or all of) the data over, nor (unless you use --dissociate) store it yourself. That in turn means you need not worry about your shallow clone being too shallow: you just get one slow full clone, then reference the heck out of it for all other clones, which go fast. Using reference clones can cut the time to clone a few big GitHub repositories from an hour-plus down to tens of seconds, for instance.
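A minimal sketch of the workflow, with a local file:// URL standing in for the slow distant server. The directory names upstream, reference, and fast are made up for the demonstration.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical "distant" repository with three commits.
git init -q upstream
for n in 1 2 3; do
  git -C upstream commit -q --allow-empty -m "commit $n"
done

# One full clone, made once (the slow step in real life), to act as the reference.
git clone -q "file://$work/upstream" reference

# Later clones borrow objects from the reference instead of fetching them again.
git clone -q --reference "$work/reference" "file://$work/upstream" fast

# The borrowing is recorded in the alternates file of the new clone.
cat fast/.git/objects/info/alternates

# The new clone still sees the complete history.
git -C fast rev-list --count HEAD
```

git submodule update also accepts a --reference option, so the same idea can be applied when the submodules are cloned.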
² Technically, the reference could be any repository at all. A repository not actually related to the one you are cloning is going to make a lousy reference, though: it will have none of the objects you need, and will provide no speedup at all. (It could even have the wrong data under the object's name, although the chances of this are vanishingly small. This cannot happen if the reference is correct since object names cannot be reused this way.)
³ The reference should be "as local as possible" for speed, but does not really have to be on your machine, just accessible. If the reference will not always be present you will probably want to add --dissociate, so that the objects get copied from the reference clone into the new clone. This uses more disk space, of course.
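A sketch of the --dissociate variant, again using hypothetical throwaway repositories: the new clone borrows from the reference while cloning, then copies what it borrowed, so the reference can be deleted afterwards.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical upstream and a full clone of it to use as the reference.
git init -q upstream
for n in 1 2 3; do
  git -C upstream commit -q --allow-empty -m "commit $n"
done
git clone -q "file://$work/upstream" reference

# Borrow from the reference during the clone, then copy the objects and cut the tie.
git clone -q --reference "$work/reference" --dissociate \
    "file://$work/upstream" standalone

# No alternates file remains, so the clone no longer depends on the reference.
[ ! -e standalone/.git/objects/info/alternates ] && echo "objects are stored locally"

# The reference can now disappear without breaking the new clone.
rm -rf reference
git -C standalone rev-list --count HEAD
```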