At the same time, I do not know where it gets this 8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac from.
That hash ID comes from a gitlink in the superproject.
Whenever you're using submodules, you are using at least two Git repositories. We call one the superproject and one the submodule. The superproject Git repository simply "lives above" the submodule Git repository, in terms of the checked-out working trees. For instance, if you run:
git clone ssh://github.com/org/super.git
to clone the superproject, you wind up with super/ in the current directory, containing the cloned repository (super/.git) and a working tree (everything within super/ that's not .git).
The superproject repository checkout in the super/ directory will contain a file named .gitmodules. Inside this file, the superproject stores the URL for the submodule. Running git checkout in the superproject checks out some particular commit into the super/ directory. This fills in the superproject Git's index while also filling in your working tree, in the super/ directory. So we now:
cd super/
to get into the superproject.
Let's suppose that the submodule ssh://github.com/org/sub.git is supposed to be cloned into the path lib/sub here in the working tree in super/ that we just cd-ed into. The superproject Git will have made an empty directory, lib/sub,1 in the working tree. There is as yet no clone of sub.git in this repository,2 so you now have to run:
git submodule update --init
This reads both the .gitmodules file and the index.
As I mentioned above, the initial checkout filled in both Git's index and your working tree. The index—or staging area, as it is increasingly being called these days—is a data structure that other version control systems keep hidden, but Git exposes to you, the programmer, and makes you understand it.3 If you don't understand it yet, you had best learn about it now: jump to the appended section describing the index, then return here. (I'd put in a link but StackOverflow won't let you insert HTML anchors in an answer.)
Anyway, in the index, for each submodule that this particular superproject "links to", there is an entry that Git calls a gitlink. The gitlink contains two parts:
- there is a file path, in this case lib/sub; and
- there is a big ugly hash ID: in your case, 8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac.
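You can see a gitlink directly with git ls-files --stage lib/sub in the superproject. It prints one line of the form mode, hash ID, stage number, a tab, then the path, and a gitlink is the entry whose mode is 160000. The minimal sketch below parses such a line. The sample line is hand-built from the hash ID in the question; it is not captured output from your repository.

```python
from typing import NamedTuple

GITLINK_MODE = "160000"  # the index mode Git uses for a gitlink (submodule) entry


class IndexEntry(NamedTuple):
    mode: str
    oid: str
    stage: int
    path: str


def parse_stage_line(line: str) -> IndexEntry:
    """Parse one line of `git ls-files --stage` output: '<mode> <oid> <stage>\\t<path>'."""
    meta, path = line.rstrip("\n").split("\t", 1)
    mode, oid, stage = meta.split()
    return IndexEntry(mode, oid, int(stage), path)


def is_gitlink(entry: IndexEntry) -> bool:
    return entry.mode == GITLINK_MODE


# Hand-built sample line using the hash ID from the question (not real output).
sample = "160000 8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac 0\tlib/sub"
entry = parse_stage_line(sample)
print(is_gitlink(entry), entry.path, entry.oid)
```

The two parts the list describes are entry.path and entry.oid. The mode is what tells Git that this hash ID names a commit in another repository rather than a file's blob.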
Running git submodule update --init looks inside lib/sub and sees that it is empty, so this now reads the .gitmodules file to find what git clone command to run, then runs a new git clone:
git clone -n ssh://github.com/org/sub.git lib/sub
for instance. The -n option here is short for --no-checkout: the initial clone is run without any checkout. We'll see why in just a moment.
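The lookup from .gitmodules to a clone command can be sketched roughly as follows. This is a simplified model under some assumptions. It reads only [submodule "name"] sections with path and url keys, and it ignores Git's full config syntax (case-insensitive keys, quoting, includes). It also produces only the simplified git clone -n command shown above, not the exact invocation Git uses internally. The sample .gitmodules text reuses the hypothetical URL and path from this answer.

```python
import re


def parse_gitmodules(text: str) -> dict[str, dict[str, str]]:
    """Minimal .gitmodules reader: {submodule name: {key: value}}."""
    sections: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        header = re.fullmatch(r'\[submodule\s+"([^"]+)"\]', line)
        if header:
            current = sections.setdefault(header.group(1), {})
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return sections


def clone_command(gitmodules_text: str, path: str) -> list[str]:
    """Build the no-checkout clone for the submodule registered at `path`."""
    for settings in parse_gitmodules(gitmodules_text).values():
        if settings.get("path") == path:
            return ["git", "clone", "-n", settings["url"], path]
    raise KeyError(f"no submodule with path {path!r} in .gitmodules")


GITMODULES = '''\
[submodule "lib/sub"]
\tpath = lib/sub
\turl = ssh://github.com/org/sub.git
'''

print(" ".join(clone_command(GITMODULES, "lib/sub")))
```

Note that nothing in .gitmodules says which commit to check out. That information comes only from the gitlink in the index, which is why the clone is done with -n.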
There is one extra wrinkle in modern Git. It stuffs the submodule's .git repository into a special directory inside the superproject's repository in super/.git, and instead creates a lib/sub/.git file that tells Git, when working in the submodule, where that repository is hidden. In the near future there will also be some additional breadcrumbs left behind so that Git commands run in the submodule can "know" that the submodule is being used as a submodule. At the moment, though, once the submodule clone happens, the submodule is quite unaware of the fact that it is a submodule. As far as the submodule in lib/sub is concerned, it's just a regular old Git repository. It just has not yet checked anything out.
Now that the clone exists, git submodule update (with or without --init: the --init just does the clone step if needed) uses the hash ID it read out of the gitlink. Exactly what it does with that hash ID is complicated, but let's address the usual case:
The usual case is that the superproject Git runs the equivalent of:
(cd lib/sub && git checkout $hash)
where $hash is the hash ID read from the gitlink. This puts the submodule into detached HEAD mode on the given commit, provided, that is, that the submodule clone has that commit.
Adding --remote to the git submodule update command makes git submodule update run git fetch in the submodule, and then read out one of the origin/name branch names updated by this git fetch. This hash value replaces the $hash above. That is:
(cd lib/sub && git fetch && git rev-parse origin/master)
for instance. If this works, the output from the rev-parse goes into $hash, replacing the hash ID the superproject Git got from its index.
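Here is a simplified model of which commit ends up in $hash. It assumes the remote-tracking names have already been updated by the submodule's git fetch, and it uses origin/master as in the example above. Which branch Git really consults is configurable, and the hypothetical "other hash" below is a placeholder, not a real commit.

```python
def target_hash(gitlink_oid: str, remote: bool,
                remote_tracking: dict[str, str], branch: str = "master") -> str:
    """Pick the commit a simplified `git submodule update` checks out.

    Without --remote: the hash ID stored in the superproject's gitlink.
    With --remote: whatever origin/<branch> names after the submodule's fetch.
    """
    if remote:
        return remote_tracking[f"origin/{branch}"]
    return gitlink_oid


def checkout_command(submodule_path: str, oid: str) -> list[str]:
    # Same effect as (cd lib/sub && git checkout $hash): a detached HEAD at oid.
    return ["git", "-C", submodule_path, "checkout", oid]


GITLINK = "8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac"
TRACKING = {"origin/master": "hypothetical-other-hash"}  # placeholder value

print(checkout_command("lib/sub", target_hash(GITLINK, remote=False, remote_tracking=TRACKING)))
print(checkout_command("lib/sub", target_hash(GITLINK, remote=True, remote_tracking=TRACKING)))
```

In the --remote case the gitlink's hash ID is simply not used, which is why --remote can sidestep a commit that the server refuses to hand out.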
If the $hash can't be found, and we haven't just run git fetch in the submodule, a modern Git will run:
(cd lib/sub && git fetch origin $hash)
This makes a by-hash-ID request to the server that serves the submodule Git repository, using the URL that the earlier git clone step recorded under remote.origin.url.
Not all servers allow fetching by hash ID. This was the case with your server:
error: Server does not allow request for unadvertised object 8cc85...
In this case, the git submodule update fails.
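The server-side decision can be modeled roughly like this. It is a toy model, not Git's upload-pack code. A server always honors requests for hash IDs it advertises (its branch and tag tips). It honors any other hash ID only if a setting like uploadpack.allowAnySHA1InWant is on, and even then only if it actually has the object. ModelServer, FetchRefused, and the tip hash are hypothetical names.

```python
from dataclasses import dataclass


class FetchRefused(Exception):
    """Stands in for the server refusing a by-hash-ID fetch."""


@dataclass
class ModelServer:
    ref_tips: set[str]                    # hash IDs the server advertises (branch/tag tips)
    objects: set[str]                     # every commit hash ID the server stores
    allow_any_sha1_in_want: bool = False  # plays the role of uploadpack.allowAnySHA1InWant

    def fetch_by_hash(self, oid: str) -> str:
        if oid in self.ref_tips:
            return oid
        if not self.allow_any_sha1_in_want:
            raise FetchRefused(f"unadvertised object {oid}")
        if oid not in self.objects:
            raise FetchRefused(f"server has no object {oid}")
        return oid


def obtain_commit(local_objects: set[str], oid: str, server: ModelServer) -> None:
    """Fetch oid by hash only when the submodule clone lacks it."""
    if oid not in local_objects:
        local_objects.add(server.fetch_by_hash(oid))


WANTED = "8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac"
TIP = "hypothetical-branch-tip"

strict = ModelServer(ref_tips={TIP}, objects={TIP, WANTED})
try:
    obtain_commit(set(), WANTED, strict)
except FetchRefused as err:
    print("refused:", err)
```

The strict server here has the commit but still refuses, which matches the situation in the question.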
It seems as though either you have recursive checkout turned on, or you have some script that is running git pull for you. The output line:
Server refused to set environment variables
does not seem to be in the Git source code at all, which is odd.
1This observation—that superprojects cause empty directories to be created—is at the heart of the trick of using an empty submodule to store an empty directory. See this answer to How can I add a blank directory to a Git repository?
2If you set the recursive checkout option, the initial checkout runs git submodule update --init for you here. I am describing the setup where you have to run it manually, since that's more instructive.
3You can sort of get away without learning about it for a while, especially if you use git commit -a. But some things in Git are simply inexplicable unless you know about the index. Don't try to skimp here! You don't need to know every detail, just (a) that it exists and (b) the items in the section below.
Things you might be able to do
You could try cloning without recursion turned on, then using git submodule update --init to get the clones to happen, then entering the failing submodule(s) and just running git fetch. With some luck, a full fetch will bring in the target commit (in this case 8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac): run git cat-file -t 8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac to see if it is now available as an object of type commit. If so, a git checkout --recurse-submodules or git submodule update --recursive in the superproject working tree should now proceed (or at least get further).
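If you want to script that availability check, a small helper can wrap git cat-file -t. This is a sketch under some assumptions: git is on your PATH, and you run it from the superproject working tree so that lib/sub is the submodule. git cat-file -t exits non-zero when the object is missing. The run parameter exists only so the helper can be tested without a real repository.

```python
import subprocess
from typing import Callable, Optional


def object_type(repo_dir: str, oid: str,
                run: Callable = subprocess.run) -> Optional[str]:
    """Return what `git cat-file -t <oid>` reports in repo_dir, or None if missing."""
    result = run(["git", "-C", repo_dir, "cat-file", "-t", oid],
                 capture_output=True, text=True)
    if result.returncode != 0:
        return None  # object not present in this clone (yet)
    return result.stdout.strip()
```

After the git fetch in the submodule, object_type("lib/sub", "8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac") == "commit" means the fetch brought the commit in. None means it is still missing.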
You could switch to a server that supports the "any SHA1 in want" request. GitHub's servers do this, for instance.
You could, if you have permission, find the server and reconfigure it to allow "any SHA1 in want":
git config uploadpack.allowAnySHA1InWant true
None of these will work if 8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac refers to a commit that does not now exist in the submodule on the server. This can happen when:
- someone creates that commit, but fails to push it to the server; or
- someone creates that commit, pushes it to the server, but then later does a force-push to the server that removes access to that commit and it eventually gets garbage-collected.
In these two cases, commit 8cc8573b39c9efde42a77701b91b3b6dcbb6b7ac is not available, and your only choices are:
- find someone who does have it, or
- choose some other commit in the submodule, such that the submodule checkout produces working software.
Whether simply using the latest version of the submodule software will do this is unpredictable. Perhaps the superproject depends on a bug in the submodule, and that bug is now fixed.
In any case, if you do find a submodule commit that allows you to proceed, you should consider making a new superproject commit that refers to that submodule hash ID. For instance, suppose that checking out the latest origin/master or origin/main commit in the submodule works:
(cd lib/sub && git checkout origin/master)
# build and test software -- it works!
Then you can, at this point, make a new superproject commit that refers to whatever hash ID is checked out in the submodule now, which obviously is available, by doing this:
git add lib/sub
git commit -m "update lib/sub to current version"
Consider your commit message carefully: you want it to convey why you made this commit. "Update lib/sub" gets partway there but is definitely not complete. It may be complete enough, depending on how interested, informed, and intelligent the users of your code base are.
What to know about Git's index AKA staging area
Git's index or staging area is how Git:
- keeps track of what you checked out from the current commit;
- knows what you intend to put into the next commit; and
- keeps track of any merge conflicts, if those occur.
Your initial checkout takes all the files out of some commit—every commit has every file—and copies them to Git's index and your working tree. The "copies" in Git's index are in Git's internal object format.
Keeping track of the current commit
Files inside commits are stored in a special, read-only, Git-only, compressed and de-duplicated form that Git calls a blob object. This takes care of the obvious objection to storing every file in every commit: in a big repository with thousands of files, we typically change just one file, or a handful of files, and then commit. If every commit contains every file, won't the Git repository balloon into some sort of multi-terabyte monstrosity that nobody can even store?
The answer is: no, it won't, because the files are de-duplicated. If the new commit you just made changes one file out of 5000 files, Git only has to add one file to its internal objects database. Then Git adds one new commit to its database too, and this one new commit refers to 4999 existing blobs, plus one new blob. The 4999 re-used files did not take any space at all!
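The de-duplication works because a blob's hash ID depends only on its content. Git hashes a header of the form "blob <size>" plus a NUL byte, followed by the file's bytes. The sketch below reproduces that hash and models the 5000-file example. The dictionary "object database" is a stand-in: real Git also zlib-compresses each object, and it stores trees and commits too. The file names and contents are made up.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 hash ID Git assigns to a blob: sha1(b'blob <size>\\0' + content)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


object_db: dict[str, bytes] = {}  # toy object database: hash ID -> content


def store_snapshot(files: dict[str, bytes]) -> int:
    """Store every file's blob; return how many *new* blobs were needed."""
    added = 0
    for content in files.values():
        oid = blob_id(content)
        if oid not in object_db:  # identical content is stored only once
            object_db[oid] = content
            added += 1
    return added


first = {f"file{i}.txt": f"contents {i}\n".encode() for i in range(5000)}
second = {**first, "file42.txt": b"edited\n"}  # change one file out of 5000

print(store_snapshot(first), store_snapshot(second))  # 5000 1
```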
So, even though every commit refers to ("contains") every file, it doesn't take a lot of space. The "copies" that Git sticks into its index are in this same format: they're indirect references to internal Git blob objects. If you "copy" 5000 files from a commit to Git's index, you copy no file data to Git's index, because every one of them is a duplicate of a blob that already exists. The index still needs a bit of space (on average, a bit under 100 bytes per file in a moderate-size repository that I checked) to record various data about each file, but that's all. The file's content isn't in the index, just the name and other stuff we call metadata: information about the file.
Since these copies are only readable by Git itself, and writable by nobody—not even Git—they're not actually useful yet though. That's why Git copies them to your working tree as well: the working tree copies of each file are actual, normal, everyday files, rather than weird Git-ified internal objects. Every program on your computer can deal with the working-tree files.
So, that's the explanation for the first bullet point: the index holds all the files, in this internal Git-ified read-only "blob object" form, and keeps track of all the files that Git put into the working tree. This is also the source of a key term regarding working tree files: A tracked file, in the working tree, is one that is in Git's index. That's it—that's all there is to "tracked" here—but it turns out to be significant.
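The definition of "tracked" is simple enough to state as code. In this toy model the index is just a dictionary from path name to blob hash ID; a real index entry carries more metadata. The file names are hypothetical.

```python
def classify(index: dict[str, str], working_tree: set[str]) -> dict[str, str]:
    """A working-tree path is 'tracked' exactly when the index has an entry for it."""
    return {path: ("tracked" if path in index else "untracked")
            for path in sorted(working_tree)}


index = {"README": "placeholder-blob-id"}  # hypothetical index contents
print(classify(index, {"README", "notes.txt"}))
```

Nothing about the working-tree file itself matters here; only the presence of the index entry does.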
Keeping track of the next commit
The middle bullet point above is the one where most Git users most often interact with Git. To make a new commit, you start—you have to start4—by checking out some existing commit. That fills in Git's index and your working tree.
The working tree copies of files are there for you to edit and update. You can also create all-new files, or remove existing files. As you update these files, or create new ones, or remove existing ones, you must tell Git about each one.5 When you do this, Git updates its index.
The git add command is the main command for doing this. Running git add on some file tells Git: Read the working tree copy of that file, compress it, and check for duplicate data. If there's a duplicate, update the index with the duplicate. If not, use the new compressed data to make a new blob, ready for committing, and update the index with the new blob. Either way, the file is now ready to be committed.
Every other file that's in the index remains there, untouched, also ready to be committed. If you remove a file entirely, you tell Git to remove its index copy (the name and metadata), and now the lack-of-file is ready to be committed.
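Here is a toy model of those two index updates. Both objects and index are plain dictionaries standing in for Git's object database and index, and compression is omitted. git_add and git_rm are hypothetical helper names that mimic what git add and git rm do to the index.

```python
import hashlib


def blob_id(content: bytes) -> str:
    return hashlib.sha1((b"blob %d\0" % len(content)) + content).hexdigest()


objects: dict[str, bytes] = {}  # stand-in for the object database
index: dict[str, str] = {}      # path -> blob ID (real entries also hold mode, stat data, ...)


def git_add(path: str, working_tree: dict[str, bytes]) -> None:
    content = working_tree[path]
    oid = blob_id(content)
    objects.setdefault(oid, content)  # duplicate content reuses the existing blob
    index[path] = oid                 # proposed next commit now holds this version


def git_rm(path: str, working_tree: dict[str, bytes]) -> None:
    working_tree.pop(path, None)      # like `git rm`: remove the working-tree file...
    index.pop(path, None)             # ...and its index entry, staging the removal


tree = {"a.txt": b"same\n", "b.txt": b"same\n"}
git_add("a.txt", tree)
git_add("b.txt", tree)
print(len(index), len(objects))  # 2 1
```

Removing b.txt leaves its blob in the object database, because a.txt still uses the same content. Only the name disappears from the proposed next commit.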
Everything you do, it turns out, is in service of updating the index. Git doesn't build the new commit from what's in your working tree. Git builds the new commit from what's in Git's index. The index is your proposed next commit. Every time you add something to it—an updated file or a new file—or remove a file from being tracked by removing it from the index, you've updated your proposed next commit.
The next actual commit you make—whenever you make it—will "freeze" the index copies of the files into a new commit. These copies will then be available forever, or at least, as long as that new commit continues to exist.6 The new commit then becomes the current commit; the index and the commit now match, and we're in a similar situation to when we first checked out some commit.7
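The point that the commit comes from the index, not the working tree, fits in a few lines. In this toy model a "commit" is just a frozen copy of the index plus a message and parent. It is not Git's real commit object format, and git_commit is a hypothetical name.

```python
import hashlib
from typing import Optional


def blob_id(content: bytes) -> str:
    return hashlib.sha1((b"blob %d\0" % len(content)) + content).hexdigest()


def git_commit(index: dict[str, str], message: str,
               parent: Optional[dict] = None) -> dict:
    """Freeze the index: the working tree is never consulted."""
    return {"tree": dict(index), "parent": parent, "message": message}


working_tree = {"a.txt": b"v1\n"}
index = {"a.txt": blob_id(working_tree["a.txt"])}  # as if after `git add a.txt`

working_tree["a.txt"] = b"v2\n"                    # edited, but not added again
commit = git_commit(index, "snapshot of the index")

print(commit["tree"]["a.txt"] == blob_id(b"v1\n"))  # True: v2 is not in the commit
```

This same property is what makes committing only some of your changes possible.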
4Except, that is, for the very first commit in a new, totally empty repository, or for the special case of git checkout --orphan or git switch --orphan. But we won't address the special cases here.
5Why doesn't Git find this out on its own? Well, it turns out that in new, ongoing-experimental code, Git does, but this is very hard to do—at least reliably—on many computers today. There are simplicity advantages, and other advantages, to not doing that. So Git didn't, initially, and now some people even depend on it.
In the old days, a lot of version control systems did find out on their own, at the time you ran their "commit" verb, whatever they called it. This could take a long time, so you'd run that command and then go take a coffee break or whatever. Git's near-instant commits were, when they first came out, astonishing.
6You can use git reset to hide away some commit or commits, so that it/they cannot be found. They do not immediately cease to exist, but eventually, if they are hidden long enough, Git decides you don't want those commits after all, and cleans them out.
7Since the working tree plays no part in the git commit process, we're allowed to git add only some files, then run git commit, and then git add more files and git commit again. Each new commit picks up the changes in the added files, while leaving the un-added files un-added. This is one of the things people depend on, as noted in footnote 5. Since Git exposes the index like this, and documents how this works, you're allowed to depend on it.
Keeping track of merge conflicts
The last special case for the index comes into play only when using Git commands that invoke Git's merge engine. The merge engine combines three existing commits to produce one new commit. To do this, it reads up to three copies of each file into an expanded version of the index.
The merge engine then takes the three (or fewer) "numbered slot" entries, 1 through 3, for each file and combines them. If the merge engine is able to combine them correctly (or what it thinks is correctly, anyway), it immediately shrinks that index entry down to a single normal, unconflicted "slot-zero" entry for the file:
If the merge process can do this for every slot for every file, the merge is complete and Git can go on.
If not, the merge stops in the middle, leaving the mess in the index.
Your job, as a programmer using Git to get work done, is then to resolve the mess in the index. Git leaves behind its best-effort at merging in your working tree as well, and you can use both the index information and the working tree copy to do your job.
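The slot collapse can be sketched at whole-file granularity. Slot 1 holds the merge base's version, slot 2 holds ours, and slot 3 holds theirs, with None standing for "file absent in that commit". This is a deliberately simplified model: when both sides changed a file differently, real Git goes on to attempt a line-by-line merge, which this sketch skips and reports as a conflict. The blob IDs "A", "B", and "C" are placeholders.

```python
from typing import Optional

Stages = dict[int, Optional[str]]  # slot number -> blob ID (None: file absent there)


def collapse(stages: Stages) -> Stages:
    """Collapse slots 1 (base), 2 (ours), 3 (theirs) to slot 0 when the outcome is obvious."""
    base, ours, theirs = stages.get(1), stages.get(2), stages.get(3)
    if ours == theirs:   # both sides ended up the same
        return {0: ours}
    if base == ours:     # only their side changed the file
        return {0: theirs}
    if base == theirs:   # only our side changed the file
        return {0: ours}
    return dict(stages)  # conflict: leave slots 1-3 in the index for you


print(collapse({1: "A", 2: "A", 3: "B"}))  # {0: 'B'}
print(collapse({1: "A", 2: "B", 3: "C"}))  # {1: 'A', 2: 'B', 3: 'C'}
```

Any entry still holding slots 1 to 3 after this step is exactly the "mess in the index" that you have to resolve before Git will let you commit.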
This particular part just gets more complicated from here (e.g., whether you want to use git mergetool), and this answer is long enough already, so we'll leave most of the details out. But I will say that this produces a case that I consider a sort of flaw in Git. When the index is in this "merge conflict" state, you cannot write it out. You cannot make a new commit. In general, you can't proceed from here until you resolve the merge conflict. You have just the two options: finish the merge, or abort it entirely. Since Git is meant for doing distributed work, it really ought to allow distributed merging as well, with the ability to make special "conflict commits" or whatever they might be called. But you can't.