TL;DR
git status compares the HEAD commit (the current commit) and the index. Whatever is different is staged for commit. Then it compares the index and the work-tree. Whatever is different is not staged for commit. In this case, Git is comparing the index gitlink to the actual commit hash ID in the submodule. When it says:
modified: submodule_name (new commits)
it just means the gitlink in the index does not match the HEAD commit hash in the submodule. This does not mean there are new commits in the submodule, nor that there aren't new commits there; it just means that the existing index gitlink hash ID doesn't match the existing submodule checkout.
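To see that comparison for yourself, here is a minimal sketch. It assumes throwaway repositories in a temporary directory; the names upstream, super, and sub are made up for the demo, and the protocol.file.allow setting is only there because recent Git versions refuse to clone a submodule from a local path by default.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in "upstream" repository with two commits (hypothetical name).
git init -q upstream
git -C upstream commit -q --allow-empty -m "first"
git -C upstream commit -q --allow-empty -m "second"

# A superproject that uses it as a submodule at path "sub".
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" sub
git commit -q -m "add submodule"

# Move the submodule checkout to an OLDER commit: nothing new was made.
git -C sub checkout -q --detach HEAD~1

# The two hash IDs that git status compares for the submodule:
index_hash=$(git ls-files --stage sub | cut -d' ' -f2)   # gitlink in the index
sub_hash=$(git -C sub rev-parse HEAD)                     # submodule's HEAD
status_short=$(git status --short sub)
status_long=$(git status)

echo "index gitlink:  $index_hash"
echo "submodule HEAD: $sub_hash"
echo "$status_long"
```

Note that the submodule was moved backwards here, and the status output still uses the "(new commits)" wording, because all that is checked is whether the two hash IDs differ.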
Long
There's always a lot of confusion around submodules. The question you linked to—Why is my GIT Submodule HEAD detached from master?—has, as its accepted answer, an answer that doesn't really get to the heart of the problem, which I think can be summarized by the following conversation:
PERSON: Hey, Git, I'd like to use this other Git repository as a subdirectory of my Git repository.
GIT: OK, done.
PERSON: Cool. [Later:] Oh hey, why, when I make a new clone, is my submodule in detached HEAD state?
GIT: That's how it works.
PERSON: But I want it to be on branch master.
GIT: OK.
PERSON: It's still detached!
GIT: Yeah.
PERSON: I want it on a branch!
GIT: Yeah, no. Sorry.
In short, that's just how it works. Now, a submodule is a Git repository, so you can get the submodule onto a branch. But the superproject Git is going to yank it right back off the branch, because that's how submodules work. It takes one more bit of mind-set to understand why this is the case.
Why submodules are detached
First, let's note again that a submodule is a Git repository. The only thing that makes it a submodule is the fact that some other Git repository controls it now and then. We call the other Git repository the superproject. The superproject issues the commands, and the submodule obeys them. Other than this, they're two independent Git repositories: if the submodule has master and develop and whatever other branches, these are independent of the superproject's master and develop and whatever else, and the submodule and superproject don't have to have the same set of branch names at all.
Note, too, that a submodule can be a superproject to yet another submodule. This situation gets especially confusing since now saying "the" submodule or "the" superproject is ambiguous. There are now two superprojects, two submodules, and three Git repositories, and the middle Git repository is both submodule (of the top Git) and superproject (of the bottom one).
The design makes one huge assumption, which is a completely safe assumption, but can be annoying. This assumption is: The submodule is cloned from a repository that you don't control. This other repository may be updated very frequently, or hardly ever, but its branches, which get copied to your clone as your submodule's origin/* remote-tracking names, change in a way that you don't necessarily control. When you first clone that submodule repository, git clone creates a new branch named master, or some other name if they say so, and the commit you'd get by checking out whichever name this is, is under their control, not yours.
For this reason, the superproject itself does not need—and hardly ever uses—any of the branch names of the submodule. Instead, the superproject records something about the submodule repository that cannot change. The fact that it can't change means that no matter what whoever controls your clone-source does to the submodule, they cannot wreck your dependency on that submodule.
(That's not 100% true of course. For instance, they can completely delete the source repository. Or they can discard a commit or tag that you depend on, but in general, people don't delete repositories. Discarding commits is rare-ish, and discarding a published tag and its corresponding commit is especially rare.)
The thing that cannot change, that your superproject records, is the raw hash ID of the commit that the superproject will tell the submodule to git checkout hash. This recorded information goes into every commit in your superproject! (Well, every commit that uses the submodule.)
A detached HEAD occurs when you run git checkout hash
If you've been using Git long enough, you're familiar with this. Check out any historic commit by its hash ID, and Git goes into detached HEAD mode.
The superproject checks out a historic commit by its unchangeable hash ID. So the submodule ends up in detached HEAD mode. That's both how and why it all works. The superproject records the hash ID, and commands the submodule: check that one out, and now you're in detached HEAD mode.
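If you'd like to watch HEAD detach, here is a small sketch using a scratch repository (the directory name repo is made up). It uses git symbolic-ref, which succeeds only while HEAD is attached to a branch name.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# On a branch, HEAD is a symbolic ref that names that branch.
before=$(git symbolic-ref -q HEAD || echo "detached")

# Checking out a raw hash ID detaches HEAD.
hash=$(git rev-parse HEAD~1)
git checkout -q "$hash"
after=$(git symbolic-ref -q HEAD || echo "detached")

echo "before: $before"
echo "after:  $after"
```

This is the same operation the superproject performs on the submodule, just done by hand.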
This works great for read-only historic commits. Every commit in your superproject records the correct submodule hash ID, for that superproject commit. When you check out that commit, you tell Git to synchronize your submodules, and you get the right submodule commit too, so that you can build and use your project.
This gets in the way of new work
First, we all need to agree on a definition: Existing commits are not about new work. Commits are read-only, frozen for all time. You cannot change anything about any existing commit. (That's why the submodule's commit's hash ID is useful to the superproject: it's frozen in time, and presumably good forevermore.) Each commit contains a snapshot of all of your files, plus some metadata such as who made it, when, and why—the log message.
Commits are frozen like this, but new work is, we presume, a desirable thing. So any Git repository (well, except a --bare one) provides a place in which you can do that. That place is the work-tree (or working tree or any number of similar spellings). Git copies the compressed, read-only, Git-only-format files out of some commit, into the work-tree, where they take on their ordinary everyday form and you can therefore see and work on/with your files.
A commit that refers to a submodule does so through what Git calls a gitlink. The gitlink is, in effect, a committed file of mode 160000 (normal files are mode 100644 or mode 100755), where the file's "contents" are just the hash ID that the superproject should command the submodule Git to git checkout.
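You can inspect a gitlink with git ls-tree. The sketch below builds a throwaway superproject first; the names upstream, super, and sub are made up, and protocol.file.allow is set only because recent Git versions block local-path submodule clones by default.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream commit -q --allow-empty -m "first"

git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" sub
git commit -q -m "add submodule"

# The committed entry for the submodule path: mode, type, hash ID, name.
entry=$(git ls-tree HEAD sub)
mode=$(printf '%s\n' "$entry" | awk '{print $1}')
type=$(printf '%s\n' "$entry" | awk '{print $2}')
recorded=$(printf '%s\n' "$entry" | awk '{print $3}')

# For contrast, an ordinary committed file.
file_mode=$(git ls-tree HEAD .gitmodules | awk '{print $1}')

checked_out=$(git -C sub rev-parse HEAD)

echo "$entry"
echo "ordinary file mode: $file_mode"
echo "submodule HEAD:     $checked_out"
```

The entry has type commit rather than blob, and its hash ID is a commit in the submodule repository, not an object stored in the superproject.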
Hence, a submodule entry (a gitlink) in a commit tells git checkout: You are acting as a superproject. When you get to this place in the work-tree, instead of extracting just one file here, the submodule should be on this commit, as a detached HEAD. If you use git checkout --recurse-submodules, Git does exactly that. If you use git checkout --no-recurse-submodules, Git holds off on doing that: it leaves the submodule, which is after all a separate Git repository, alone.
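Here is a rough sketch of the difference between the two options, using a scratch superproject with two commits that record two different gitlinks. All repository names are made up, and protocol.file.allow is set only so the local-path submodule clone is permitted on recent Git versions.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream commit -q --allow-empty -m "first"
first=$(git -C upstream rev-parse HEAD)

git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" sub
git commit -q -m "superproject commit A: gitlink records first"

# Advance the submodule by one commit and record that in commit B.
git -C sub commit -q --allow-empty -m "second"
second=$(git -C sub rev-parse HEAD)
git add sub
git commit -q -m "superproject commit B: gitlink records second"

# Check out commit A and have Git move the submodule as well.
git checkout -q --recurse-submodules HEAD~1
with_recurse=$(git -C sub rev-parse HEAD)

# Go back to commit B, but leave the submodule alone this time.
git checkout -q --no-recurse-submodules -
without_recurse=$(git -C sub rev-parse HEAD)
gitlink_now=$(git ls-tree HEAD sub | awk '{print $3}')

echo "after --recurse-submodules:    $with_recurse"
echo "after --no-recurse-submodules: $without_recurse"
echo "gitlink in current commit:     $gitlink_now"
```

After the second checkout the submodule is still sitting on the older commit while the superproject commit asks for the newer one, which is exactly the mismatch that git status reports.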
Now, Git makes new commits not from what's in the work-tree, but rather what's in the index. The index contains a copy of every file ... and when a commit has a gitlink, the index contains a copy of that gitlink. It's the gitlink entry in the index that determines what goes into the next commit you make. So the next superproject commit uses whatever is in the superproject's index.
Sometimes you want the submodule to be on another commit. Since the submodule is a Git repository, you can just go into it and git checkout whatever you like. If you use a branch name, the submodule's HEAD will now be attached. As far as the superproject Git is concerned, this doesn't matter: what matters to the superproject is the actual hash ID. If the submodule's git rev-parse HEAD still produces the same hash ID that's in the superproject's index's gitlink, everything still matches up. If it produces some other hash ID, it's up to you to resolve that. Since you want a different commit, you should now update the superproject's index's gitlink.
PERSON: But, hey Git: I told you to remember a branch name in the superproject, for this specific submodule. How about you, Git, command the Git that's controlling the submodule to git checkout that branch name?
You can do that. But the assumption here is that you had your Git clone your submodule Git from an upstream repository that you don't control and that you don't git push to. So this isn't sufficient, and what Git provides instead is the command:
git submodule update --remote
In this case, the superproject Git will:
- enter the submodule
- command the submodule Git to run git fetch
- wait for origin/master, or whatever it is, to get updated in the submodule, as a result of that git fetch
- find the new hash ID of the submodule's origin/master (or whatever) based on the upstream (that you don't control, but that you just fetched-from)
and then have the submodule Git git checkout that hash ID ... as a detached HEAD, again!
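The steps above can be tried out with the following sketch. It assumes scratch repositories with made-up names (upstream, super, sub), records the upstream's branch name explicitly with submodule add -b so the demo does not depend on which default branch name your Git uses, and sets protocol.file.allow only because recent Git versions restrict local-path submodule transport.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream commit -q --allow-empty -m "first"
branch=$(git -C upstream symbolic-ref --short HEAD)

git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$work/upstream" sub
git commit -q -m "add submodule"

# Someone else (here: us, playing the upstream) adds a commit.
git -C "$work/upstream" commit -q --allow-empty -m "second"
upstream_tip=$(git -C "$work/upstream" rev-parse HEAD)

# Fetch in the submodule and check out the updated origin/<branch> hash ID.
git -c protocol.file.allow=always submodule --quiet update --remote sub

sub_head=$(git -C sub rev-parse HEAD)
head_state=$(git -C sub symbolic-ref -q HEAD || echo "detached")
index_hash=$(git ls-files --stage sub | cut -d' ' -f2)

echo "upstream tip:   $upstream_tip"
echo "submodule HEAD: $sub_head ($head_state)"
echo "index gitlink:  $index_hash"
```

The submodule ends up on the new upstream commit with a detached HEAD, while the superproject's index gitlink still holds the old hash ID until you git add the submodule path.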
If you do control the submodule, and want to make new commits in it, you need to cd into the submodule and just git checkout whatever branch name you want, and then do your work there. This submodule is, after all, a regular old Git repository. You can do whatever work you like and then run git add to copy any updated work-tree files into the index (git commit is going to use what's in the index) and run git commit to make the new commit.
Then, having done all of this, you can either push the commit upstream right now, or wait and cd back into the superproject. Either way you can now do whatever work is required in the superproject, and git add any modified files and the name of the submodule. You're not just updating the files in the superproject index, you also need to update the gitlink in the superproject. And now that you've done all of that, you can run git commit in the superproject, to make a new commit that stores the updated files and the updated gitlink.
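As a rough sketch of that sequence, minus the pushing: the repositories below are throwaway ones with made-up names, the "new work" is just an empty commit, and protocol.file.allow is set only so that recent Git versions allow the local-path submodule clone.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream commit -q --allow-empty -m "first"
branch=$(git -C upstream symbolic-ref --short HEAD)

git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/upstream" sub
git commit -q -m "add submodule"

# In the submodule: get on a branch and make a new commit there.
git -C sub checkout -q "$branch"
git -C sub commit -q --allow-empty -m "new work in the submodule"
new_hash=$(git -C sub rev-parse HEAD)

# Back in the superproject: the index gitlink is now out of date.
before_add=$(git ls-files --stage sub | cut -d' ' -f2)

# git add on the submodule path copies the submodule's HEAD hash ID
# into the superproject's index as the new gitlink.
git add sub
after_add=$(git ls-files --stage sub | cut -d' ' -f2)

git commit -q -m "use newer submodule commit"
committed=$(git ls-tree HEAD sub | awk '{print $3}')

echo "new submodule commit: $new_hash"
echo "gitlink before add:   $before_add"
echo "gitlink after add:    $after_add"
echo "gitlink in commit:    $committed"
```

In a real setup you would still have to git push both repositories afterwards; that part is left out here because the stand-in upstream is an ordinary non-bare repository.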
Now that you have a new superproject commit, you can git push the new superproject commit somewhere. If you already used git push in the submodule, that's all you need to do. But if not, you should git push the submodule first. The reason is pretty obvious, once you think about it: the new superproject commit says: When you get to this submodule, read this gitlink and extract commit a987654... (or whatever hash ID it is). For someone else to do that, they'll need to git fetch from the upstream submodule repository, to which you must already have run git push, so as to get commit a987654... into their submodule Git!
Note that this matches the git submodule update --remote action: they will go to their submodule's upstream, git fetch the updated branch, and then git checkout the appropriate origin/branch hash ID, in this case a987654..., as the detached HEAD for their submodule.
This is not the smoothest process ever, but it's as simple as Git itself can make it. There are several more things that git submodule update can do, but they all start with this frame of mind: the submodule repository is cloned from somewhere else, and it is primarily the somewhere else that supplies new commits for the submodule.