For git submodule status:
$ git submodule status
hash-1 path-1 (describe-output-1)
hash-2 path-2 (describe-output-2)
:
hash-n path-n (describe-output-n)
This tells you, for each submodule path, the hash ID of the commit that is checked out within that submodule path, and the result of running git describe within that submodule. For instance, if you saw
51ebf55b9309824346a6589c9f3b130c6f371b8f foo (v2.25.0-462-g51ebf55b93)
as the output, but then did a git checkout v2.15.0 in the foo directory:
(cd foo; git checkout v2.15.0)
and ran it again, you'd see:
+cb5918aa0d50f50e83787f65c2ddc3dcb10159fe foo (v2.15.0)
instead. (The + sign indicates that it's out of sync; see below.)
The hash ID is simply the result of running git rev-parse HEAD in each submodule. The describe output is simply the result of running git describe in each submodule. The path in the middle is the argument you'd need to supply to a cd (change-directory) command to switch from the superproject into the given submodule.
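The two per-submodule commands just described can be combined by hand. The following is only a sketch: status_like_line is a made-up helper name, and the --always flag is an assumption added so that a repository without a reachable tag still prints something.

```shell
# Hypothetical helper (not part of Git): print one status-like line for the
# Git repository found at directory $1.
status_like_line() {
    path=$1
    # The commit currently checked out in that repository.
    hash=$(git -C "$path" rev-parse HEAD) || return 1
    # Assumption: --always is added here so that a repository with no
    # reachable annotated tag still prints an abbreviated hash.
    desc=$(git -C "$path" describe --always) || return 1
    printf '%s %s (%s)\n' "$hash" "$path" "$desc"
}
```

Run from the superproject as status_like_line foo, this should print the same hash and path that git submodule status shows for foo, without the leading sync marker.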
For git submodule summary, the details are a little more complicated. In essence, though, it runs git log in each submodule.
Basics of submodules
Remember that a submodule is nothing more or less than another Git repository—the one that Git calls the submodule—plus a little bit of glue in this Git repository, which Git calls the superproject. The "glue" in the superproject consists of a very small number of items:
- Information needed to git clone the submodule, stored in a file named .gitmodules. This only gets used when you first tell the superproject Git to do that clone, e.g., via git submodule update --init.
- The path name of the submodule, as it appears in the superproject. The superproject Git will make an empty directory / folder (whichever term you prefer) to hold the work-tree for this submodule.[1]
- A commit hash ID. The superproject Git will take this commit hash ID, and in effect, run (cd path; git checkout hash) to put the submodule Git into detached HEAD mode, with that particular commit checked out.
These last two items are stored in every new commit you make in the superproject (and are already stored in existing commits).[2] To get into a new commit, the path name and commit hash ID must first be in Git's index, because Git makes all new commits from the index.
(If you're not clear on the distinction between Git's index and your work-tree, see What's the difference between HEAD, working tree and index, in Git? and What does git-rm mean by working tree and index?.)
[1] In modern Git, the .git entry that appears within this path is an ordinary file whose contents are a path under which the submodule Git can find the repository. The superproject Git moves the repository database out of the submodule's work-tree and into the superproject's own .git directory. Git calls this absorbing the submodule. In older versions of Git, the submodule has its own .git directory/folder that contains the submodule Git repository database.
[2] In fact, the first item (the .gitmodules file) should also be in all these commits, but since it's an ordinary file, there is nothing special about it: you just work with it like you do any ordinary file. Since the superproject only really needs it once, when cloning the submodule, if you accidentally or on purpose leave it out of new commits, you won't notice until someone else tries to use that commit as a starting point for a fresh clone of the superproject.
Since it's quite rare to change a .gitmodules file, and it carries across commits otherwise just like any other file, this is rarely a problem. It's mostly a problem only if you create the submodule using something other than git submodule add in the first place.
Reading gitlinks, vs messing with the submodule directly
The superproject entity that records both the path and hash ID, ready to go into your next commit, is called a gitlink. Outside of commits, it exists only in Git's index: no ordinary work-tree file holds it, so it is very hard to see. (You can dump out the index contents using git ls-files --stage, but this is usually way too verbose.) But it's always there. It says: to use this commit, check out this hash ID, as a detached HEAD, in this submodule.
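To make the git ls-files --stage dump less verbose, one option is to keep only the gitlink entries. The sketch below assumes that gitlinks are the entries carrying mode 160000 and that each line is laid out as mode, hash, stage number, then a tab and the path. only_gitlinks is a made-up helper name, not a Git command.

```shell
# Hypothetical helper: read `git ls-files --stage` output on stdin and keep
# only gitlink entries (mode 160000), printing "<hash> <path>" for each.
only_gitlinks() {
    # Split on the TAB first so that paths containing spaces stay intact,
    # then split the left part into mode, hash, and stage number.
    awk -F'\t' '{
        split($1, meta, " ")
        if (meta[1] == "160000") print meta[2], $2
    }'
}
```

In the superproject, git ls-files --stage | only_gitlinks would then print one line per submodule.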
Let's suppose that there's a submodule at the path sub (in the index, as :sub or :0:sub; the number here is the staging slot). When you make commits in the superproject, this gitlink goes into those commits. You can read it out of the index:
git rev-parse :sub
or read it out of the current commit:
git rev-parse HEAD:sub
or read it out of any commit:
git rev-parse <hash>:sub
to get the stored gitlink hash ID for sub from the given commit hash ID.
If you run git submodule update in your superproject, that Git will do the appropriate (cd sub; git checkout <hash>) based on whatever hash ID is in the index right now. That git checkout will, if the submodule repository is "clean", cleanly check out that particular commit.
But each submodule is a Git repository: a work-tree, an index, and an underlying repository database. You can cd sub and git checkout whatever you want, or dirty up its (sub's) index and/or its work-tree. And that submodule can have its own branch names: it's a Git repository, and every Git repository has branch names, right? Suppose you cd sub; git checkout master for instance. Now that submodule is on a branch, not in detached HEAD mode. You can make new commits, run git merge, and/or run all kinds of other commands. You can fetch new commits from some upstream repository. You can do anything you want: it's a Git repository, with all Git commands available.
Suppose, then, that you've done something (it doesn't really matter what) to some Git repository that's acting as a submodule for some superproject. Now you return to the superproject (cd ..), and in the superproject, you ask it: which commit did you recommend be checked out? That is, you read the gitlink entry in the superproject, from the superproject's index, or from a commit.
You now have two hash IDs: the one actually checked out in the submodule, and the one the superproject's gitlink recommends. They may be the same! Maybe master in the submodule points to the very commit whose hash ID is stored in the superproject's gitlink. Or maybe they're different. If you made a new commit just now in the submodule, they're definitely different, because every new commit hash ID is unique.
If the two are different, git submodule status will print +<hash>; the hash it prints is the one that's actually checked out in sub. If the two are the same, it prints the (single) hash ID without the +.
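The comparison behind the + marker can be reproduced by hand. This is a sketch, not the way Git itself implements it: sync_prefix and check_submodule are made-up helper names, and check_submodule assumes it is run from the top level of the superproject.

```shell
# Hypothetical helper: choose the prefix character for one status line.
# $1 = hash recorded in the superproject's gitlink, $2 = hash checked out.
sync_prefix() {
    if [ "$1" = "$2" ]; then
        printf ' '
    else
        printf '+'
    fi
}

# Hypothetical wrapper; run it from the top level of the superproject.
# $1 = submodule path, e.g. sub
check_submodule() {
    # The recommended hash, read from the gitlink in the index.
    recorded=$(git rev-parse ":$1") || return 1
    # The hash that is actually checked out inside the submodule.
    actual=$(git -C "$1" rev-parse HEAD) || return 1
    printf '%s%s %s\n' "$(sync_prefix "$recorded" "$actual")" "$actual" "$1"
}
```

Calling check_submodule sub prints the checked-out hash, with a leading + only when it differs from the gitlink in the index.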
Meanwhile, if you run git submodule summary, your superproject Git:
- grabs the recommended hash ID
- grabs the actually checked out hash ID
- uses git log in the submodule to find which commits are "between" these two hash IDs.
Specifically, it uses git log --oneline --left-right <hash1>...<hash2> (note the --oneline and the three dots here; it also forces a few more options but these are the key ones). The hash1 value is the recommended hash and the hash2 value is the actually checked out hash. The result of this listing is to show commits that are reachable from hash1 but not hash2 (prefixed with <) and commits that are reachable from hash2 but not hash1 (prefixed with >).
(For much more about reachability, see Think Like (a) Git.)
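The three-dot listing can also be run by hand to see what the summary is built from. A minimal sketch, assuming only the two key options named above; commits_between is a made-up helper name.

```shell
# Hypothetical helper: list the commits that separate two hashes.
# $1 = submodule path, $2 = recommended hash, $3 = checked-out hash
commits_between() {
    # Lines starting with "<" are reachable only from $2,
    # lines starting with ">" are reachable only from $3.
    git -C "$1" log --oneline --left-right "$2...$3"
}
```

If the submodule has simply moved ahead of the recommended commit, every line starts with >; if it was moved back to an older commit, every line starts with <.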
git submodule summary: --files vs --cached
I am also not able to understand the significance of the --files option
The --files and --cached options change where git submodule summary gets its two hash IDs. With neither option, it reads the first ID from the current commit (HEAD:sub), then goes into the submodule and reads out the HEAD value for the second. With --files, it gets the first hash from the index (:sub) instead, and still reads the second from the submodule's HEAD. With --cached, it reads the first ID from the current commit (HEAD:sub) and gets the second from the index (:sub). The remainder of its operation is the same: enter the submodule and run git log with appropriate options.
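To see which pair of hash IDs each option compares, the lookups can be done by hand. This is a sketch only: summary_endpoints is a made-up helper name, it covers just the two options discussed here, and it assumes it is run from the top level of the superproject.

```shell
# Hypothetical helper: print the two hash IDs, one per line, that the given
# option would compare. Run it from the top level of the superproject.
# $1 = --files or --cached, $2 = submodule path
summary_endpoints() {
    case $1 in
        --files)
            # Index gitlink first, then what is checked out in the submodule.
            git rev-parse ":$2" && git -C "$2" rev-parse HEAD ;;
        --cached)
            # Gitlink in the current commit first, then the index gitlink.
            git rev-parse "HEAD:$2" && git rev-parse ":$2" ;;
        *)
            echo "usage: summary_endpoints --files|--cached <path>" >&2
            return 2 ;;
    esac
}
```

When both lines match, the corresponding git submodule summary call has nothing to list for that submodule.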