Trouble understanding git submodule summary

Question

I am having trouble understanding what exactly git submodule summary does. How and when do we use this command? I am also not able to understand the significance of the --files option. To me, not using the option seems the same as using it.

Also, how does it differ from git submodule status? I am extremely confused. I have been reading the documentation of the command, if it helps.

Thank you so much in advance for the help! :)

score 3 · Accepted Answer · answered Mar 19 '20 at 22:53

For git submodule status:

$ git submodule status
hash-1 path-1 (describe-output-1)
hash-2 path-2 (describe-output-2)
   :
hash-n path-n (describe-output-n)

This tells you, for each submodule path, the hash ID of the commit that is checked out within that submodule path, and the result of running git describe within that submodule. For instance, if you saw

 51ebf55b9309824346a6589c9f3b130c6f371b8f foo (v2.25.0-462-g51ebf55b93)

as the output, but then did a git checkout v2.15.0 in the foo directory:

(cd foo; git checkout v2.15.0)

and ran it again, you'd see:

+cb5918aa0d50f50e83787f65c2ddc3dcb10159fe foo (v2.15.0)

instead. (The + sign indicates that it's out of sync; see below.)

The hash ID is simply the result of running git rev-parse HEAD in each submodule. The describe output is simply the result of running git describe in each submodule. The path in the middle is the argument you'd need to supply to a cd (change-directory) command to switch from the superproject into the given submodule.
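As a rough sketch of that description, the status line for one submodule can be rebuilt by hand. The function name manual_status is made up for illustration, it leaves out the leading sync marker, and the --always flag is an addition so that a submodule with no tags still prints an abbreviated hash instead of failing:

```shell
# Hypothetical helper: print one status-like line for a submodule path.
# Run it from the top level of the superproject.
manual_status() {
    sm_path=$1
    # The commit currently checked out inside the submodule.
    commit=$(git -C "$sm_path" rev-parse HEAD) || return 1
    # --always is an assumption added here: without any tags, plain
    # "git describe" would fail instead of printing something.
    desc=$(git -C "$sm_path" describe --always) || return 1
    printf '%s %s (%s)\n' "$commit" "$sm_path" "$desc"
}
```

Calling manual_status foo from the superproject should print the same hash and path that git submodule status shows for foo.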

For git submodule summary, the details are a little more complicated, but it basically runs git log in each submodule.

Basics of submodules

Remember that a submodule is nothing more or less than another Git repository—the one that Git calls the submodule—plus a little bit of glue in this Git repository, which Git calls the superproject. The "glue" in the superproject consists of a very small number of items:

Information needed to git clone the submodule, stored in a file named .gitmodules. This only gets used when you first tell the superproject Git to do that clone, e.g., via git submodule update --init.
The path name of the submodule, as it appears in the superproject. The superproject Git will make an empty directory / folder (whichever term you prefer) to hold the work-tree for this submodule.¹
A commit hash ID. The superproject Git will take this commit hash ID, and in effect, run (cd path; git checkout hash) to put the submodule Git into detached HEAD mode, with that particular commit checked out.

These last two items are stored in every new commit you make in the superproject (and are already stored in existing commits).² In order to get stored, the path name and commit hash ID must be stored in Git's index, because Git makes all new commits from the index.

(If you're not clear on the distinction between Git's index and your work-tree, see What's the difference between HEAD, working tree and index, in Git? and What does git-rm mean by working tree and index?.)

¹In modern Git, the .git for the submodule that will appear within this path is an ordinary file whose contents will be a path under which the submodule Git can find the repository. The superproject Git will move the repository database out of the submodule. Git calls this absorbing the submodule. In older versions of Git, the submodule will have its own .git directory/folder that contains the submodule Git repository database.

²In fact, the first item—the .gitmodules file—should also be in all these commits, but since it's an ordinary file, there is nothing special about it: you just work with it like you do any ordinary file. Since the superproject only really needs it once, when cloning the submodule, if you accidentally or on purpose leave it out of new commits, you won't notice until someone else tries to use that commit as a starting point for a fresh clone of the superproject.

Since it's quite rare to change a .gitmodules file, and it carries across commits otherwise just like any other file, this is rarely a problem. It's mostly a problem only if you create the submodule using something other than git submodule add in the first place.
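If you want to see what .gitmodules actually records, git config can read it like any other configuration file. This is a small sketch; the function name list_submodule_urls is hypothetical, and it assumes you run it from the top level of the superproject:

```shell
# Hypothetical helper: show the clone URL recorded for each submodule.
# .gitmodules uses the same syntax as other Git config files, so
# "git config --file" can parse it.
list_submodule_urls() {
    git config --file .gitmodules --get-regexp '^submodule\..*\.url$'
}
```

Each output line is a key such as submodule.<name>.url followed by the URL that git submodule update --init would clone from.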

Reading gitlinks, vs messing with the submodule directly

The superproject entity that records both the path and hash ID is called a gitlink. The copy that is ready to go into your next commit lives in Git's index, so it is very hard to see. (You can dump out the index contents using git ls-files --stage, but this is usually way too verbose.) But it's always there, and it says: check out, as a detached HEAD, this hash ID in this submodule.
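One way to cut that verbosity down is to keep only the gitlink entries, which Git stores with mode 160000. A minimal sketch, with a made-up function name:

```shell
# Hypothetical helper: list only the gitlink entries in the index.
# "git ls-files --stage" prints "<mode> <hash> <stage><TAB><path>";
# gitlinks are the entries whose mode is 160000.
list_gitlinks() {
    git ls-files --stage | awk '$1 == "160000"'
}
```

Run from the superproject, this prints one line per submodule, with the recorded hash ID in the second column.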

Let's suppose that there's a submodule at the path sub (in the index, as :sub or :0:sub—the number here is the staging slot). When you make commits in the superproject, this gitlink goes into commits. You can read it out of the index:

git rev-parse :sub

or read it out of the current commit:

git rev-parse HEAD:sub

or read it out of any commit:

git rev-parse <hash>:sub

to get the stored gitlink hash ID for sub from the given commit hash-ID.

If you run git submodule update in your superproject, that Git will do the appropriate (cd sub; git checkout <hash>) based on whatever hash ID is in the index right now. That git checkout will, if the submodule repository is "clean", cleanly check out that particular commit.

But each submodule is a Git repository—a work-tree, an index, and an underlying repository-database. You can cd sub and git checkout whatever you want, or dirty up its (sub's) index and/or its work-tree. And, that submodule can have its own branch names—it's a Git repository, and every Git repository has branch names, right? Suppose you cd sub; git checkout master for instance. Now that submodule is on a branch, not in detached HEAD mode. You can make new commits, run git merge, and/or run all kinds of other commands. You can fetch new commits from some upstream repository. You can do anything you want: it's a Git repository, with all Git commands available.

Suppose, then, that you've done something—it doesn't really matter what—to some Git repository that's acting as a submodule for some superproject. Now you return to the superproject (cd ..), and in the superproject, you ask it: which commit did you recommend be checked out? That is, you read the gitlink entry in the superproject, from the superproject's index, or from a commit.

You have two hash IDs. They may be the same! Maybe master in the submodule is the hash ID stored in the superproject's gitlink. Or, maybe they're different. If you made a new commit just now in the submodule, they're definitely different, because every new commit hash ID is unique.

If the two are different, git submodule status will print +<hash>; the hash it prints is the one that's actually checked out in sub. If the two are the same, it prints the (single) hash ID without the +.
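That comparison is easy to reproduce by hand. The sketch below uses a hypothetical function name, sync_marker, and handles only the two cases described here (in sync, or out of sync with a +):

```shell
# Hypothetical helper: compare the gitlink in the superproject's index
# with the commit actually checked out in the submodule.
sync_marker() {
    sm_path=$1
    # Hash ID the superproject's index records for this path.
    recorded=$(git rev-parse ":$sm_path") || return 1
    # Hash ID that is actually checked out in the submodule.
    actual=$(git -C "$sm_path" rev-parse HEAD) || return 1
    if [ "$recorded" = "$actual" ]; then
        printf ' %s %s\n' "$actual" "$sm_path"
    else
        printf '+%s %s\n' "$actual" "$sm_path"
    fi
}
```

After making a new commit inside sub, sync_marker sub should switch from the space-prefixed form to the + form, and switch back once you git add sub in the superproject.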

Meanwhile, if you run git submodule summary, your superproject Git:

grabs the recommended hash ID
grabs the actually checked out hash ID
uses git log in the submodule to find which commits are "between" these two hash IDs.

Specifically, it uses git log --oneline --left-right <hash1>...<hash2> (note the --oneline and the three dots here; it also forces a few more options but these are the key ones). The hash1 value is the recommended hash and the hash2 value is the actually checked out hash. The result of this listing is to show commits that are reachable from hash1 but not hash2 (prefixed with <) and commits that are reachable from hash2 but not hash1 (prefixed with >).

(For much more about reachability, see Think Like (a) Git.)
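The three steps above can be approximated with a short function. This is only a sketch: manual_summary is a made-up name, it takes the recommended hash from the index (the comparison that --files makes), and it uses just the key git log options named above rather than everything the real command passes:

```shell
# Hypothetical helper: approximate "git submodule summary --files"
# for a single submodule path.
manual_summary() {
    sm_path=$1
    # Step 1: the recommended hash ID, from the superproject's index.
    recorded=$(git rev-parse ":$sm_path") || return 1
    # Step 2: the hash ID actually checked out in the submodule.
    actual=$(git -C "$sm_path" rev-parse HEAD) || return 1
    # Step 3: commits reachable from one hash but not the other.
    # Lines marked "<" are only on the recorded side, ">" only on the
    # checked-out side.
    git -C "$sm_path" log --oneline --left-right "$recorded...$actual"
}
```

When the two hashes match, the three-dot range is empty and nothing is printed. After one new commit in the submodule, you should get a single line marked with >.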

`git submodule summary`: `--files` vs `--cached`

I am also not able to understand the significance of the --files option

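The --files option is not quite the default, although it usually produces the same output, which is why using it seems to make no difference. According to the git submodule documentation, with neither option git submodule summary reads the first hash ID from the current commit (HEAD:sub) and the second from the submodule's own HEAD. With --files, it reads the first from the index (:sub) instead; unless you have staged a changed gitlink, :sub and HEAD:sub hold the same hash ID. The --cached option keeps HEAD:sub as the first hash ID but gets the second from the index (:sub) rather than from the submodule's HEAD. The remainder of the operation is the same in every case: enter the submodule and run git log with appropriate options.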
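To watch the difference between the two options, you can print the pair of hash IDs each one would compare. The function name summary_hash_pair is hypothetical, and the sketch covers only --files and --cached as described here:

```shell
# Hypothetical helper: print the two hash IDs compared for a submodule
# path under --files or --cached.
summary_hash_pair() {
    mode=$1
    sm_path=$2
    case $mode in
    --files)
        # Index of the superproject vs. the submodule's own HEAD.
        first=$(git rev-parse ":$sm_path") || return 1
        second=$(git -C "$sm_path" rev-parse HEAD) || return 1
        ;;
    --cached)
        # Current superproject commit vs. the superproject's index.
        first=$(git rev-parse "HEAD:$sm_path") || return 1
        second=$(git rev-parse ":$sm_path") || return 1
        ;;
    *)
        echo "usage: summary_hash_pair --files|--cached <path>" >&2
        return 2
        ;;
    esac
    printf '%s %s\n' "$first" "$second"
}
```

Make a commit inside sub and the --files pair differs while the --cached pair still matches. Run git add sub in the superproject and the situation flips: the --files pair matches and the --cached pair differs until you commit.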

That was a GREAT explanation!! Though I have one doubt: can we say that `--cached` is the exact opposite of `--files`? By opposite, I mean the way they fetch the two hash IDs. — rasengan__, Mar 21 '20 at 05:44
Given that `--files` picks up two hash IDs (extracted from index in superproject, and value-of-HEAD in submodule) and `--cached` picks up two hash IDs (extracted from a commit in superproject and extracted from index in superproject), I would not call them *opposites*. One of the two hashes is from the same source but in the opposite position; the other of the two hashes is from a different source. That's neither just a transposition, nor a complete difference. — torek, Mar 21 '20 at 06:39
