How to correctly configure git submodule

Question

I am trying to add a repository as a submodule of another Git repository. After adding the submodule I tried to clone the parent project:

git clone https://...
cd <parent_path>/<submodule_path>
git submodule init
git submodule update

Now if I run git status in the submodule, the HEAD is detached:

cd <submodule_path>
git status
HEAD detached at a4709b3
nothing to commit, working tree clean

After reading this answer I tried to check out the submodule's master:

git checkout master
git status

On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

But now, if I run git status in the parent directory, it shows that there are new commits in the submodule (which is definitely not the case):

cd <parent_dir_path>
git status

On branch test_submodule
Your branch is up to date with 'origin/test_submodule'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   submodule_name (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

Can someone shed some light on this?

EDIT: Here's the output of cd <submodule_path> && git show master:

commit 42309e0e2f48aba11902633173053e2423d4ba62 (HEAD -> master, origin/master, origin/HEAD)
Merge: a4709b3 1c1ae85
Author: Abc
Date:   Fri May 31 16:27:52 2019 +0100

    Merge pull request #6 in test/testing_submodule from test_integration_with_repos to master

    * commit '1c1ae8515716d9c2d5135e86dc9c024c81e4320b':
      test

In the top part of your question, you presumably mean `cd <parent_path>`, i.e., cd into the superproject clone you just made. – torek, May 31 '19 at 16:09
Submodule's `master` points to a different commit, not to `a4709b3`. To check: `cd <submodule_path> && git show master` – phd, May 31 '19 at 16:21
@phd I've edited my answer to include the output of `git show master`. — Tokyo, May 31 '19 at 16:39
When you did `git add` of the submodule, you added `a4709b3`. So when you did `git submodule update` to (fetch if necessary and) check out the right commit, it checked out `a4709b3`. That's the point of submodules: the added commit id's the content at that path, the same as for files (where the id resolves to bytes in a single blob, not a whole commit). If you check out something else there, then what's at your submodule path doesn't match what was committed there. — jthill, May 31 '19 at 16:45
@jthill But the submodule seems to be up to date. How can I get rid of the `(new commits)` message in parent directory? — Tokyo, May 31 '19 at 16:47
@jthill Or is this not a problem at all? Should I just `git add submoduleName` and `git commit` and `git push` from the parent module? — Tokyo, May 31 '19 at 16:58
You're confusing the submodule commit (`a4709b3`) you added to your superproject commit with other commits. If some particular repo is "up to date" with another, that just means all its tracking branches match the other repo's current ones; that's got nothing to do with which commit's currently checked out or which commit a superproject has registered to be checked out at a path. — jthill, May 31 '19 at 18:56

score 3 · Answer 1 · answered May 31 '19 at 22:34

TL;DR

git status compares the HEAD commit (current commit) and the index. Whatever is different is staged for commit. Then it compares the index and the work-tree. Whatever is different is not staged for commit. In this case, Git is comparing the index gitlink to the actual commit hash ID in the submodule. When it says:

modified:   submodule_name (new commits)

it just means the gitlink in the index does not match the HEAD commit hash in the submodule. This does not mean there are new commits in the submodule, nor that there aren't new commits there; it just means that the existing index gitlink hash ID doesn't match the existing submodule checkout.
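To see the two hash IDs that git status is comparing, here is a minimal sketch using throwaway repositories. The names lib and super, the demo identity, and the temporary directory are all hypothetical, and -c protocol.file.allow=always is there only because recent Git versions refuse to clone a submodule from a local path by default.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical upstream repository with two commits.
git init -q "$work/lib"
git -C "$work/lib" commit -q --allow-empty -m "first"
first=$(git -C "$work/lib" rev-parse HEAD)
git -C "$work/lib" commit -q --allow-empty -m "second"

# Superproject whose gitlink records the submodule's current tip ("second").
git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "add lib submodule"

# Put the submodule on some other commit than the recorded one.
git -C lib checkout -q --detach "$first"

recorded=$(git ls-files -s lib | cut -d' ' -f2)  # hash stored in the index gitlink
actual=$(git -C lib rev-parse HEAD)              # hash checked out in the submodule
status=$(git status --porcelain lib)             # non-empty when the two differ
echo "recorded: $recorded"
echo "actual:   $actual"
echo "status:   $status"
```

Whenever recorded and actual differ, the superproject reports the submodule as modified, regardless of which direction the submodule moved.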

Long

There's always a lot of confusion around submodules. The question you linked to—Why is my GIT Submodule HEAD detached from master?—has, as its accepted answer, an answer that doesn't really get to the heart of the problem, which I think can be summarized by the following conversation:

PERSON: Hey, Git, I'd like to use this other Git repository as a subdirectory of my Git repository.
GIT: OK, done.
PERSON: Cool. [Later:] Oh hey, why, when I make a new clone, is my submodule in detached HEAD state?
GIT: That's how it works.
PERSON: But I want it to be on branch master.
GIT: OK.
PERSON: It's still detached!
GIT: Yeah.
PERSON: I want it on a branch!
GIT: Yeah, no. Sorry.

In short, that's just how it works. Now, a submodule is a Git repository, so you can get the submodule onto a branch. But the superproject Git is going to yank it right back off the branch, because that's how submodules work. It takes one more bit of mind-set to understand why this is the case.

Why submodules are detached

First, let's note again that a submodule is a Git repository. The only thing that makes it a submodule is the fact that some other Git repository controls it now and then. We call the other Git repository the superproject. The superproject is doing control commands, and the submodule is obeying them. Other than this, they're two independent Git repositories: if the submodule has master and develop and whatever else branches, these are independent of the superproject's master and develop and whatever else, and the submodule and superproject don't have to have the same set of branch names at all.

Note, too, that a submodule can be a superproject to yet another submodule. This situation gets especially confusing since now saying "the" submodule or "the" superproject is ambiguous. There are now two superprojects, two submodules, and three Git repositories, and the middle Git repository is both submodule (of the top Git) and superproject (of the bottom one).

The design makes one huge assumption, which is a completely safe assumption, but can be annoying. This assumption is: The submodule is cloned from a repository that you don't control. This other repository may be updated very frequently, or hardly ever, but their branches, which get copied to your clone as your submodule's origin/* remote-tracking names, change in a way that you don't necessarily control. When you first clone that submodule repository, git clone creates a new branch named master, or some other name if they say so, and the commit you'd get by checking out whichever name this is, is under their control, not yours.

For this reason, the superproject itself does not need—and hardly ever uses—any of the branch names of the submodule. Instead, the superproject records something about the submodule repository that cannot change. The fact that it can't change means that no matter what whoever controls your clone-source does to the submodule, they cannot wreck your dependency on that submodule.

(That's not 100% true of course. For instance, they can completely delete the source repository. Or they can discard a commit or tag that you depend on, but in general, people don't delete repositories. Discarding commits is rare-ish, and discarding a published tag and its corresponding commit is especially rare.)

The thing that cannot change, that your superproject records, is the raw hash ID of the commit that the superproject will tell the submodule to git checkout hash. This recorded information goes into every commit in your superproject! (Well, every commit that uses the submodule.)

A detached HEAD occurs when you run `git checkout hash`

If you've been using Git long enough, you're familiar with this. Check out any historic commit by its hash ID, and Git goes into detached HEAD mode.

The superproject checks out a historic commit by its unchangeable hash ID. So the submodule ends up in detached HEAD mode. That's both how and why it all works. The superproject records the hash ID, and commands the submodule: check that one out, and now you're in detached HEAD mode.
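The attached/detached distinction is easy to observe in any repository. This is a small sketch in a hypothetical scratch repository (the demo identity and temporary directory are assumptions); it relies on git symbolic-ref -q HEAD failing quietly when HEAD is not attached to a branch.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)

git init -q "$repo"
cd "$repo"
git commit -q --allow-empty -m "first"
branch=$(git symbolic-ref --short HEAD)  # whatever the default branch is called
hash=$(git rev-parse HEAD)

# Checking out by raw hash ID detaches HEAD.
git checkout -q "$hash"
if git symbolic-ref -q HEAD >/dev/null; then state_by_hash=attached; else state_by_hash=detached; fi

# Checking out by branch name attaches it again, even though the commit is the same.
git checkout -q "$branch"
if git symbolic-ref -q HEAD >/dev/null; then state_by_name=attached; else state_by_name=detached; fi

echo "by hash: $state_by_hash, by name: $state_by_name"
```

The commit is identical in both cases; only the way HEAD refers to it changes.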

This works great for read-only historic commits. Every commit in your superproject records the correct submodule hash ID, for that superproject commit. When you check out that commit, you tell Git to synchronize your submodules, and you get the right submodule commit too, so that you can build and use your project.

This gets in the way of new work

First, we all need to agree on a definition: Existing commits are not about new work. Commits are read-only, frozen for all time. You cannot change anything about any existing commit. (That's why the submodule's commit's hash ID is useful to the superproject: it's frozen in time, and presumably good forevermore.) Each commit contains a snapshot of all of your files, plus some metadata such as who made it, when, and why—the log message.

Commits are frozen like this, but new work is, we presume, a desirable thing. So any Git repository—well, except a --bare one—provides a place in which you can do that. That place is the work-tree (or working tree or any number of similar spellings). Git copies the compressed, read-only, Git-only-format files out of some commit, into the work-tree, where they take on their ordinary everyday form and you can therefore see and work on/with your files.

A commit that refers to a submodule does so through what Git calls a gitlink. The gitlink is, in effect, a committed file of mode 160000 (normal files are mode 100644 or mode 100755), where the file's "contents" are just the hash ID that the superproject should command the submodule Git to git checkout.
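You can look at a gitlink directly with git ls-tree. The sketch below builds a throwaway superproject (the names lib and super and the demo identity are hypothetical; -c protocol.file.allow=always is only needed because the submodule is cloned from a local path) and reads the mode, type, and hash of the submodule entry.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/lib"
git -C "$work/lib" commit -q --allow-empty -m "first"

git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "add lib submodule"

# Each ls-tree line reads: <mode> <type> <hash><TAB><path>
git ls-tree HEAD

entry=$(git ls-tree HEAD lib | cut -f1)         # drop the <TAB><path> part
link_mode=$(echo "$entry" | cut -d' ' -f1)
link_type=$(echo "$entry" | cut -d' ' -f2)
link_hash=$(echo "$entry" | cut -d' ' -f3)
file_mode=$(git ls-tree HEAD .gitmodules | cut -d' ' -f1)
sub_head=$(git -C lib rev-parse HEAD)
echo "lib: mode=$link_mode type=$link_type; .gitmodules: mode=$file_mode"
```

The entry's type is commit rather than blob, and its hash is the submodule commit the superproject wants checked out.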

Hence, a submodule entry (a gitlink) in a commit tells git checkout: You are acting as a superproject. When you get to this place in the work-tree, instead of extracting just one file here, the submodule should be on this commit, as a detached HEAD. If you use git checkout --recurse-submodules, Git does exactly that. If you use git checkout --no-recurse-submodules, Git holds off on doing that and leaves the submodule, which is after all a separate Git repository, alone.

Now, Git makes new commits not from what's in the work-tree, but rather what's in the index. The index contains a copy of every file ... and when a commit has a gitlink, the index contains a copy of that gitlink. It's the gitlink entry in the index that determines what goes into the next commit you make. So the next superproject commit uses whatever is in the superproject's index.

Sometimes you want the submodule to be on another commit. Since the submodule is a Git repository, you can just go into it and git checkout whatever you like. If you use a branch name, the submodule's HEAD will now be attached. As far as the superproject Git is concerned, this doesn't matter: what matters to the superproject is the actual hash ID. If the submodule's git rev-parse HEAD still produces the same hash ID that's in the superproject's index's gitlink, everything still matches up. If it produces some other hash ID, it's up to you to resolve that. Since you want a different commit, you should now update the superproject's index's gitlink.
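Updating the gitlink is just git add on the submodule path. A minimal sketch, again with hypothetical throwaway repositories named lib and super (the demo identity and the -c protocol.file.allow=always workaround for local-path clones are assumptions of the demo, not part of the technique):

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/lib"
git -C "$work/lib" commit -q --allow-empty -m "first"
first=$(git -C "$work/lib" rev-parse HEAD)
git -C "$work/lib" commit -q --allow-empty -m "second"

git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "add lib submodule"

# Deliberately choose a different submodule commit.
git -C lib checkout -q --detach "$first"

before=$(git ls-files -s lib | cut -d' ' -f2)  # index gitlink before git add
git add lib                                    # copy the submodule's HEAD hash into the index
after=$(git ls-files -s lib | cut -d' ' -f2)   # index gitlink after git add
git commit -q -m "pin lib to an earlier commit"

committed=$(git ls-tree HEAD lib | cut -f1 | cut -d' ' -f3)
leftover=$(git status --porcelain)
echo "before=$before after=$after committed=$committed"
```

Once the new gitlink is committed, the superproject and the submodule checkout agree again and git status has nothing to report.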

PERSON: But, hey Git: I told you to remember a branch name in the superproject, for this specific submodule. How about you, Git, command the Git that's controlling the submodule to git checkout that branch name?

You can do that. But the assumption here is that you had your Git clone your submodule Git from an upstream repository that you don't control and that you don't git push to. So this isn't sufficient and what Git provides instead is the command:

git submodule update --remote

In this case, the superproject Git will:

1. enter the submodule
2. command the submodule Git to run git fetch
3. wait for origin/master, or whatever it is, to get updated in the submodule, as a result of that git fetch
4. find the new hash ID of the submodule's origin/master (or whatever) based on the upstream (that you don't control, but that you just fetched-from)

and then have the submodule Git git checkout that hash ID ... as a detached HEAD, again!
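Here is a rough end-to-end sketch of that sequence with throwaway repositories. The names lib and super and the demo identity are hypothetical. The branch is recorded explicitly with git submodule add -b so the demo does not depend on what the default branch is called, and -c protocol.file.allow=always is only needed because everything lives on the local filesystem.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical upstream with one commit.
git init -q "$work/lib"
git -C "$work/lib" commit -q --allow-empty -m "first"
branch=$(git -C "$work/lib" symbolic-ref --short HEAD)

git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$work/lib" lib
git commit -q -m "add lib submodule"

# Upstream moves on without asking us.
git -C "$work/lib" commit -q --allow-empty -m "second"
upstream_tip=$(git -C "$work/lib" rev-parse HEAD)

# Fetch in the submodule and check out the new tip of origin/<branch>.
git -c protocol.file.allow=always submodule --quiet update --remote lib

sub_head=$(git -C lib rev-parse HEAD)
if git -C lib symbolic-ref -q HEAD >/dev/null; then head_state=attached; else head_state=detached; fi
recorded=$(git ls-files -s lib | cut -d' ' -f2)  # the index gitlink is NOT updated for you
echo "submodule HEAD: $sub_head ($head_state)"
echo "index gitlink:  $recorded"
```

Note that the index gitlink still names the old commit afterwards; you still need git add lib and git commit in the superproject to record the new one.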

If you do control the submodule, and want to make new commits in it, you need to cd into the submodule and just git checkout whatever branch name you want, and then do your work there. This submodule is, after all, a regular old Git repository. You can do whatever work you like and then run git add to copy any updated work-tree files into the index—git commit is going to use what's in the index—and run git commit to make the new commit.

Then, having done all of this, you can either push the commit upstream right now, or wait and cd back into the superproject. Either way you can now do whatever work is required in the superproject, and git add any modified files and the name of the submodule. You're not just updating the files in the superproject index, you also need to update the gitlink in the superproject. And now that you've done all of that, you can run git commit in the superproject, to make a new commit that stores the updated files and the updated gitlink.

Now that you have a new superproject commit, you can git push the new superproject commit somewhere. If you already used git push in the submodule, that's all you need to do. But if not, you should git push the submodule first. The reason is pretty obvious, once you think about it: the new superproject commit says: When you get to this submodule, read this gitlink and extract commit a987654... (or whatever hash ID it is). For someone else to do that, they'll need to git fetch from the upstream submodule repository, which you must already have pushed to, so as to get commit a987654... into their submodule Git!
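If you'd like Git to catch a forgotten submodule push, git push has a --recurse-submodules=check mode that refuses to push a superproject commit whose submodule commit isn't on any of the submodule's remotes. A sketch under stated assumptions: lib.git and super.git are hypothetical bare "upstream" repositories in a temporary directory, the identity is made up, and -c protocol.file.allow=always is only for the local-path clone.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical bare upstreams for the submodule and the superproject.
git init -q --bare "$work/lib.git"
git init -q --bare "$work/super.git"

# Seed the submodule upstream with one commit.
git init -q "$work/seed"
git -C "$work/seed" commit -q --allow-empty -m "first"
git -C "$work/seed" push -q "$work/lib.git" HEAD

git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add "$work/lib.git" lib
git commit -q -m "add lib submodule"
git remote add origin "$work/super.git"
git push -q origin HEAD

# New submodule commit, recorded in the superproject, not yet pushed anywhere.
git -C lib commit -q --allow-empty -m "second"
git add lib
git commit -q -m "bump lib"

# Superproject first: the check should refuse this push.
if git push -q --recurse-submodules=check origin HEAD 2>/dev/null; then
  early_push=accepted
else
  early_push=refused
fi

# Submodule first, then the superproject.
git -C lib push -q origin HEAD
if git push -q --recurse-submodules=check origin HEAD; then
  late_push=accepted
else
  late_push=refused
fi
echo "before submodule push: $early_push, after: $late_push"
```

There is also --recurse-submodules=on-demand, which pushes the submodule for you instead of just refusing.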

Note that this matches the git submodule update --remote action: they will go to their submodule's upstream, git fetch the updated branch, and then git checkout the appropriate origin/branch hash ID, in this case a987654..., as the detached HEAD for their submodule.

This is not the smoothest process ever, but it's as simple as Git itself can make it. There are several more things that git submodule update can do, but they all start with this frame of mind: the submodule repository is cloned from somewhere else, and it is primarily the somewhere else that supplies new commits for the submodule.

score 2 · Accepted Answer · answered May 31 '19 at 17:14

Your superproject remembers that the submodule must be at the commit a4709b3. But the submodule was updated and now its master points to 42309e0. What you should do now depends on what code (what commit) you want to use with the superproject. The simplest solution is to check out the stored commit:

cd <submodule_path>
git checkout a4709b3

The submodule will be in detached HEAD state. Nothing to worry about.
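Instead of typing the hash yourself, you can let the superproject do the checkout with git submodule update, which reads the recorded gitlink. A minimal sketch with hypothetical throwaway repositories (lib, super, and the demo identity are assumptions; -c protocol.file.allow=always is only for cloning from a local path). A local commit on the submodule's branch stands in for "master has moved ahead".

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/lib"
git -C "$work/lib" commit -q --allow-empty -m "first"

git init -q "$work/super"
cd "$work/super"
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "add lib submodule"
recorded=$(git ls-tree HEAD lib | cut -f1 | cut -d' ' -f3)

# The submodule's branch is now one commit ahead of what the superproject recorded.
git -C lib commit -q --allow-empty -m "second"
dirty=$(git status --porcelain)

# Check out the recorded commit again (as a detached HEAD).
git submodule --quiet update lib
clean=$(git status --porcelain)
restored=$(git -C lib rev-parse HEAD)
echo "status before: '$dirty', after: '$clean'"
```

The extra commit is not lost; it is still on the submodule's branch, just no longer checked out.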

The other possibility is to update the submodule:

cd <submodule_path>
git checkout master # reconcile detached HEAD
git pull origin master

and then update the superproject:

cd <parent_dir_path>
git add <submodule_path>
git commit -m "Update submodule"

answered May 31 '19 at 17:14

phd

Thanks for your answer. So in the first place, how can I avoid similar issues in the future? When I added the submodule to the super project, I ran `git submodule add https://...`. – Tokyo May 31 '19 at 17:31
@Old-School You can't avoid this "issue". If you don't want to deal with that, don't update the submodule. – alfunx May 31 '19 at 17:59
I don't see any issues here either. Most of the time submodules are in detached HEAD state. When you update your local copy of a submodule you also commit the change to the superproject. That's all, no issues, really. – phd May 31 '19 at 22:15