Does git pull --recurse-submodules really pull the latest commit in a submodule?

Question

An answer in the thread "Easy way to pull latest of all git submodules" states that running:

$ git pull --recurse-submodules

will pull the latest commits in the submodules. But when I tried it, it didn't seem to work.

I have a demo repo with a demo submodule called "tinyXml" (not the real tinyXml, just a demo). Take a look at the following shell interaction:

PS D:\DemoProject> cd .\tinyXml
PS D:\DemoProject\tinyXml> git status 
HEAD detached at 9101c63
nothing to commit, working tree clean
PS D:\DemoProject\tinyXml> cd ..     
PS D:\DemoProject> git pull --recurse-submodules                  
Fetching submodule tinyXml
Already up to date.
PS D:\DemoProject> cd .\tinyXml\
PS D:\DemoProject\tinyXml> git status      
HEAD detached at 9101c63
nothing to commit, working tree clean
PS D:\DemoProject\tinyXml> cd ..
PS D:\DemoProject> git submodule update --remote      
Submodule path 'tinyXml': checked out 'e249788ed10afbdff043f758f46add75b81d522a'

As you can see, git submodule update --remote works, but git pull --recurse-submodules does not pull the latest commit in the submodule.

My Git version is 2.32.0.

score 4 · Answer 1 · answered Jul 26 '21 at 11:39

The short answer is no. That's because submodules are not supposed to use the latest commit.

The intent of a submodule is that the superproject repository specifies which commit is to be used in the submodule. The superproject repository does not list a branch name to be used in the submodule; that's not reliable. So instead, the superproject repository lists the raw hash ID to be used in the submodule, as this is reliable.

The submodules are therefore checked out as "detached HEAD"s. This is by design.

Remember that each submodule is a separate Git repository. It is not part of the superproject. The superproject repository simply lists the URL for cloning the submodule, so that Git can run git clone, plus the path (so that the superproject knows where to put the submodule) and the correct hash ID (so that the superproject knows which commit to use as the detached HEAD).
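To see those three pieces (URL, path, and hash ID) for yourself, here is a minimal sketch that builds a throwaway superproject in a temporary directory. The names lib-origin, super, and lib are made up for the demo, and HOME is redirected so that the global settings it writes stay inside the sandbox.

```shell
#!/bin/sh
set -e
# Throwaway sandbox. HOME is redirected so the --global settings stay inside it.
sandbox=$(mktemp -d)
export HOME="$sandbox"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
# Recent Git versions refuse file-based submodule URLs unless explicitly allowed.
git config --global protocol.file.allow always

# Hypothetical upstream for the submodule, with a single commit.
git init -q "$sandbox/lib-origin"
cd "$sandbox/lib-origin"
git commit -q --allow-empty -m "lib: first commit"

# Hypothetical superproject that uses it at path "lib".
git init -q "$sandbox/super"
cd "$sandbox/super"
git submodule --quiet add "$sandbox/lib-origin" lib
git commit -q -m "super: add lib submodule"

cat .gitmodules          # the path and the URL
git ls-tree HEAD lib     # mode 160000 (a "gitlink") plus the pinned hash ID
recorded=$(git rev-parse HEAD:lib)
actual=$(git -C lib rev-parse HEAD)
echo "recorded in super:  $recorded"
echo "checked out in lib: $actual"
```

Note that nothing in the superproject commit itself names a branch of the submodule: the tree entry for lib is just a hash ID.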

What you want instead

Now, there is a way, from the superproject repository, to make a request to the superproject Git to do something different in the submodule. That way is not to run git pull. Instead, you want git submodule update, with specific flags. The specific flags you want include, but are not necessarily limited to, --remote.

I said above that the superproject does not list a branch for each submodule. That's only partly true. The superproject can list a branch name for each submodule. But remember this: a branch name is just a way to say "get some particular commit". Which particular commit is that? Well, that's the tricky part, because each Git repository has its own branch names. Your superproject Git has its own branch names, and each submodule that you've cloned has its own branch names, and the repositories from which each repository is cloned have their own branch names, and so on.

This gets very confusing, because at this point there are an absolute minimum of four Git repositories involved here:

You have your own clone of the superproject, as your repository R.
Your R has an origin from which you get commits.
Your R has some submodule S, which your Git creates by running git clone.
Your S has an origin from which it gets commits.

When you clone some Git repository, you get all of their commits and none of their branches. Your Git then creates one branch in your own local repository. Your Git takes their branch names and changes them, transforming them into your own Git's remote-tracking names: their main or master becomes your origin/main or origin/master, for instance.
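A small sketch of that renaming, using two made-up repositories (theirs and yours) in a temporary directory. It assumes nothing beyond a stock git on the PATH; HOME is redirected only so the demo identity does not touch your real configuration.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
export HOME="$sandbox"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"

# "Their" repository: the default branch plus a branch named feature.
git init -q "$sandbox/theirs"
cd "$sandbox/theirs"
git commit -q --allow-empty -m "first commit"
git branch feature

# "Your" clone of it.
git clone -q "$sandbox/theirs" "$sandbox/yours"
cd "$sandbox/yours"

echo "branches in your clone:"
git for-each-ref --format='  %(refname)' refs/heads
echo "remote-tracking names in your clone:"
git for-each-ref --format='  %(refname)' refs/remotes
```

Their feature shows up only under refs/remotes/origin/ in your clone; the single name under refs/heads/ is the one branch your Git created for you.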

This is also true for each of your submodules. When your superproject Git runs git clone url path/to/submodule to clone some other repository as the submodule living in path/to/submodule, this is just like any other git clone: Git copies, from that URL, all of their commits and none of their branches. So a branch name of your own is not useful here. The "branch name" that your superproject repository R is allowed, but not required, to store for submodule S is not your branch name at all, since that's not useful, but rather a branch name in S's origin.

Your superproject Git will, in R, run, in effect:

(cd path/to/S; git fetch)

This will make your Git-that-controls-S use git fetch to obtain commits from S's origin. This will update S's remote-tracking names, such as S's origin/main and origin/feature and so on.

If R says that S should use "branch feature", this really means that your Git, operating in R, should run:

(cd path/to/S; git fetch)

followed by:

(cd path/to/S; git rev-parse origin/feature)

This git rev-parse, running inside S, turns origin/feature (which the git fetch that your Git just ran in S has updated) into a raw commit hash ID.

This is what the --remote in git submodule update --remote means: after running git fetch in the submodule, use the branch name stored in R to construct the remote-tracking name that will be found in S to obtain the commit hash ID for a commit to look for in S.
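Here is a sketch of those two steps done by hand, so you can see which hash ID --remote would arrive at. Everything is built in a temporary directory; the repository names (lib-origin, super), the submodule path lib, and the branch name feature are all made up for the demo. The branch is stored with git config -f .gitmodules, which is one way to record it.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
export HOME="$sandbox"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
# Recent Git versions refuse file-based submodule URLs unless explicitly allowed.
git config --global protocol.file.allow always

# S's origin: one commit, plus a branch named feature.
git init -q "$sandbox/lib-origin"
cd "$sandbox/lib-origin"
git commit -q --allow-empty -m "lib: first commit"
git branch feature

# R: add S at path "lib" and record "branch feature" for it.
git init -q "$sandbox/super"
cd "$sandbox/super"
git submodule --quiet add "$sandbox/lib-origin" lib
git config -f .gitmodules submodule.lib.branch feature
git add .gitmodules
git commit -q -m "super: add lib, tracking feature"

# New work lands on feature in S's origin after S was cloned.
cd "$sandbox/lib-origin"
git checkout -q feature
git commit -q --allow-empty -m "lib: new work on feature"
upstream_tip=$(git rev-parse feature)

# The two steps, by hand: fetch in S, then resolve origin/<branch> in S.
cd "$sandbox/super"
branch=$(git config -f .gitmodules submodule.lib.branch)
git -C lib fetch -q
resolved=$(git -C lib rev-parse "origin/$branch")
echo "origin/$branch in S resolves to $resolved"
echo "feature in S's origin is at     $upstream_tip"
```

The name that gets resolved is S's remote-tracking name origin/feature, not any branch that exists in R or in S itself.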

The rest of the git submodule update command determines what to do with this hash ID. With --checkout, the operation to do with this hash ID is a git checkout of a detached HEAD using that hash ID.

Hence:

git submodule update --recursive --checkout --remote

will enter each submodule S of your repository R, run git fetch and git rev-parse, do a detached-HEAD checkout in S as appropriate, and then recurse inside S to update any submodules that S has.
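As a rough end-to-end check, the sketch below runs that command in a throwaway sandbox, using the documented spelling --recursive. The names lib-origin, super, lib, and feature are invented for the demo, and it only has one level of submodule, so the recursion has nothing further to do.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
export HOME="$sandbox"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
# Recent Git versions refuse file-based submodule URLs unless explicitly allowed.
git config --global protocol.file.allow always

# S's origin with a feature branch.
git init -q "$sandbox/lib-origin"
cd "$sandbox/lib-origin"
git commit -q --allow-empty -m "lib: first commit"
git branch feature

# R, with S at path "lib" and "branch feature" recorded in .gitmodules.
git init -q "$sandbox/super"
cd "$sandbox/super"
git submodule --quiet add "$sandbox/lib-origin" lib
git config -f .gitmodules submodule.lib.branch feature
git add .gitmodules
git commit -q -m "super: add lib, tracking feature"

# S's origin moves on.
cd "$sandbox/lib-origin"
git checkout -q feature
git commit -q --allow-empty -m "lib: new work on feature"
upstream_tip=$(git rev-parse feature)

# The command under discussion.
cd "$sandbox/super"
git submodule --quiet update --recursive --checkout --remote

checked_out=$(git -C lib rev-parse HEAD)
# symbolic-ref fails when HEAD is not attached to a branch.
if git -C lib symbolic-ref -q HEAD >/dev/null; then
    head_state="on a branch"
else
    head_state="detached"
fi
echo "lib is $head_state at $checked_out"
```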

You might not want --checkout though. What you do want depends on too many factors to really go into here. The one thing I will note is that, once the submodules are on the commits you want them to be on, you need to make new superproject commits to record these hash IDs in the superproject(s). This, too, can be painful.
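For the recording step, a minimal sketch, again in a throwaway sandbox with made-up names (lib-origin, super, lib, feature). It moves the submodule with git submodule update --remote and then commits the new hash ID in the superproject. With nested superprojects you would presumably have to repeat the git add and git commit at each level.

```shell
#!/bin/sh
set -e
sandbox=$(mktemp -d)
export HOME="$sandbox"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL
git config --global user.name "Demo User"
git config --global user.email "demo@example.com"
# Recent Git versions refuse file-based submodule URLs unless explicitly allowed.
git config --global protocol.file.allow always

# S's origin with a feature branch.
git init -q "$sandbox/lib-origin"
cd "$sandbox/lib-origin"
git commit -q --allow-empty -m "lib: first commit"
git branch feature

# R, with S at path "lib" and "branch feature" recorded in .gitmodules.
git init -q "$sandbox/super"
cd "$sandbox/super"
git submodule --quiet add "$sandbox/lib-origin" lib
git config -f .gitmodules submodule.lib.branch feature
git add .gitmodules
git commit -q -m "super: add lib, tracking feature"

# S's origin moves on, and we move the submodule to match.
cd "$sandbox/lib-origin"
git checkout -q feature
git commit -q --allow-empty -m "lib: new work on feature"
cd "$sandbox/super"
git submodule --quiet update --checkout --remote

# The superproject still pins the old hash ID, so lib shows up as modified.
git status --short
before=$(git rev-parse HEAD:lib)

# Record the new hash ID with an ordinary superproject commit.
git add lib
git commit -q -m "super: move lib to latest feature commit"
after=$(git rev-parse HEAD:lib)
now_in_lib=$(git -C lib rev-parse HEAD)
echo "pinned before: $before"
echo "pinned after:  $after"
```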

This is insanely complicated. Why is Git so hard?

Yes, it is. I can't answer the "why", other than to note that this is, fundamentally, a hard problem.

This isn't what I want to do. Why can't Git do _____ instead?

Someone is probably working on that right now. There are projects underway to improve submodules. It's ... hard.

This is what there is today. Don't use git pull; it is not up to the job. Even git submodule update may not be up to your job, whatever that is.

score 0 · Answer 2 · answered Jul 26 '21 at 10:59

As you can see from your log, some of your submodules are not on a branch, but in a detached state.

You should enter each submodule and check out the right branch, then go back and issue the git pull --recurse-submodules command again.

Antonio Petricca

