You are correct: a submodule is a repository in its own right. This means you can cd or chdir into the submodule and start using it the same way you would use any other Git repository, including running git fetch, git checkout, git commit, and so on.
What makes a submodule a submodule is the fact that some other Git, positioned somewhere above the submodule, is controlling the submodule, normally by running git checkout hash-id in it. This puts the submodule into what Git calls detached HEAD mode. (The controlling Git is the superproject.)
This "detached HEAD" mode is a bit tricky. You can make new commits in this mode, but when you do, they're not findable by most ordinary means. They're easily findable for the moment by the special name HEAD, but this special name HEAD will be forcibly adjusted by the superproject, which will chdir into the submodule and run git checkout hash-id again, losing¹ the commits you've made.
To send, to some other Git repository, commits you've made in your submodule Git repository, you must give those commits a name in the other Git repository. Usually, that means giving them a branch name over there. There's no hard requirement to use a branch name in your repository, as you can run:
git push <remote-or-URL> HEAD:<name>
to send the commit identified in your submodule repository via the name HEAD to the other Git, and politely ask it to create or update its name <name> to point to that commit. But most people don't like working in detached HEAD mode.
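As a minimal sketch of that push, the following builds a throwaway sandbox, makes a commit on a detached HEAD, and gives it a name in the other repository. The paths and the branch name rescue are hypothetical. The destination is spelled out in full as refs/heads/rescue because, when HEAD is detached and the name does not yet exist on the other side, Git may not be able to guess whether you mean a branch or a tag.

```shell
set -eu
# Throwaway identity and sandbox; every path and name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

git init -q --bare "$sandbox/upstream.git"   # stands in for the "other Git"
git init -q "$sandbox/work"
cd "$sandbox/work"
git commit -q --allow-empty -m "first"

git checkout -q --detach                     # the state a superproject leaves a submodule in
git commit -q --allow-empty -m "made on a detached HEAD"

# No local branch names this commit, so give it a name over there.
git push -q "$sandbox/upstream.git" HEAD:refs/heads/rescue

local_id=$(git rev-parse HEAD)
remote_id=$(git --git-dir="$sandbox/upstream.git" rev-parse refs/heads/rescue)
echo "local HEAD:       $local_id"
echo "upstream rescue:  $remote_id"
```

The two printed hash IDs should match: the other repository now has a branch name that finds the commit, even though no branch in the local repository does.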
What this means, in general, is that to do work in the Git repository that is acting as a submodule for some superproject, you should use the following sequence:²
1. Enter the submodule. From here through step 3, you'll just work with this as a regular Git repository.
2. Exit detached HEAD mode by selecting a branch name to git checkout, or by creating a branch name pointing to the current commit. Note that if you choose some existing branch name, it may or may not identify the same commit as the current commit.
   Remember, the superproject repository earlier told this Git: use this raw commit, by its hash ID. That is what put it into detached HEAD mode. We're now getting out of detached HEAD mode, which requires picking or creating a branch name. If you pick some existing branch name, you're stuck with whatever commit that branch name identifies. But if you're developing new commits in the submodule repository, you probably want a branch name to remember them.
3. Now, make new commit(s) in the usual way. Use git push in the usual way. The commits will, or won't, go to the receiving repository in the usual way that commits do or don't. If they do make it to the receiving repository, that repository's branch name will be created or updated in the usual way.
4. Once everything is done, exit the submodule repository, returning to the superproject repository. It's now time to make a new commit in the superproject.
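The submodule-side part of this sequence (steps 1 through 3) might look like the sketch below. It uses an ordinary repository in a throwaway sandbox to stand in for the submodule, since inside the submodule everything is ordinary Git anyway. The branch name fix-typo and all paths are hypothetical.

```shell
set -eu
# Throwaway identity and sandbox; every path and name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

git init -q --bare "$sandbox/lib.git"        # where people clone the submodule from
git init -q "$sandbox/sub"

# Step 1: enter the submodule (here, a plain repository playing that role).
cd "$sandbox/sub"
git remote add origin "$sandbox/lib.git"
git commit -q --allow-empty -m "initial"
git checkout -q --detach                     # mimic the superproject's checkout by hash ID

# Step 2: leave detached HEAD mode by creating a branch at the current commit.
git checkout -q -b fix-typo
branch=$(git symbolic-ref --short HEAD)

# Step 3: commit and push in the usual way.
echo "hello" > README
git add README
git commit -q -m "Add README"
git push -q -u origin fix-typo

pushed_id=$(git --git-dir="$sandbox/lib.git" rev-parse refs/heads/fix-typo)
echo "on branch $branch, pushed $pushed_id"
```

Creating a new branch in step 2, rather than checking out an existing one, keeps the current commit as the starting point for the new work.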
I already mentioned, several times, that the superproject Git keeps controlling the submodule Git. It does a chdir into the submodule Git and runs git checkout hash-id. The key here comes in two parts:
- When does the superproject Git do this to the submodule?
- Where does the superproject Git get the raw hash ID?
The answer to the first question is complicated: git submodule update does it, but not always;³ git checkout --recurse-submodules does it (always); various other operations can sometimes do it, depending on options and settings. That is, it doesn't happen unless and until you ask for it to happen, but it's not always obvious that you are asking for it to happen. What we're about to do is make sure we address the second point before it happens again.
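To see the superproject take control, here is a sketch that commits on a branch inside a submodule and then runs a plain git submodule update in the superproject, relying on the default update mode. All paths and the branch name topic are hypothetical, and protocol.file.allow=always is there only because this sandbox clones the submodule from a local path, which newer Git versions refuse by default.

```shell
set -eu
# Throwaway identity and sandbox; every path and name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

# A small library repository to serve as the submodule's origin.
git init -q "$sandbox/lib"
echo "v1" > "$sandbox/lib/file.txt"
git -C "$sandbox/lib" add file.txt
git -C "$sandbox/lib" commit -q -m "library v1"

# The superproject, recording the library as a submodule at path "sub".
git init -q "$sandbox/super"
cd "$sandbox/super"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" sub
git commit -q -m "Add submodule"
recorded_id=$(git rev-parse HEAD:sub)

# Work in the submodule on a branch, without telling the superproject.
git -C sub checkout -q -b topic
echo "v2" >> sub/file.txt
git -C sub commit -q -am "library v2"
topic_id=$(git -C sub rev-parse HEAD)

# Ask the superproject to put the submodule back on the recorded commit.
git submodule --quiet update
after_id=$(git -C sub rev-parse HEAD)
if git -C sub symbolic-ref -q HEAD >/dev/null; then
    head_state=attached
else
    head_state=detached
fi
echo "recorded=$recorded_id after=$after_id state=$head_state"
```

In this sketch the new commit survives because the branch name topic still finds it. Had it been made on a detached HEAD, only the reflog would remember it.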
The answer to the second question—where does the superproject Git get the raw hash ID—is that it gets it from commits in the superproject. But you've made a new commit in the submodule, and delivered it upstream to some other Git repository, so now it's time to make a new commit in the superproject, to record the right hash ID, i.e., the hash ID of the new commit you made in the submodule.
As always, no commit can ever be changed; and as always, Git makes new commits from whatever is in the index (aka staging area). When you extract some existing commit, Git reads the files from that commit into the index / staging-area. If the repository is acting as a superproject for some submodule, this step also reads the desired hash ID (in that submodule) from the commit, into the index. Since that's no longer the desired hash ID, your job is now to put the new hash ID into the index.
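The recorded hash ID can be inspected directly. The sketch below sets up a throwaway superproject with one submodule and prints the index entry for the submodule path, which uses the special mode 160000 (a "gitlink") and holds a commit hash ID rather than a blob hash ID. Paths are hypothetical, and protocol.file.allow=always is needed only because this sandbox clones from a local path.

```shell
set -eu
# Throwaway identity and sandbox; every path and name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

git init -q "$sandbox/lib"
echo "v1" > "$sandbox/lib/file.txt"
git -C "$sandbox/lib" add file.txt
git -C "$sandbox/lib" commit -q -m "library v1"

git init -q "$sandbox/super"
cd "$sandbox/super"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" sub
git commit -q -m "Add submodule"

index_entry=$(git ls-files --stage sub)      # mode, hash ID, stage number, path
recorded_id=$(git rev-parse HEAD:sub)        # the same hash ID, read from the commit
sub_head=$(git -C sub rev-parse HEAD)        # what the submodule itself has checked out

echo "index entry:   $index_entry"
echo "in commit:     $recorded_id"
echo "submodule HEAD: $sub_head"
```

Right after the submodule is added and committed, all three hash IDs agree. They drift apart as soon as you make a new commit inside the submodule.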
The command that does this, in the superproject, is git add. As always, git add writes stuff into the index. For a normal file, it writes into the index a copy of that file.⁴ For a submodule, though, it:
- enters the submodule for a moment;
- asks that Git: what is the raw hash ID for the commit identified by your HEAD? (git rev-parse HEAD);
- stuffs the resulting hash ID, whatever it is, into the superproject Git's index, in the slot for that submodule.
This works whether the submodule is in detached HEAD state or not, because git rev-parse HEAD always returns the raw hash ID of the current commit.
So, after git add path/to/submodule, the hash ID of the commit that you selected (and in fact made and pushed) is now recorded in the index in the superproject. Your next commit will record that raw hash ID.
Assuming everything else is also ready, you can now run git commit to make a new commit in the superproject (which presumably is, and has been all along, in attached-HEAD state, on some branch name). Once you do make this new superproject commit, you're ready to git push it as usual.
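Putting the superproject side together, the sketch below makes and pushes a commit inside a submodule, then uses git add and git commit in the superproject to record the new hash ID. It reads the index before and after git add to show the slot changing. The branch name topic and all paths are hypothetical, and protocol.file.allow=always is there only because the sandbox clones from a local path.

```shell
set -eu
# Throwaway identity and sandbox; every path and name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

git init -q "$sandbox/lib"
echo "v1" > "$sandbox/lib/file.txt"
git -C "$sandbox/lib" add file.txt
git -C "$sandbox/lib" commit -q -m "library v1"

git init -q "$sandbox/super"
cd "$sandbox/super"
git -c protocol.file.allow=always submodule --quiet add "$sandbox/lib" sub
git commit -q -m "Add submodule"

# In the submodule: branch, commit, and push so others can get the commit.
cd sub
git checkout -q -b topic
echo "v2" >> file.txt
git commit -q -am "library v2"
git push -q origin topic
new_id=$(git rev-parse HEAD)
cd ..

# In the superproject: the index still holds the old hash ID until git add.
before=$(git ls-files --stage sub | cut -d' ' -f2)
git add sub
after=$(git ls-files --stage sub | cut -d' ' -f2)

git commit -q -m "Update submodule to library v2"
committed=$(git rev-parse HEAD:sub)
echo "before add: $before"
echo "after add:  $after"
echo "committed:  $committed"
```

The superproject commit made at the end is the one you would then git push, now that the submodule commit it names is available from the submodule's origin.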
Note the careful ordering of steps here:
1. Enter the submodule.
2. Make the commit(s) in the submodule, on branches for simplicity, but however you make them is OK.
3. Send those commits to wherever it is that people clone the submodule from, so that that Git has them. This step requires setting a branch or tag name in that other Git.
4. Now that other people can access that commit in the submodule's origin repository, make and push a commit in the superproject that refers to the submodule commit's raw hash ID.
It is possible, because of Git's distributed nature, to make the commit in your submodule but not push it anywhere, then to make the commit in your superproject and push it. Anyone who gets this new commit gets a commit that refers, by raw hash ID, to a commit they not only don't have, but can't even get. So step 3 ("make the commit available to everyone") must happen before the push in step 4. (The "make commit" in step 4 can happen earlier: just be careful not to push it, and to redo it with any updated commit hash if the submodule commit has to be redone for any reason.)
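One possible way to check the ordering before pushing the superproject is to ask the submodule whether its current commit is reachable from any remote-tracking name. The sketch below does this with git rev-list in a throwaway clone that stands in for the submodule. All paths are hypothetical, and this is just one check among several you could use.

```shell
set -eu
# Throwaway identity and sandbox; every path and name here is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
sandbox=$(mktemp -d)

# Seed a bare repository that plays the submodule's origin.
git init -q --bare "$sandbox/lib.git"
git init -q "$sandbox/seed"
cd "$sandbox/seed"
git commit -q --allow-empty -m "initial"
git push -q "$sandbox/lib.git" HEAD

# A clone standing in for the submodule's working repository.
git clone -q "$sandbox/lib.git" "$sandbox/sub"
cd "$sandbox/sub"
git commit -q --allow-empty -m "new submodule work"

# Count commits reachable from HEAD but from no remote-tracking name.
unpushed_before=$(git rev-list --count HEAD --not --remotes)
git push -q origin HEAD
unpushed_after=$(git rev-list --count HEAD --not --remotes)

echo "unpushed before push: $unpushed_before"
echo "unpushed after push:  $unpushed_after"
```

A nonzero count means the superproject should not be pushed yet. Git also has a built-in guard for this: git push --recurse-submodules=check, run in the superproject, refuses to push when a referenced submodule commit is not available on any of that submodule's remotes.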
¹ Losing here means "make hard to find". The commits themselves won't vanish right away: they have the same grace period that other lost commits get, and you can use the submodule Git's HEAD reflog to find them, the same way you find lost commits in any repository, because the submodule is just another repository, after all.
² Because Git is a set of tools, not a pre-packaged solution, there are a lot of other ways to accomplish your goal. This is just one very flexible way.
³ In particular, git submodule update has many update modes. With some arguments, you can direct git submodule update to check out a name in the submodule, resulting in an attached HEAD (not a detached one) in the first place! That's part of what footnote 2 is referring to. The detailed workings of git submodule update are quite complicated, so I'm trying to avoid these variants in this answer.
⁴ Technically, it writes out a blob object with the right file contents (or reuses some existing blob object unchanged, if possible) and then writes the blob object's hash ID into the index, rather than the actual file content. But the effect is as if Git copied the file into the index, as long as you don't get down to the level of git ls-files --stage and git update-index.