132

Background

To populate a repository's submodules, one typically invokes:

git submodule init
git submodule update

In this usage, git submodule init seems to do only one thing: populate .git/config with information that is already in .gitmodules.

What is the point of that?

Couldn't git submodule update simply use the information from .gitmodules? This would avoid both:

  • an unnecessary command (git submodule init); and
  • an unnecessary duplication of data (.gitmodules content into .git/config).

Question

Either:

  • there are use-cases for git submodule init that I do not know (in which case, please enlighten me!); or else
  • git submodule init is cruft that could be deprecated in Git without any harm.

Which of these is true?

TylerH
  • the command I see in the repo I am seeing is backwards `git submodule update && git submodule init` what is the point of this? – Charlie Parker Jan 03 '23 at 20:10

3 Answers

117

Imagine the repository has 10 submodules and you are interested in only two of them. In that case, you may want to fetch updates from the remote repository for only those two from time to time. git submodule init works well for this: once you run git submodule init for those two submodules, git submodule update --remote applies only to them.


Two example workflows follow.

Workflow 1: Submodules are libraries that several projects use.

I think this is one of the common use cases.

You just cloned "my-project".

git clone https://example.com/demo/my-project

Its top-level structure looks like this:

[Image: top-level structure of my-project]

The contents of .gitmodules

[submodule "lib1"]
    path = lib1
    url = https://example.com/demo/lib1
[submodule "lib2"]
    path = lib2
    url = https://example.com/demo/lib2
[submodule "lib3"]
    path = lib3
    url = https://example.com/demo/lib3
[submodule "lib4"]
    path = lib4
    url = https://example.com/demo/lib4

You want to refactor code1.js, which references only lib1 and lib2, so you don't need to clone and check out lib3 and lib4. You just run the command below.

git submodule init lib1 lib2

Now let's see the contents of .git/config

...
[submodule "lib1"]
    active = true
    url = https://example.com/demo/lib1
[submodule "lib2"]
    active = true
    url = https://example.com/demo/lib2

This means something like "Ready to update lib1 and lib2 from example.com/demo".

At this point, the lib1 and lib2 directories are empty. You can clone and check out both with one command:

git submodule update

Now you are able to refactor code1.js without import errors.

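Submodules are just references to specific commits. So when you want to move the libraries to newer versions, you have to update those references. The command below checks out the latest upstream commit in each initialized submodule; committing the result in my-project then records the new references.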

git submodule update --remote

Now you can see how useful it is to only initialize the submodules you need.
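To make workflow 1 concrete, here is a minimal sketch that can be run locally. It is not part of the original answer: it replaces the example.com URLs with throwaway repositories in a temporary directory, and the helper g (which passes protocol.file.allow=always so that recent Git versions accept local-path submodule clones) is a hypothetical convenience.

```shell
#!/bin/sh
# Sketch only: local stand-ins for my-project and lib1..lib4 (hypothetical layout).
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# Create four tiny library repositories.
for lib in lib1 lib2 lib3 lib4; do
    git init -q "$work/$lib"
    echo "$lib" > "$work/$lib/README"
    git -C "$work/$lib" add README
    git -C "$work/$lib" commit -q -m "initial $lib"
done

# Create the superproject and add all four libraries as submodules.
git init -q "$work/my-project"
for lib in lib1 lib2 lib3 lib4; do
    g -C "$work/my-project" submodule --quiet add "$work/$lib" "$lib"
done
git -C "$work/my-project" commit -q -m "add submodules"

# Clone the superproject, then register and fetch only lib1 and lib2.
git clone -q "$work/my-project" "$work/clone"
cd "$work/clone"
git submodule init lib1 lib2
g submodule update

# Only the initialized submodules have a URL in .git/config ...
git config --get-regexp '^submodule\..*\.url$'
# ... and git submodule status prefixes the uninitialized ones with '-'.
git submodule status
```

In this sketch, lib3 and lib4 stay as empty directories in the clone, and a later git submodule update (with or without --remote) leaves them alone.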

Workflow 2: Each submodule is a project and one big top project includes them.

I'm a fan of this.

You clone "main-project".

git clone https://example.com/demo/main-project

Its top-level structure looks like this:

[Image: top-level structure of main-project]

You can see a directory named "shared". There is a rule in this workflow: if you want to use the shared code of main-project in your project, you have to create the project as a submodule of main-project.

I like to put entity classes in the shared directory, as shown below.

[Image: entity classes in the shared directory]

Back to the submodule workflow: the contents of .gitmodules are as follows.

[submodule "sub-project1"]
    path = sub-project1
    url = https://example.com/demo/sub-project1
[submodule "sub-project2"]
    path = sub-project2
    url = https://example.com/demo/sub-project2
[submodule "sub-project3"]
    path = sub-project3
    url = https://example.com/demo/sub-project3
[submodule "sub-project4"]
    path = sub-project4
    url = https://example.com/demo/sub-project4

This time you want to refactor some code in the shared directory of main-project, and you know that only sub-project1 and sub-project2 reference the shared code, so you don't need to clone and check out sub-project3 and sub-project4. You just run the command below.

git submodule init sub-project1 sub-project2

As mentioned in workflow 1, you then need to run the command below to clone them and check them out.

git submodule update

Would you run git submodule update --remote in this case? And do you even have to init and update submodules just to refactor code in the shared directory? Yes to both: you have to run the submodules' tests after refactoring the shared code, and if changes to a submodule are committed and pushed to its remote repository while you are refactoring, you need git submodule update --remote to get them.

Peter Mortensen
Nigiri
  • 1
    Thanks for pointing out that `git submodule init` might be helpful in this use case. I haven't tried it yet, but have upvoted your answer for making me think more broadly about git submodule workflows :) Would you mind adding a code block illustrating the workflow you hinted at? I think this would make for an answer that is even more valuable to the community. Thanks again :) –  Oct 05 '17 at 19:47
  • 1
    Thanks for correcting my English. I added two workflows. I’m not confident this helps someone though. – Nigiri Oct 06 '17 at 09:07
  • If you have other submodules and just want to init + update some specific submodules then the workflow becomes: [git submodule init -- ./lib1 ./lib2] and [git submodule update --remote --recursive -- ./lib1 ./lib2]. Also you could use [--merge] or [--rebase] with the update command but first read what they do because they can avoid a detached head checkout at the risk of possibly mangling their history if HEAD not on the right branch during update. For now you could fix the detached head state with [git submodule foreach "git checkout master && git pull"] (submodules so fun and easy to use xD) – FocusedWolf Sep 30 '18 at 05:55
34

Reading the git submodule documentation, there is a use-case that ostensibly justifies the existence of git submodule init as a standalone command.

If a user who has cloned a repository wishes to use a different URL for a submodule than is specified by the upstream repository, then that user can:

git submodule init
vim .git/config # Alter submodule URL as desired, without changing .gitmodules
                # or polluting history.
git submodule update
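The same override can be scripted with git config instead of an editor. The sketch below is an illustration rather than the answerer's own procedure: the repository names (upstream-lib, mirror-lib, super, biglib) and the temporary directory are hypothetical, and the helper g only exists so that recent Git versions accept local-path submodule clones.

```shell
#!/bin/sh
# Sketch only: override a submodule URL locally without touching .gitmodules.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# An upstream library, a local mirror of it, and a superproject that
# records the upstream URL in .gitmodules.
git init -q "$work/upstream-lib"
echo hello > "$work/upstream-lib/README"
git -C "$work/upstream-lib" add README
git -C "$work/upstream-lib" commit -q -m "initial"
git clone -q "$work/upstream-lib" "$work/mirror-lib"
git init -q "$work/super"
g -C "$work/super" submodule --quiet add "$work/upstream-lib" biglib
git -C "$work/super" commit -q -m "add biglib"

# In a fresh clone: init, point .git/config at the mirror, then update.
git clone -q "$work/super" "$work/clone"
cd "$work/clone"
git submodule init
git config submodule.biglib.url "$work/mirror-lib"
g submodule update

# The submodule was cloned from the mirror ...
git -C biglib config --get remote.origin.url
# ... while .gitmodules still names upstream and nothing needs committing.
git config -f .gitmodules --get submodule.biglib.url
git status --short
```

Running git submodule sync afterwards would copy the URL from .gitmodules back into .git/config, undoing the local override.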
  • 3
    As, for instance, when the submodule's large and you've already got it locally for other reasons. `git config submodule.biglib.url /path/to/it` is easier than editing the file, and `git submodule update --init` is easier than the two-step when you're happy with the defaults. – jthill Oct 06 '17 at 11:07
1

As far as I understand, this is what each command does, and hence the order in which to run them:

# - git submodule init only registers the submodules listed in .gitmodules in your local .git/config, so that git submodule update knows what to clone and update.
git submodule init
# - git submodule update --init does that registration and also clones the submodules, checking out the commit recorded in the main repository.
#   note, the command below checks out the recorded commit, not the branch named in your .gitmodules file; to follow that branch you need --remote.
#   note, the command below fetches the submodules if you didn't request them when cloning the parent project, ref: https://youtu.be/wTGIDDg0tK8?t=119
git submodule update --init
# - The --remote option tells Git to update the submodule to the tip of its remote-tracking branch, rather than the commit recorded in the main repository.
#git submodule update --init --remote
git submodule update --init --recursive --remote meta-dataset
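The difference between the recorded commit and --remote can be checked with a small local experiment. This is a sketch under assumptions, not part of the original answer: the repositories lib and super live in a temporary directory, the branch is recorded explicitly with submodule add -b, and the helper g is a hypothetical convenience that lets recent Git versions clone submodules from local paths.

```shell
#!/bin/sh
# Sketch only: compare what update --init and update --remote check out.
set -e
work=$(mktemp -d)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Hypothetical helper: allow submodule clones from local paths.
g() { git -c protocol.file.allow=always "$@"; }

# A library with one commit, added to a superproject on its current branch.
git init -q "$work/lib"
echo one > "$work/lib/file"
git -C "$work/lib" add file
git -C "$work/lib" commit -q -m "first"
branch=$(git -C "$work/lib" symbolic-ref --short HEAD)
git init -q "$work/super"
g -C "$work/super" submodule --quiet add -b "$branch" "$work/lib" lib
git -C "$work/super" commit -q -m "add lib"
recorded=$(git -C "$work/lib" rev-parse HEAD)

# Upstream moves on after the superproject recorded its commit.
echo two > "$work/lib/file"
git -C "$work/lib" commit -q -am "second"
latest=$(git -C "$work/lib" rev-parse HEAD)

# Fresh clone: --init checks out the recorded commit, --remote the branch tip.
git clone -q "$work/super" "$work/clone"
cd "$work/clone"
g submodule --quiet update --init
after_init=$(git -C lib rev-parse HEAD)
g submodule --quiet update --remote
after_remote=$(git -C lib rev-parse HEAD)

echo "recorded in superproject: $recorded"
echo "after update --init:      $after_init"
echo "upstream branch tip:      $latest"
echo "after update --remote:    $after_remote"
```

After the --remote step, git status in the clone reports lib as modified, because the checked-out commit no longer matches the one the superproject records.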
Charlie Parker