
What are the conceptual differences between using git submodule and subtree?

What are the typical scenarios for each?

Nathan H
  • This may not answer all your questions, but it is interesting reading on the subject: http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/ – Chop Aug 02 '15 at 08:12
  • Similar question is http://stackoverflow.com/questions/571232/svnexternals-equivalent-in-git/18088319#18088319 – Michael Freidgeim May 01 '16 at 03:20
  • "Alternatives to Git Submodules? ": https://stackoverflow.com/questions/6500524/alternatives-to-git-submodules – brillout Dec 21 '18 at 14:28



submodule is link;

subtree is copy
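To make the "link" half concrete, here is a minimal sketch, assuming only that git is installed. The repository names lib and app are hypothetical throwaway repos created in a temp directory. It shows that the parent records the submodule as a single commit pointer (a gitlink, mode 160000) rather than as a copy of its files.

```shell
#!/bin/sh
# Sketch: a submodule is stored in the parent as a gitlink (a commit pointer).
# Assumptions: git is installed; "lib" and "app" are hypothetical demo repos.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# The external repository, with one commit.
git init -q lib
echo hello > lib/README
git -C lib add README
git -C lib commit -q -m 'lib: initial'

# The parent repository. Recent Git (2.38.1+) refuses local-path submodule
# clones unless the file protocol is explicitly allowed, hence the -c option.
git init -q app
cd app
git -c protocol.file.allow=always submodule add "$work/lib" lib
git commit -q -m 'Add lib as a submodule'

# Mode 160000, type "commit": the parent stores lib's commit SHA, not its files.
git ls-tree HEAD lib
```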

Feng

What if I want the links to always point to the HEAD of the external repo?

You can make a submodule follow the HEAD of a branch of the submodule's remote repo with:

o git submodule add -b <branch> <repository> [<path>] (to specify a branch to follow);
o git submodule update --remote, which updates the content of the submodule to the latest HEAD of <repository>/<branch> (by default origin/master). Your main project still records a specific commit (SHA) of the submodule even when --remote is used, though: after the update you still have to add and commit the submodule path in the main project.


Plus, as noted by philb in the comments, git subtree lives in Git's contrib/ directory, whereas git submodule is a core command.
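A minimal sketch of the branch-following workflow described above, assuming git is installed and using hypothetical throwaway repos (lib, app) with a branch named main. It shows that -b only records the branch in .gitmodules, and that after update --remote the new commit still has to be committed in the parent.

```shell
#!/bin/sh
# Sketch: follow a branch with "submodule add -b" + "submodule update --remote".
# Assumptions: git is installed; repo names and the branch "main" are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q lib
git -C lib symbolic-ref HEAD refs/heads/main   # make the branch name predictable
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m 'lib v1'

git init -q app
cd app
# -b records "branch = main" for this submodule in .gitmodules.
git -c protocol.file.allow=always submodule add -b main "$work/lib" lib
git commit -q -m 'Add lib (tracking main)'
recorded_before=$(git ls-tree HEAD lib | awk '{print $3}')

# The upstream branch moves on.
echo v2 > "$work/lib/VERSION"
git -C "$work/lib" commit -q -am 'lib v2'

# Fetch and check out the tip of origin/main inside the submodule...
git -c protocol.file.allow=always submodule update --remote lib
# ...but the parent only records it once the new gitlink is committed.
git add lib
git commit -q -m 'Bump lib to latest main'
recorded_after=$(git ls-tree HEAD lib | awk '{print $3}')
```

Nothing here happens automatically: the parent keeps pointing at recorded_before until the final commit, which matches VonC's comment that a submodule never follows a branch on its own.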

VonC
  • your answer seems to go against the voted answer here: http://stackoverflow.com/questions/10443627/force-git-submodule-to-always-stay-current – Nathan H Aug 06 '15 at 09:10
  • @NathanH this (the possibility to track HEAD) has been added a year later (March 2013, git 1.8.2: https://github.com/git/git/blob/efc8a625e9b03e6f8ceed37ccd4b9167a7447e31/Documentation/RelNotes/1.8.2.txt#L186-L188) – VonC Aug 06 '15 at 09:11
  • I see the submodule followship behavior is also mentioned in [your other answer](http://stackoverflow.com/a/9189815/1509695). In that case I think you mean to say that always pointing to the HEAD of a submodule is accomplished by using both `add -b` and `--remote` thereafter on the update commands, as per the [submodule update documentation](https://git-scm.com/docs/git-submodule). In that case, is the `-b` really still required for following HEAD of master? – matanster Oct 29 '15 at 14:20
  • @matt the `-b` is used to generate the right .gitmodules metadata for the submodule (it is equivalent to a `git config -f .gitmodules submodule..branch `). – VonC Oct 29 '15 at 15:33
  • Then it has little to do with enabling `--remote` - `--remote` works also if `-b` hasn't been used on `add`. In both cases the update will cause a commit in the parent repo housing the submodule, so the links do not really "always point to the HEAD" in a very automatic way.... either I didn't get it, or that claim better be removed from the original answer (?) – matanster Oct 29 '15 at 15:38
  • @matt sure, `--remote` will work without `-b`, because it will default to `origin/master`. But with or without `-b`, a submodule never automatically follows a branch, as I mentioned in http://stackoverflow.com/a/20797186/6309. It checks out the SHA1 memorized in the gitlink, and then updates its content if you do a `git submodule update --remote`. – VonC Oct 29 '15 at 15:42
  • ([New related question](http://stackoverflow.com/questions/33418718/updating-submodules-without-committing-their-update)) – matanster Oct 29 '15 at 15:48
  • another point that might be useful: `git submodule` is a "core" Git command, it's part of the Git codebase. `git subtree` is in the "contrib" directory, it's not installed by Git's Makefile (though some distros do ship it), so it's less developed and less maintained. – philb Jul 12 '21 at 16:24
  • @philb Good point. I have included your comment in the answer for more visibility. – VonC Jul 12 '21 at 17:10
  • "HEAD" of a branch is nonsensical. It should be "You can make a submodule to follow a branch of a submodule remote repo...". – lmat - Reinstate Monica Nov 05 '21 at 14:50

The conceptual difference is:

With git submodules you typically want to separate a large repository into smaller ones. The way of referencing a submodule is Maven-style: you are referencing a single commit from the other (submodule) repository. If you need a change within the submodule, you have to make a commit/push within the submodule, then reference the new commit in the main repository (by staging the submodule path) and then commit/push the changed reference of the main repository. That way you need access to both repositories for the complete build.

With git subtree you integrate another repository into yours, including its history. So after integrating it, your repository is probably bigger (so this is not a strategy for keeping repositories small). After the integration there is no connection to the other repository, and you don't need access to it unless you want to get an update. So this strategy is more for code and history reuse. I personally don't use it.
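A minimal sketch of the subtree side, assuming git is installed and that the contrib `git subtree` command is available (many distributions ship it, but not all). The repo names, the remote name shared-repo and the prefix vendor/shared are hypothetical. After `git subtree add`, the files are ordinary tracked files and the other repository's commits appear in the main repository's log.

```shell
#!/bin/sh
# Sketch: "git subtree add" copies files *and* history into the main repo.
# Assumptions: git plus the contrib git-subtree command are installed;
# names (shared, app, shared-repo, vendor/shared) are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q shared
git -C shared symbolic-ref HEAD refs/heads/main
echo util > shared/util.txt
git -C shared add util.txt
git -C shared commit -q -m 'shared: add util'

git init -q app
cd app
git commit -q --allow-empty -m 'app: initial'   # subtree add needs an existing HEAD
git remote add shared-repo "$work/shared"
# Fetches shared-repo/main and merges it under vendor/shared (no --squash,
# so the shared repository's commits become part of this history).
git subtree add --prefix=vendor/shared shared-repo main
```

Passing `--squash` to `git subtree add` would instead collapse the imported history into a single commit, which keeps the repository smaller.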

waldyrious
Niklas P
  • But with `git subtree` you still can also push - if you wanted - right? – User Jan 21 '18 at 23:22
  • @lxx If you know the repository URL… – Franklin Yu Jan 24 '18 at 23:05
  • @FranklinYu Why would he not know that? can't get that info from the local git meta data? – adi518 Mar 19 '18 at 22:53
  • @adi518 Yes, if you are the one who created the subtree. However, if you pushed your repository to GitHub and others clone it down, I don’t think he/she automatically knows the subtree URL. – Franklin Yu Mar 20 '18 at 04:33
  • @NiklasP - can you elaborate on "reference the new commit in the main repository"? That's the one step I'm not clear on how to execute and therefore "changed reference" isn't something I understand either. – Robert Oschler Apr 04 '18 at 16:07
  • "If you need a change within the submodule you have to make a commit/push within the submodule, then reference the new commit in the main repository and then commit/push the changed reference of the main repository." But the main repo can track a branch of the submodule, right? So in that case the main repo wouldn't need to do anything to use the updated code? – Nathan Wailes Jun 09 '23 at 17:40

submodule
pushing the main repo to a remote doesn't push the submodule's files

subtree
pushing the main repo to a remote pushes the subtree's files
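A minimal sketch of the submodule half of this claim, assuming git is installed and using hypothetical local bare repositories (lib.git, app.git) as stand-ins for remotes. Pushing the parent does not push commits made inside the submodule; `git push --recurse-submodules=check` refuses to push until they exist on the submodule's remote. (For a subtree, the files are ordinary files of the main repo, so a normal push sends them to the main repo's remote, but not to the shared repo; that takes `git subtree push`.)

```shell
#!/bin/sh
# Sketch: a parent push does not carry submodule commits along.
# Assumptions: git is installed; lib.git/app.git are hypothetical local "remotes".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

git init -q lib
git -C lib symbolic-ref HEAD refs/heads/main
echo v1 > lib/VERSION
git -C lib add VERSION
git -C lib commit -q -m 'lib v1'
git clone -q --bare lib lib.git        # the library's remote
git init -q --bare app.git             # the main project's remote

git init -q app
cd app
git symbolic-ref HEAD refs/heads/main
git remote add origin "$work/app.git"
git -c protocol.file.allow=always submodule add "$work/lib.git" lib
git commit -q -m 'Add lib'
git push -q origin main

# Commit inside the submodule and record it in the parent, without pushing lib.
echo v2 > lib/VERSION
git -C lib commit -q -am 'lib v2'
git add lib
git commit -q -m 'Bump lib'

# "check" aborts: the new lib commit exists only locally.
if git push --recurse-submodules=check origin main; then
  check_blocked=no
else
  check_blocked=yes
fi

# Push the submodule first; then the parent push goes through.
git -C lib push -q origin HEAD:main
git push -q --recurse-submodules=check origin main
```

`--recurse-submodules=on-demand` (or the `push.recurseSubmodules` setting) asks Git to push the missing submodule commits for you instead of aborting.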

Maciek Rek
  • "pushing a main repo to remote pushes sub-tree's files" No, it doesn't. – J Bramble Jan 16 '17 at 16:22
  • @JBramble I should probably mention that it's done with the SourceTree app eg: `git -c diff.mnemonicprefix=false -c core.quotepath=false -c credential.helper=sourcetree push -v --tags production refs/heads/master:refs/heads/master` – Maciek Rek Feb 23 '17 at 14:10

The simplest way to think of subtrees and submodules is that a subtree is a copy of a repository that is pulled into a parent repository while a submodule is a pointer to a specific commit or branch in another repository.

Nathan Wailes
Pervaiz Iqbal
  • submodules can now track particular branches as of git 1.8.2: https://www.activestate.com/blog/getting-git-submodule-track-branch/ – Nathan Wailes Jun 09 '23 at 17:44

[Git Submodule - Atlassian]

Git submodule is useful when you want to keep the embedded repository's commit history separate from the main repository. However, using submodules can be complex and difficult to manage, especially when you need to update the embedded repository.

[Git Subtree and comparison with Submodule - Atlassian]

Git subtree is a solution that allows merging one repository into another as a subdirectory while keeping the entire commit history. It is useful when you want to share a set of files between different projects without the need to maintain a separate repository. Using a subtree is simpler than using a submodule and is generally easier to manage.

In short, if you need to keep the shared repository's commit history separate from the main repository, git submodule might be the best choice. If you need to share a set of files between different projects without the need to maintain a separate repository, git subtree might be the best choice.


Get/Update Workflow Comparison

Let's compare the commands for sending and receiving updates:

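1. Submodule

# push updates (start from the main repository's root):
cd path/to/submodule
git add .
git commit -m "Submodule update"
git push origin HEAD:master    # submodules are often on a detached HEAD
cd -                           # back to the main repository's root
git add path/to/submodule      # stage the new commit reference (gitlink)
git commit -m "Submodule ref update"
git push origin master
# Needs to be in this order! If the main repository is pushed first,
# others get a reference to a submodule commit they cannot fetch.

# pull:
git submodule update --remote
git add path/to/submodule      # record the updated reference
git commit -m "Submodule ref update"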

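2. Subtree

# push updates (run from the main repository's root; the shared files
# are ordinary files of the main repository):
git add path/to/shared/repo
git commit -m "Subtree update"
git push origin master
# then send the subtree's commits to the shared repository
# (shared-repo is a remote added with "git remote add"):
git subtree push --prefix=path/to/shared/repo shared-repo master

# pull:
git subtree pull --prefix=path/to/shared/repo shared-repo master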
Mithsew