Will Git garbage-collect a commit in a submodule that is referred to by a top-level repository?

Question

Let's say:

 top.git     
 └── sub.git => 75fc7

The top-level Git repository top.git refers to commit 75fc7 in sub.git.
The submodule Git repository sub.git has neither branches nor tags leading to commit 75fc7 (unreachable).

Will sub.git eventually garbage-collect this commit 75fc7 because nothing can reach it?

AFAIK, Git submodules are designed in such a way that, in this example, sub.git cannot know that it is a submodule of any other repository. In other words, commit 75fc7 is effectively a candidate for garbage collection. If so, restoring the state of all submodules would be unreliable, because they may "forget" required commits.

score 3 · Answer 1 · edited May 23 '17 at 11:47

Yes, the commit will eventually be garbage-collected.

But don't forget that, for a submodule to be usable from its parent repo, the submodule must also have published the recorded SHA1 (recorded as a gitlink, a special entry in the index of the parent repo).
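To see what a gitlink looks like, here is a minimal sketch. The directories `sub` and `top` are hypothetical stand-ins for sub.git and top.git, and the gitlink is written straight into the index with `git update-index` rather than through `git submodule add`, purely to keep the example short.

```shell
set -eu
# Throwaway identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical stand-in for sub.git, with a single (empty) commit.
git init -q "$work/sub"
git -C "$work/sub" commit -q --allow-empty -m "sub commit"
sub_sha=$(git -C "$work/sub" rev-parse HEAD)

# Hypothetical stand-in for top.git. Mode 160000 marks the index entry
# as a gitlink: a path that points at a commit in another repository.
git init -q "$work/top"
git -C "$work/top" update-index --add --cacheinfo 160000,"$sub_sha",sub
git -C "$work/top" commit -q -m "pin sub"

# The recorded entry is only a mode, a type and a SHA1; no branch or tag name.
entry=$(git -C "$work/top" ls-tree HEAD sub)
echo "$entry"
```

The parent stores nothing but the SHA1, so nothing in the submodule repository is aware of the reference.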

If that SHA1 is not published (pushed to an upstream repo), then any clone of the parent repo would not be able to check out the submodule anyway.
That means the submodule must push the recorded SHA1, which makes that SHA1 referenced (by a branch or tag, as pushed to the upstream repo).

So the issue here is not so much the garbage collector as the ability of a parent repo to check out its submodule at the right SHA1.

The OP adds in the comments: "My scenario (not explicitly mentioned in question) is actually different and more specific. What if the commits are actually pushed upstream for both top.git and sub.git?"

Then you don't need to wait for a gc to remove an unreachable SHA1 for the issue to manifest.
If the published SHA1 is no longer referenced, any clone of top.git won't be able to check out the sub.git submodule repo at the right SHA1 (even if gc hasn't run yet), because the unreferenced SHA1 won't be part of the sub.git clone anyway.
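This can be checked locally. The following is a minimal sketch, assuming a throwaway directory `upstream` that plays the role of the published sub.git, and using empty commits for brevity. `--no-local` makes the clone go through the regular transport instead of copying the object directory wholesale.

```shell
set -eu
# Throwaway identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
default=$(git symbolic-ref --short HEAD)
git commit -q --allow-empty -m "base"

# A commit on a temporary branch; it plays the role of the pinned SHA1.
git checkout -q -b temp
git commit -q --allow-empty -m "pinned by top.git"
pinned=$(git rev-parse HEAD)

# Remove the only ref that reaches it. No gc is run.
git checkout -q "$default"
git branch -q -D temp

# The object is still physically present upstream...
if git cat-file -e "$pinned^{commit}" 2>/dev/null; then in_upstream=yes; else in_upstream=no; fi

# ...but a fresh clone only receives objects reachable from refs.
git clone -q --no-local "$work/upstream" "$work/clone"
if git -C "$work/clone" cat-file -e "$pinned^{commit}" 2>/dev/null; then in_clone=yes; else in_clone=no; fi

echo "upstream: $in_upstream, clone: $in_clone"
```

The commit survives in the upstream object store until a gc, yet the clone never gets it.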

The key point to understand: an upstream repo sub.git has no idea it is used as a submodule by another upstream repo (like top.git).

If sub.git does not include the right SHA1 (the one used by top.git) for any reason (gc, a rebase followed by push --force, or ...), a clone of top.git will fail to restore the submodule to its proper state.

answered Jul 26 '15 at 18:22 by VonC · edited May 23 '17 at 11:47 by Community

Note to self: That was my **13000th answer** on Stack Overflow (in 82 months), less than 6 months after the [12000th answer](http://stackoverflow.com/a/28412501/6309). Before that, [11000th answer](http://stackoverflow.com/a/25821796/6309), [10000th answer](http://stackoverflow.com/a/23909654/6309), [9000th answer](http://stackoverflow.com/a/20683667/6309), [8000th answer](http://stackoverflow.com/a/17569094/6309), [7000th answer](http://stackoverflow.com/a/14274272/6309), [6000th answer](http://stackoverflow.com/a/11644343/6309), [5000th answer](http://stackoverflow.com/a/7917396/6309),... – VonC Jul 26 '15 at 19:05
Your answer is a good reminder of other issues with Git submodules, but my scenario (not explicitly mentioned in question) is actually different and more specific. What if the commits are actually pushed upstream for both `top.git` and `sub.git`? In this case, we won't have problems with checking out right away. **= No missing commit problem yet.** However, if the named references (branches or tags) are later removed from upstream `sub.git`, then the reference to the commit in `top.git` will eventually fail, as `sub.git` will garbage-collect it. **= Missing commit problem appeared.** – uvsmtid Jul 27 '15 at 03:59
@uvsmtid that is a non-issue on the upstream side: if the published commit disappears for *any* reason (like after a `git push --force`, which rewrites the history and makes the SHA1 non-referenced), you don't need to wait for a gc for the "missing commit problem" to manifest: no clone of the parent repo will be able to check out the submodule, as soon as the SHA1 is no longer referenced on the upstream repo of said submodule. – VonC Jul 27 '15 at 05:26

score 2 · Answer 2 · edited May 23 '17 at 10:24

Actually, it was easy to test thanks to this answer.

Yes, the commit was garbage-collected even though it was referenced by the top-level repository.
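A test along these lines might look like the sketch below. It is not necessarily the procedure from the linked answer; the directory name `sub` and the empty commits are assumptions made for brevity, and gc is forced to act immediately instead of waiting for the default grace periods.

```shell
set -eu
# Throwaway identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/sub"
cd "$work/sub"
default=$(git symbolic-ref --short HEAD)
git commit -q --allow-empty -m "base"

# A commit on a temporary branch; it plays the role of 75fc7.
git checkout -q -b temp
git commit -q --allow-empty -m "pinned by top.git"
pinned=$(git rev-parse HEAD)

# Drop the only branch that reaches the commit.
git checkout -q "$default"
git branch -q -D temp

# Reflog entries also keep commits alive, so expire them first,
# then prune unreachable objects without any grace period.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$pinned^{commit}" 2>/dev/null; then
    gc_result="still present"
else
    gc_result="collected"
fi
echo "$gc_result"
```

A gitlink in some other repository plays no part in this: gc only looks at the refs and reflogs of the repository it runs in.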

This calls for some discipline about which commits may be used in the top-level repository, so that the entire tree spanning submodules can be reliably restored at any time in the future. Each such commit must be reachable from (that is, be an ancestor of, or the tip of) at least one long-lived branch or tag in the submodule repository.
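One possible way to check and enforce this is sketched below. The `pinned/<sha>` tag naming is a hypothetical convention, not something Git or the answers above prescribe, and the repository layout is again a throwaway stand-in for sub.git.

```shell
set -eu
# Throwaway identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/sub"
cd "$work/sub"
default=$(git symbolic-ref --short HEAD)
git commit -q --allow-empty -m "base"

# The commit that top.git pins lives on a short-lived feature branch.
git checkout -q -b feature
git commit -q --allow-empty -m "pinned by top.git"
pinned=$(git rev-parse HEAD)
git checkout -q "$default"

# Is the pinned commit an ancestor of the long-lived branch?
if git merge-base --is-ancestor "$pinned" "$default"; then
    on_default=yes
else
    on_default=no
fi

# It is not, so anchor it with a tag before the feature branch goes away.
git tag "pinned/$pinned" "$pinned"
git branch -q -D feature

# List every ref from which the pinned commit is still reachable.
refs=$(git for-each-ref --contains "$pinned" --format='%(refname)')
echo "on $default: $on_default"
echo "$refs"
```

As long as such a tag is also pushed to the upstream sub.git and never deleted, the commit stays reachable there.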

answered Jul 26 '15 at 18:19 by uvsmtid · edited May 23 '17 at 10:24 by Community

I was going to reference that old answer of mine ;) – VonC Jul 26 '15 at 18:19
I think it’s more about discipline of the subrepository since it needs to produce stable branches… but good answer, thanks :) – poke Jul 26 '15 at 18:22
