How to remove a submodule going forward, but keep its history (as linked from parent history)?

Question

Say I have a project that has a dependency implemented using git submodule. Now I'm making a change where this dependency is no longer needed. I want to commit a change that works as follows:

If anyone checks out this commit or any descendants, the submodule doesn't exist.
But if anyone checks out an older commit, or a commit on another branch not merged with this one, the submodule reappears, just as a deleted file would.
The submodule's own git database (.git/modules/path/to/submodule) must be preserved as it may contain commits not pushed to a remote.

In other words, I do NOT want to obliterate the submodule as directed by the answers to "How do I remove a submodule?". In fact I wrote this question as a counterpoint to clarify that one.[1]

When I get some time I will try some experiments. It may be as simple as git submodule deinit and/or removing its entry from .gitmodules. I searched Stack Overflow and found no questions or answers addressing this case specifically. Even the superbly written Mastering Git submodules is not clear about this.

[1]: The many steps required in those answers tell me that such obliteration is not "normal"; otherwise git would include a porcelain command that did it all for you. Instead, git submodule deinit is provided, with very narrow behavior. I think that is very intentional.
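
A minimal sketch of the experiment described above, not a verified recipe. It assumes Git 1.8.5 or later, where `git rm` on a submodule removes the gitlink, the worktree directory and the matching .gitmodules section but leaves the repository under .git/modules/ alone. It also assumes the submodule was set up by `git submodule add` or `git submodule update`, so that its .git is a gitfile. The function names and the `sub_path` argument are hypothetical.

```shell
# Sketch: drop a submodule from the next commit while leaving its
# repository under .git/modules/ untouched (assumes Git >= 1.8.5).
remove_submodule_keep_gitdir() {
    sub_path=$1   # hypothetical argument: submodule path in the worktree

    # Removes the gitlink from the index, the worktree directory, and the
    # submodule's section in .gitmodules. Fails if the submodule has local
    # modifications or a HEAD that differs from the recorded commit.
    git rm -- "$sub_path" || return 1

    # Note: this commits everything currently staged, not only the removal.
    git commit -m "Remove submodule $sub_path" || return 1
}

# After checking out an older commit that still references the submodule,
# repopulate its worktree. Git reuses the existing .git/modules/ repository
# instead of cloning again when it is still present.
restore_submodule_worktree() {
    sub_path=$1
    git submodule update --init -- "$sub_path"
}
```

For example: run `remove_submodule_keep_gitdir path/to/submodule` on the current branch, and later, after `git checkout` of an older commit, run `restore_submodule_worktree path/to/submodule`. The `submodule.<name>` entry in .git/config is left in place by `git rm`, and the sketch does not clean it up.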

Your third bullet-point requirement is the really tricky part. It cannot be guaranteed in any current design, because a submodule repository is not part of its superproject. — torek, May 22 '20 at 22:24
@torek is it tricky for good reason -- i.e. it's an unreasonable requirement in the first place, use subtree or something else? Or for bad reason: a gap in the design for something that should be handled? — Inigo, May 22 '20 at 22:29
It's tricky because the original design, as VonC says, assumes that your submodule clone has no value: that it can be thrown away at any time because you can just re-clone it at any time, without losing anything of value. That assumption still lingers. — torek, May 22 '20 at 22:36

score 1 · Accepted Answer · answered May 22 '20 at 21:03

The git submodule deinit that I documented in 2013 and its associated rm -rf .git/modules/a/submodule both assume the removed submodule was already pushed.

Submodules were initially introduced to be used as read-only, in order to get other repository content into your repository, without necessarily the intent of modifying them.
This differs from subtree, where modifications are more naturally expected.

That being said, yes: if you remove a submodule without having committed and pushed its local changes, the end result won't be satisfactory.

A possible patch idea would be to make the git submodule deinit command fail when it detects that the submodule's current HEAD does not match its own internal remote-tracking branch (its own origin/master, for instance).
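
The check proposed above could be approximated today with a wrapper script. This is a sketch of the idea only, not an existing Git feature: `safe_submodule_deinit` is a hypothetical name. Instead of comparing HEAD with one tracking branch, it asks whether HEAD has any commit that no remote-tracking branch contains, which is a slightly different test.

```shell
# Hypothetical guard, not part of Git: refuse to deinit a submodule whose
# HEAD has commits that none of its remote-tracking branches contain.
safe_submodule_deinit() {
    sub_path=$1   # hypothetical argument: submodule path in the worktree

    # List commits reachable from the submodule's HEAD but from none of its
    # refs/remotes/* refs. Empty output means HEAD is fully covered by what
    # the remotes had at the time of the last fetch or push.
    unpushed=$(git -C "$sub_path" rev-list HEAD --not --remotes) || return 1

    if [ -n "$unpushed" ]; then
        echo "refusing to deinit $sub_path: HEAD has unpushed commits" >&2
        return 1
    fi

    git submodule deinit -- "$sub_path"
}
```

Only HEAD is examined here. Commits that exist solely on other local branches of the submodule would go unnoticed, so a stricter variant might pass `--branches` to `git rev-list` as well.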

VonC

Thanks. I ran into this issue after adding a third-party library to my project as a submodule so I could make local modifications as needed by the parent project. I didn't use subtree because I saw it as a solution for one's own modules, not for mods of third-party modules (I may be wrong on that). I've since realized node's `package.json` supports local unpublished modules, so I now do that instead (see https://stackoverflow.com/a/61961021/8910547). But I still want to be able to go back to old commits that use the abandoned solution. – Inigo May 22 '20 at 21:53
@Inigo I know Christophe's old 2015 article well. But a *lot* has been done on submodule since then. – VonC May 22 '20 at 21:57
@Inigo The `rm` step is necessary when doing a `deinit`. Hence my patch idea, for making Git more robust for your use case. – VonC May 22 '20 at 22:01
It's necessary because of the possibility of adding a different submodule (different repo) at the same path? – Inigo May 22 '20 at 22:04
FYI, I'm happy to delete this new question if it makes sense. But I think the original Q and answers need updates/clarifications. – Inigo May 22 '20 at 22:05
This new question is important, and should be debated on the Git mailing list: please leave it open here. – VonC May 22 '20 at 22:09
@Inigo yes, the `rm` step is necessary to ensure a coherent state for the local Git repo (one where the module is no longer referenced) – VonC May 22 '20 at 22:09
I think we may be talking about two different `rm` steps. `rm -rf .git/modules/a/submodule` vs `git rm -f a/submodule`? Skipping the former I don't believe results in any incoherency. Won't a checkout of past commits make use of it to restore the worktree of the submodule? – Inigo May 22 '20 at 22:17
@Inigo that would need to be tested: not sure how Git would react if the internal submodule repo is still there. – VonC May 22 '20 at 22:18
hey @vonc, it's been a year. I've not used submodule since, so haven't thought about this again. Did the debate you mentioned ever happen? I'm inclined to accept your answer at this point. – Inigo Jun 06 '21 at 17:08
@Inigo no debate that I know of for now. – VonC Jun 06 '21 at 17:19
