11

I have a couple of working trees with some dependencies between them. AFAIK, git submodule would enforce the following:

  • have a copy of each working tree (slave) in a subdirectory of each working tree using it (master)
  • the master repository duplicates all the information from slaves

I don't mind the repos getting bigger, but having the copies is unacceptable for me. It would force me to reorganize all the projects so that the copy gets linked. Moreover, it would be easy to edit the wrong copy of a file by mistake, leading to confusion.

I've got another idea:

  • Each master stores a list of all its slaves.
  • No other information in the master is required.
  • With each commit in master, a "snapshot-commit" in the slave gets created.
  • The "snapshot-commit" is a snapshot of the current state of the working tree; it ignores the current state of the index (I'm already using "snapshot-commits" before throwing away some uncommitted changes).
  • The "snapshot-commits" get collected in a branch whose name is derived from the master's name. The commit message contains the hash of the master commit. (IMHO, this is better than flooding the repository with thousands of tags.)
  • A checkout works as usual, unless recursion into slaves is required.
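The snapshot step in the list above could be sketched roughly as follows. This is only an illustration of the idea, not an existing tool: the branch naming scheme (`refs/heads/snapshots/<master>`), the message format, and all function names are assumptions. It uses a throwaway index via `GIT_INDEX_FILE` so that the slave's real index and `HEAD` are left alone.

```python
import os
import subprocess
import tempfile
from typing import Optional


def snapshot_branch(master_name: str) -> str:
    """Hypothetical naming scheme: one snapshot branch per master."""
    return f"refs/heads/snapshots/{master_name}"


def snapshot_message(master_commit: str) -> str:
    """Hypothetical message format carrying the master commit hash."""
    return f"snapshot for master commit {master_commit}"


def git(repo: str, *args: str, env: Optional[dict] = None) -> str:
    """Run a git command inside `repo` and return its stripped stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True, env=env,
    )
    return result.stdout.strip()


def snapshot_slave(slave: str, master_name: str, master_commit: str) -> str:
    """Commit the slave's working tree onto its snapshot branch."""
    ref = snapshot_branch(master_name)
    with tempfile.TemporaryDirectory() as tmp:
        # A temporary index keeps the real index untouched.
        env = dict(os.environ, GIT_INDEX_FILE=os.path.join(tmp, "index"))
        git(slave, "add", "--all", env=env)       # stage the whole working tree
        tree = git(slave, "write-tree", env=env)  # store it as a tree object
    # Chain onto the previous snapshot if the branch already exists.
    parent = subprocess.run(
        ["git", "-C", slave, "rev-parse", "--verify", "--quiet", ref],
        capture_output=True, text=True,
    ).stdout.strip()
    args = ["commit-tree", "-m", snapshot_message(master_commit)]
    if parent:
        args += ["-p", parent]
    commit = git(slave, *args, tree)
    git(slave, "update-ref", ref, commit)  # move the snapshot branch forward
    return commit
```

A hook in the master (for example a post-commit script) would call `snapshot_slave` once per listed slave, passing the new master commit hash. Finding the slave state for a given master commit is then a matter of searching that branch, for instance with `git log --grep=<hash>` on the snapshot branch.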

The only problems I can see are the following:

  • The commits in the slaves will accumulate, and never get deleted even when the master commits no longer exist.
  • Commits in the master are not self-contained: you could delete a slave commit that a master commit refers to. But I see no chance of this happening by accident, so I can live with it.
  • I can't imagine how other git commands could support this. But again, I can live with it.

I'm asking if somebody already implemented it (or if it's a bad idea).

maaartinus

2 Answers

11

I think this is a bad idea because it's strange and it will take you off the supported path for many things.

First a clarification: When using submodules, the 'master' (referencing) repo does not get appreciably bigger. It stores only a repository reference (URL, probably) and a commit ID. But that doesn't seem to be the sticking point here.
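To make this concrete: the URL lives in the `.gitmodules` file, and the pinned commit ID is recorded in the parent's tree as a "gitlink" entry with mode `160000` and type `commit`. A small sketch that extracts those entries from the output of `git ls-tree -r HEAD` might look like this (the function name is made up, and the paths and hashes in the sample are invented for illustration):

```python
def parse_gitlinks(ls_tree_output: str) -> dict:
    """Map submodule path -> pinned commit ID from `git ls-tree -r` output.

    Each line has the form: <mode> SP <type> SP <object> TAB <path>
    """
    links = {}
    for line in ls_tree_output.splitlines():
        meta, _, path = line.partition("\t")
        fields = meta.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        mode, obj_type, sha = fields
        # Submodules appear as mode 160000 entries of type "commit".
        if mode == "160000" and obj_type == "commit":
            links[path] = sha
    return links


# Invented sample output; real hashes and paths will differ.
SAMPLE = (
    "100644 blob 1111111111111111111111111111111111111111\t.gitmodules\n"
    "160000 commit 2222222222222222222222222222222222222222\tlibs/util\n"
    "100644 blob 3333333333333333333333333333333333333333\tsrc/Main.java\n"
)

print(parse_gitlinks(SAMPLE))
```

Only the `libs/util` entry is reported, which shows how little the parent stores per submodule: one path and one commit ID.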

When dealing with a problem like this, there are basically three paths you can go down:

  1. Put everything in a single repository. Have you convinced yourself 10 times that you really need to separate things out? Remember that you can start in one repo and split things out later. Also remember that git merges actually work, so developer contention isn't that much of an issue.

  2. Use some external package management system. Git is NOT, and doesn't pretend to be, a package manager. Odds are good that the platform you're using has a package manager that supports more complex dependency situations. Maven, RubyGems, npm, NuGet... there are lots of them.

  3. Use submodules 'mounted' in subdirectories.

Basically, submodules should be your last choice when dealing with your own code. They're great for dealing with third party libraries, but end up being a royal pain for your own code. Add on top of that the complicated solution you're proposing, and it just won't be very fun to work in.

Russell Mull
  • Thanks for the clarification and for all the advice. What I'm currently doing is using multiple Eclipse projects, each with its own git repository. The dependencies between them are weak enough for this to work, and what I'm after would solve the only remaining problem: from time to time, a change in a referenced repo requires changes in the referencing one. This makes going back in time across such a boundary complicated, and what I'm looking for could solve it. I don't need it often, so any possible problems will be rare as well. The mounted submodules could do what I need... – maaartinus Aug 05 '12 at 22:47
  • 1
    Having been there and done that, I really don't think it's worth the effort. Having a bunch of submodules is one of those things that sounds like a really nice and elegant idea (which it is) but the day-to-day usage is just a pain. I can't encourage you enough to start out with a single repo. – Russell Mull Aug 06 '12 at 02:23
  • 1
    I disagree with @RussellMull, and I'd like to encourage you to not put everything in one repo. Combining too many things into a single repository is a miserable experience, even on my own projects where I'm the only one working on it. Submodules aren't great for this sort of thing, but they're much better than mixing unrelated projects together in one git repo. – James Moore Sep 01 '12 at 05:04
2

I am not sure I am following you, since a parent repo (your "master") only stores a reference to the exact SHA1 of a submodule (the sub-repo checked out within the parent repo).
The size of the parent repo isn't affected at all.

The subtree merge strategy (better managed through git subtree) would increase the size of the parent repo, but that (subtree merge) isn't what you are talking about.

The other alternative to submodules would be git-slave (gits), which is a bit like what you want to implement.

VonC
  • Do you mean I was wrong with my sentence "the master repository duplicates all the information from slaves"? This is quite possible; however, my main concern is the existence of a copy of each slave tree in each master tree (or am I wrong again?). – maaartinus Jul 18 '11 at 08:56
  • @maaartinus: there is a physical copy (since it checks out a certain commit), but all the parent repo keeps is a reference to the checked-out commit. See "true nature of submodules" here: http://stackoverflow.com/questions/1979167/git-submodule-update/1979194#1979194 – VonC Jul 18 '11 at 09:04
  • @maaartinus: It is true, however, that each parent repo will check out the submodule, meaning several copies of said submodule will exist at any given time. – VonC Jul 18 '11 at 09:05