
One of the things I like about my Subversion setup is that I can keep multiple projects in a single main repository. When I want to work on a project, I can check out just that project. Like this:

\main
    \ProductA
    \ProductB
    \Shared

then

svn checkout http://.../main/ProductA

As a new Git user, I want to explore best practices in the field before committing to a specific workflow. From what I've read so far, Git stores everything in a single .git folder at the root of the project tree. So I could do one of two things:

  1. Set up a separate project for each Product.
  2. Set up a single massive project and store products in sub folders.

There are dependencies between the products, so the single massive project seems appropriate. We'll be using a server where all the developers can share their code. I've already got this working over SSH & HTTP and that part I love. However, the repositories in SVN are already many GB in size so dragging around the entire repository on each machine seems like a bad idea - especially since we're billed for excessive network bandwidth.

I'd imagine that the Linux kernel project repositories are equally large so there must be a proper way of handling this with Git but I just haven't figured it out yet.

Are there any guidelines or best practices for working with very large multi-project repositories?

Bob Wintemberg
Paul Alexander

2 Answers


The guideline is simple, with regard to Git limits:

  • one repo per project
  • a main project with submodules.

The idea is not to store everything in one giant Git repo, but to build a small repo as the main project, which references specific commits of other repos, each one representing a project or common component of its own.
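
As a rough sketch of that layout, the commands below build a throwaway main project that references one component repo as a submodule. The temporary directory, the `g` helper, the demo identity, and the local-path "remote" are all assumptions made so the example runs on its own; `Shared` is borrowed from the question's layout. On a real server you would pass the component's SSH or HTTP URL to `git submodule add` instead.

```shell
set -e
# Hypothetical helper: run git with a throwaway identity. protocol.file.allow
# lets newer Git versions clone a submodule from a local path (demo only).
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

work=$(mktemp -d)

# A stand-alone repository for the shared component.
g init -q "$work/shared"
g -C "$work/shared" commit -q --allow-empty -m "shared: initial commit"

# The small main project that only references other repositories.
g init -q "$work/main"
cd "$work/main"
g submodule add "$work/shared" Shared
g commit -q -m "main: reference Shared as a submodule"

# Show which commit of Shared the main project records.
g submodule status
```

The main project stores only a `.gitmodules` file and a pointer to one commit of `Shared`, so its own history stays small.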


The OP Paul Alexander comments:

This sounds similar to the "externals" support provided by subversion.
We tried this and found it extremely cumbersome to constantly update the version references in the externals since the projects are developed concurrently with dependencies on each other. Is there another option??

@Paul: yes, instead of updating the version from the main project, you either:

  • develop your subprojects directly from within the main project (as explained in "True Nature of submodules"),
  • or you give the sub-repo a remote (origin) that points to the same sub-repo being developed elsewhere: from there, you just pull the changes made elsewhere into that sub-repo.

In both cases, you must not forget to commit the main project, to record the new configuration. There is no "external" property to update here. The whole process is much more natural.
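
A minimal sketch of the second option, under the same kind of assumptions (temporary local repos, a hypothetical `g` helper, demo identity): the component moves forward elsewhere, the submodule pulls that change, and the main project is then committed to record the new state.

```shell
set -e
# Hypothetical helper: throwaway identity, local-path submodules allowed (demo only).
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

work=$(mktemp -d)

# Setup: a component repo and a main project that references it.
g init -q "$work/shared"
g -C "$work/shared" commit -q --allow-empty -m "shared: v1"
g init -q "$work/main"
cd "$work/main"
g submodule add "$work/shared" Shared
g commit -q -m "main: add Shared"

# The component is developed elsewhere and gains a new commit.
g -C "$work/shared" commit -q --allow-empty -m "shared: v2"

# Inside the submodule, pull the changes made elsewhere.
g -C Shared pull -q --ff-only

# Back in the main project, record the new submodule commit.
g add Shared
g commit -q -m "main: move Shared to v2"
```

The last two commands are the step that is easy to forget: until the main project is committed, it still points at the old commit of `Shared`.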

Honestly, this sounds like a real pain, and anything that requires developers to do something manually each time is just going to be a regular source of bugs and maintenance.
I suppose I'll look into automating this with some scripts in the super project.

I replied:

Honestly, you may have been right... that is, until the latest Git release, 1.7.1.
git diff and git status both learned to take submodule states into account, even when executed from the main project.
You simply cannot miss a submodule modification.
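
To illustrate, here is a small sketch that makes a change inside a submodule and then inspects it from the main project. The temporary repos, the `g` helper, and the demo identity are assumptions for the sake of a runnable example; `--submodule=log` is an optional `git diff` format that lists the submodule's new commits rather than just the changed pointer.

```shell
set -e
# Hypothetical helper: throwaway identity, local-path submodules allowed (demo only).
g() { git -c user.name=demo -c user.email=demo@example.com -c protocol.file.allow=always "$@"; }

work=$(mktemp -d)

# Setup: a component repo and a main project that references it.
g init -q "$work/shared"
g -C "$work/shared" commit -q --allow-empty -m "shared: v1"
g init -q "$work/main"
cd "$work/main"
g submodule add "$work/shared" Shared
g commit -q -m "main: add Shared"

# Develop directly inside the submodule.
g -C Shared commit -q --allow-empty -m "shared: work done inside main"

# From the main project, both commands now report the submodule as modified.
g status --short
g diff --submodule=log
```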

That being said:

VonC
  • Also worth noting that if you include submodules in the main project, each submodule is its own git repository, so you're free to include particular versions of the submodules, certain tags, etc. – Damien Wilson Apr 28 '10 at 19:31
  • @VonC: This sounds similar to the "externals" support provided by subversion. We tried this and found it extremely cumbersome to constantly update the version references in the externals since the projects are developed concurrently with dependencies on each other. Is there another option?? – Paul Alexander Apr 28 '10 at 20:11
  • @Paul: yes, instead of updating the version from the main project, you either develop your subprojects directly from within the main project (see http://stackoverflow.com/questions/1979167/git-submodule-update/1979194#1979194), or you give the sub-repo a remote (origin) that points to the same sub-repo being developed elsewhere: from there, you just pull the changes made elsewhere into that sub-repo. In both cases, you must not forget to commit the main project, to record the new configuration. No "external" property to update. The whole process is much more natural. – VonC Apr 28 '10 at 20:22
  • Honestly, this sounds like a real pain, and anything that requires developers to do something manually each time is just going to be a regular source of bugs and maintenance. I suppose I'll look into automating this with some scripts in the super project. – Paul Alexander Apr 29 '10 at 03:06
  • @Paul: honestly, you may have been right... that is, until the latest Git release, 1.7.1 (http://www.kernel.org/pub/software/scm/git/docs/RelNotes-1.7.1.txt). `git diff` and `git status` both learned to take submodule states into account, even when executed from the main project. You simply cannot miss a submodule modification. – VonC Apr 29 '10 at 05:37
  • I like how, 1 month later, Paul accepted this as an actual answer! :-) – cregox Feb 02 '12 at 16:48
  • @Cawas: well, if you look at http://stackoverflow.com/questions/210155/what-is-a-swamp-diagram/210289#210289, David accepted my post as an actual answer... 3 years and 3 months(!) later. – VonC Feb 02 '12 at 17:05
  • @VonC but he wasn't disagreeing with you to begin with! – cregox Feb 02 '12 at 19:14
  • Until @PaulAlexander says something, I choose to believe he's actually using submodules now. – cregox Oct 25 '12 at 09:49

GitSlave allows you to manage several independent repos as one. Each repo can be manipulated by regular git commands, while gitslave allows you to additionally run a command over all repos.

super-repo
+- module-a-repo
+- module-b-repo

gits clone url-super-repo
gits commit -a -m "msg"

Repo-per-project has advantages for componentization and simplifies builds with tools like Maven. It also adds protection by limiting the scope of what a developer can change, which reduces the damage from erroneous commits of garbage.

Andre
  • Could you include a bit about the pros and cons of gitslave vs. git submodule? – M.M Nov 26 '15 at 04:58
  • The big advantage of Gitslave is it lets your Git repos stand alone. You can manage repos with plain git commands without affecting the gitslave relationship. But when you want to execute a tag, for example, across all repos then gitslave can do it. – Andre Nov 27 '15 at 17:45
  • Submodule, in my opinion, is fraught with complexity. Developers need to understand it and work with it intimately. – Andre Nov 27 '15 at 17:56