How can I make a hierarchy of repositories with Git ?

Question

I have a project with the following hierarchy :

Tharwa
|_tharwa-backend
|_tharwa-web
|_tharwa-mobile

Each subfolder is a repository in its own right; I want to create a repository, Tharwa, that puts everything together.

I have the following constraints however :

I don't want to put them in the same repo as subfolders, because each has its own dependencies and configuration files, and I don't want to have their commits mixed up.
I don't want to leave them as separate repos either, since I have issues on the parent repo that might require work on, say, both the backend and mobile repos, and I would like to have the issue solved on the same branch, for example:
```
__________________________ master
 \________________________ develop
       \______/ login
```

My question is: how can I make something like this possible? Where do I have it wrong?

Please let me know if I didn't explain myself well. Thank you in advance.

Git supports so-called ['submodules'](https://github.com/blog/2104-working-with-submodules), where you can include references to other repositories in a repository. This reference can even point to a specific commit, allowing you to version the use of your submodules. Your main repo could contain virtually no code and just link all the subprojects together. — GolezTrol, Feb 24 '18 at 18:26
@GolezTrol, I tried to mess around with submodules, but ran into some weird things with detached HEAD that I didn't understand, and I have seen people saying that submodules are dangerous. Do they support branches? If I create a branch on the main repo, will it be created in all submodules? — Amine Birouk, Feb 24 '18 at 18:32
If you want a branch "applied" to all 3, you need to have each of them as a subdirectory. This goes against your wish of "no commits mix-up", but I think your 2 constraints are kind of incompatible, so you will need to choose between the two. For your first point, submodules could be a solution, but they are indeed often not the recommended one, as they create confusion and complexity. — Patrick Mevzek, Feb 24 '18 at 18:34
@PatrickMevzek, how do people usually go about these types of projects? I'm fairly new to this, the workflow is still a bit blurry to me. — Amine Birouk, Feb 24 '18 at 18:38
I do not believe there is one true only way that fits all. It looks to me your constraint of "commits do not mix up" is out of fear of something but I do not know why and you seem to hint at work targeting the 3 projects at once. So I would just have one repository with 3 directories and that is all. Note that it also depends on how each part is released and deployed (together, separately, etc...). Also, easy way: just try to start with a case and see in a day to day life if it suits you. You can always change later! — Patrick Mevzek, Feb 24 '18 at 18:45

score 10 · Answer 1 · answered Feb 24 '18 at 19:10

There are basically 3 ways to see it:

one repository with 3 directories, one per project
one repository, probably almost empty, with 3 git submodules (so each one a repository by its own, but tied to the main one)
three completely separate repositories

I do not know where your constraint "I don't want to have their commits mixed up." comes from; maybe it is just because you do not know git well yet. Just note for now that you have powerfool tools and options in git to view commits and filter them by date, author, path, content, etc. So in my opinion this is nothing to worry about. On the contrary, a single repository lets you show clearly, with a single commit, that file X in the first project changed at the same time as file Y in the second project (for example, because you changed an API and need to change its producer and its consumer together, which should be reflected in only one commit).
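To make the filtering point concrete, here is a minimal sketch, assuming option 1 (one repository with one directory per project). The throwaway repository, file names, and commit messages are invented for illustration; the only part that matters is the path argument given to `git log`.

```shell
#!/bin/sh
set -e

# Throwaway repository laid out like option 1 (all names are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir tharwa-backend tharwa-web tharwa-mobile

echo "api v1" > tharwa-backend/api.txt
git add . && git commit -q -m "backend: first API draft"

echo "home page" > tharwa-web/index.txt
git add . && git commit -q -m "web: home page"

# One commit that changes the producer and a consumer of the API together.
echo "api v2" > tharwa-backend/api.txt
echo "login screen" > tharwa-mobile/login.txt
git add . && git commit -q -m "login: new endpoint plus mobile screen"

# Only the commits that touched the backend directory.
backend_log=$(git log --format=%s -- tharwa-backend)
echo "$backend_log"

# Only the commits that touched the web directory.
web_log=$(git log --format=%s -- tharwa-web)
echo "$web_log"
```

The shared "login" commit shows up in the backend history but not in the web history, so each project can still be read on its own. The same command accepts `--author`, `--since`, and `-S<string>` to narrow the list further.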

But if you want strict commit isolation, you have it in option 3, and also in option 2. Not in option 1: there, one commit could cover changes in any of the subprojects.

As for your second constraint, it is immediately possible in option 1, kind of possible in 2, but certainly not in 3.

Git submodules deserve a discussion by themselves, as they come with their own constraints. Make sure to read and learn about them before using them on a large scale. Here are some interesting links about them, besides the official documentation (first link):

https://git-scm.com/book/en/v2/Git-Tools-Submodules
https://github.com/blog/2104-working-with-submodules (see Advice on using submodules or not, at end)
https://medium.com/@porteneuve/mastering-git-submodules-34c65e940407 (discussion on submodules vs subtrees, basically option 2 vs 1)
https://www.atlassian.com/blog/git/git-submodules-workflows-tips (some tips on workflows with submodules)
https://www.atlassian.com/blog/git/git-submodules-workflows-tips (submodules and branches)
https://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/ (some arguments against them)

As for your specific question on submodules and branches, have a look at this question and its answers: Git submodules: Specify a branch/tag
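As a rough sketch of how branches behave with option 2, the script below builds a parent repository with two submodules and then creates a `login` branch in each of them. The local stand-in repositories and their contents are assumptions made so the script can run anywhere; with real remotes you would pass URLs instead of local paths.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Two stand-in project repositories (hypothetical content).
for name in tharwa-backend tharwa-mobile; do
    git init -q "$name"
    git -C "$name" config user.name "Demo"
    git -C "$name" config user.email "demo@example.com"
    echo "$name" > "$name/README"
    git -C "$name" add README
    git -C "$name" commit -q -m "initial commit"
done

# The almost-empty parent repository of option 2.
git init -q Tharwa
cd Tharwa
git config user.name "Demo"
git config user.email "demo@example.com"

# protocol.file.allow is only needed because this demo uses local paths;
# with real remote URLs a plain "git submodule add <url> <path>" is enough.
git -c protocol.file.allow=always submodule --quiet add "$work/tharwa-backend" tharwa-backend
git -c protocol.file.allow=always submodule --quiet add "$work/tharwa-mobile" tharwa-mobile
git commit -q -m "add backend and mobile as submodules"

# A branch created in the parent is NOT created in the submodules...
git checkout -q -b login
# ...so it has to be created in every submodule explicitly.
git submodule --quiet foreach 'git checkout -q -b login'

backend_branch=$(git -C tharwa-backend rev-parse --abbrev-ref HEAD)
mobile_branch=$(git -C tharwa-mobile rev-parse --abbrev-ref HEAD)
echo "$backend_branch $mobile_branch"
```

Note that the parent records a specific commit for each submodule, not a branch. That is why `git submodule update` normally leaves a submodule on a detached HEAD, and why commits made inside a submodule must be followed by a commit in the parent that records the new submodule commit.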

As I wrote in my comment, things also depend on how you package and distribute this software. Is it always one piece of code deployed as is (options 1 and 2 would make more sense), or can you release one project separately from the others (option 3 would make more sense)? Note that I said "make more sense" because it is not black and white: you can achieve your goals with any option, the compromises are just different.

It also depends on the team of developers who will work on these. What is their knowledge level with git? Submodules are not something I would recommend to git beginners. And how are commits pushed/pulled between remote repositories? In option 1 you have one repository to cater for; in option 2 also, but you need to update the submodules (see documentation); and in option 3 you have 3 separate repositories to handle.

There may also be other side points to take into account, but they may be irrelevant if you start with empty content. Size is one of them: some repositories carry a lot of history, and this impacts git clone, for example (so with separate repositories, one big repository does not impact the others).

You seem to hint at a workflow as described on http://nvie.com/posts/a-successful-git-branching-model/ which is a good start. If you want to stick to it, it will be easy in option 1, mostly possible but not exactly in 2, and not possible in 3. (Also have a look at https://www.atlassian.com/git/tutorials/comparing-workflows for some other possible workflows.)

It really seems to me your 2 constraints are going in opposite directions, so you would need to see which one is more important than the other.

As for myself, but without the whole picture you have, I would favor option 1 as it seems the most flexible one (and from it you can easily switch to option 2 or 3 later).

Lol! - "powerfool tools". Was that deliberate? Git is so powerful and sometime it makes me feel like a fool - so it definitely resonates with me. — Craig Hicks, Sep 11 '21 at 18:51
@CraigHicks That is an honest mistake (English is not my primary language) and hence not deliberate at all. But now it does sound funny to me, so I will let it like that. Thanks for your comment! — Patrick Mevzek, Sep 11 '21 at 18:53

Schwern · Answer 2 · 2018-02-24T19:49:40.677

I don't want to put them in the same repo as subfolders, because each has its own dependencies and configuration files, and I don't want to have their commits mixed up.

Ok, you don't want their commits mixed up.

I don't want to leave them as separate repos either, since I have issues on the parent repo that might require work on, say, both the backend and mobile repos, and I would like to have the issue solved on the same branch...

...except when you DO want their commits mixed up. ;)

Where do I have it wrong?

If you habitually have to change multiple repositories simultaneously, you may want to consider whether they're actually a single repository. There are two good ways to handle this; subrepositories are not one of them.

One repo

One is to make them a single repository. If they're all pieces of the same project, and they have changes which depend on each other, they're a single repository. It's ok for them to be subfolders with their own configuration and dependencies; this is fairly common for large projects that need to be developed together but are split for distribution.

The downside is developers are likely to take advantage of this and tightly bind the client code to the backend. Without clear separations between the projects the backend API is likely to get sloppy. The clients are more likely to take advantage of undocumented backend features making the whole system brittle and resistant to change. Adding a new client, like maybe tharwa-api, will become more difficult.

If you have 3rd parties writing their own clients for the tharwa-backend, they're at a disadvantage. mobile and web are in a privileged position: they can be in lock-step with backend. 3rd party developers aren't so lucky, and your project will be harder to contribute to.

And once you weld your projects together, you're not likely to ever pull them apart again.

Many repos, strict dependencies.

The other is to enforce the encapsulation between the pieces more strictly, by having each repo treat the others as normal dependencies. In your login example...

Implement, test, and commit the change on backend.
Release backend, even if only for internal distribution.
Test web and mobile against the new backend to ensure backwards compatibility is maintained.
Some dependency mechanisms allow drawing dependencies directly from a Git repo.
Have web and mobile update their backend dependency and use the new feature.
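The steps above can be sketched with plain Git, assuming a release is simply an annotated tag. The version number, file names, and the local stand-in repository are hypothetical, and the `git clone --branch` call stands in for whatever your package manager does when pointed at a Git repo.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"

# Stand-in backend repository (hypothetical content).
git init -q tharwa-backend
git -C tharwa-backend config user.name "Demo"
git -C tharwa-backend config user.email "demo@example.com"
echo "POST /login" > tharwa-backend/API.txt
git -C tharwa-backend add API.txt
git -C tharwa-backend commit -q -m "add login endpoint"

# The change is committed; now release it as an annotated tag.
git -C tharwa-backend tag -a v1.1.0 -m "login endpoint"

# Later backend work does not affect anyone pinned to v1.1.0.
echo "POST /logout" >> tharwa-backend/API.txt
git -C tharwa-backend commit -q -a -m "add logout endpoint"

# A client fetches exactly the released version and tests against it.
git -c advice.detachedHead=false clone -q --branch v1.1.0 \
    "$work/tharwa-backend" backend-v1.1.0

pinned=$(git -C backend-v1.1.0 describe --tags)
pinned_api=$(cat backend-v1.1.0/API.txt)
echo "$pinned: $pinned_api"
```

The client only ever sees what was released, so unreleased backend work cannot leak into web or mobile by accident.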

Now it is harder for developers to cheat. The extra step of a release (which shouldn't take more than a minute or two) provides an "air gap". backend has to develop its own unit, integration, and acceptance tests; it can't rely on the clients to do that for it. The clients have to be more robust and adhere more strictly to the backend API. With the backend and clients decoupled, it will be easier to make radical changes to the internals of each.

Developers can still make lockstep changes, but they're now explicit. Making them explicit discourages their use; it prevents devs from getting lazy.

But it does add some more overhead. backend changes must be fully thought through, developed, and documented. The backend API must be more fully developed and robust. The clients must adhere more closely to the API. All this is good software engineering and will speed things up in the mid and long-term.

Why not submodules?

Submodules provide most of the upsides of a single repo, while adding a confusing feature. They also provide all of the downsides of a single repo, plus one more: a lack of coordination.

With a single repo, one commit is one commit. One branch is one branch. With submodules, it's difficult to know by looking at a single repository which commits must be coordinated across all repositories. These coordinated commits can happen at any time, without warning.

You'll want some procedures and mechanisms to track and coordinate these commits. You could build this all yourself through trial and error, maybe something with tags or special commit messages.

Or you could use an existing release dependency system.

Which you choose depends on your project. However, I'd recommend you try the full decoupling and see how it goes. It encourages good software engineering practices. And you can always put them back together later; going the other way around is difficult.
