I use the following lines in the sand:
- Code that ultimately goes into different deployables goes in different folders in the same repository, under an umbrella project - what SBT calls a multi-project build (I use Maven rather than SBT, but the concepts are very similar). Each project is built and deployed as its own jar.
I let the final deployables guide where I draw the divisions. For example, if my system foosys has foosys-frontend and foosys-backend deployables, where foosys-frontend does HTML templating, foosys-backend talks to the database, and the two communicate via a REST API, then I'll have those as separate projects, plus a foosys-core project for common code. foosys-core isn't allowed to depend on the HTML templating library (because foosys-backend doesn't want it), nor on the ORM library (because foosys-frontend doesn't want it). But I don't worry about separating the code that works with the REST library from the "core domain objects", because both foosys-frontend and foosys-backend use the REST code.
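One way to make the "foosys-core mustn't drag in the ORM" rule checkable is to model it: every library a project uses ends up in every deployable that includes that project, so each project's libraries must be acceptable to all of those deployables. Below is a minimal Python sketch under that assumption; the library names and the accepted-library sets are hypothetical stand-ins for the example above. (In a real Maven build, the Enforcer plugin's bannedDependencies rule is one way to enforce something similar.)

```python
def transitive_deps(project, deps):
    """Every project that `project` depends on, directly or indirectly."""
    seen, stack = set(), [project]
    while stack:
        for dep in deps[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def violations(libs, deps, accepted):
    """(project, library, deployable) triples where a project would pull a
    library into a deployable that doesn't accept it."""
    found = []
    for deployable, ok in accepted.items():
        included = {deployable} | transitive_deps(deployable, deps)
        for project in sorted(included):
            for lib in sorted(libs[project] - ok):
                found.append((project, lib, deployable))
    return found


# Hypothetical model of the two-deployable foosys example.
LIBS = {
    "foosys-core": {"rest"},
    "foosys-frontend": {"html-templating"},
    "foosys-backend": {"orm"},
}
DEPS = {
    "foosys-core": set(),
    "foosys-frontend": {"foosys-core"},
    "foosys-backend": {"foosys-core"},
}
ACCEPTED = {
    "foosys-frontend": {"rest", "html-templating"},
    "foosys-backend": {"rest", "orm"},
}

if __name__ == "__main__":
    print(violations(LIBS, DEPS, ACCEPTED))  # [] - the layout is clean
    # Putting ORM code in foosys-core would leak it into foosys-frontend:
    bad = {**LIBS, "foosys-core": {"rest", "orm"}}
    print(violations(bad, DEPS, ACCEPTED))  # [('foosys-core', 'orm', 'foosys-frontend')]
```

The REST library is allowed in foosys-core precisely because both deployables accept it, which is the same reasoning as in the paragraph above.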
Now suppose I add a new foosys-reports deployable, which accesses the database to generate some reports. Then I'll probably create a foosys-database project, depending on foosys-core, to hold the code shared by foosys-backend and foosys-reports. And since foosys-reports doesn't use the REST library, I should probably also split foosys-rest out of foosys-core. So I end up with a foosys-core library, two more library projects that depend on it (foosys-database and foosys-rest), and the three deployable projects (foosys-reports depending on foosys-database, foosys-frontend depending on foosys-rest, and foosys-backend depending on both).
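The resulting layout is a small dependency graph, and the set of deployables each project's code ends up in follows from walking it. A minimal Python sketch, assuming the direct dependencies are exactly the ones listed above (the dictionary is an illustration, not a Maven model):

```python
# Direct project dependencies in the three-deployable foosys layout.
DEPS = {
    "foosys-core": [],
    "foosys-rest": ["foosys-core"],
    "foosys-database": ["foosys-core"],
    "foosys-frontend": ["foosys-rest"],
    "foosys-backend": ["foosys-rest", "foosys-database"],
    "foosys-reports": ["foosys-database"],
}
DEPLOYABLES = ["foosys-frontend", "foosys-backend", "foosys-reports"]


def contents(deployable):
    """All projects whose code ends up in a deployable (itself plus transitive deps)."""
    seen, stack = {deployable}, [deployable]
    while stack:
        for dep in DEPS[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def deployables_containing(project):
    """The deployables a project's code is shipped in."""
    return {d for d in DEPLOYABLES if project in contents(d)}


if __name__ == "__main__":
    for project in DEPS:
        print(f"{project:16} -> {sorted(deployables_containing(project))}")
```

Each printed row pairs a project with the combination of deployables it serves: foosys-core with all three, foosys-rest with frontend and backend, foosys-database with backend and reports.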
You'll notice that this means there's one code project for every combination of deployables in which that code might be used. Code that goes in all three deployables goes in foosys-core. Code that goes in just one deployable goes in that deployable's project. Code that goes in two of the three deployables goes in foosys-rest or foosys-database. If we wanted some code that was part of the foosys-frontend and foosys-reports deployables, but not the foosys-backend deployable, we'd have to create another project for it. In theory this means an exponential blowup in the number of projects as we add more deployables. In practice I've found it's not too problematic: most theoretically possible combinations don't actually make sense, so as long as we only create a new project when we actually have code to put in it, it's OK. And if we end up with a couple of classes in foosys-core that aren't actually used in every single deployable, it's not the end of the world.
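To make the "one project per combination" rule concrete, here is a minimal Python sketch that looks up the home project for some code given the set of deployables that use it, lists the combinations that have no project yet, and counts the theoretical number of shared projects: the non-empty subsets of n deployables minus the n single-deployable ones, i.e. 2^n - n - 1. The HOME table is a hypothetical encoding of the layout above.

```python
from itertools import combinations

DEPLOYABLES = ("foosys-frontend", "foosys-backend", "foosys-reports")

# Combination of deployables -> project that holds code used by exactly them.
HOME = {
    frozenset(DEPLOYABLES): "foosys-core",
    frozenset({"foosys-frontend", "foosys-backend"}): "foosys-rest",
    frozenset({"foosys-backend", "foosys-reports"}): "foosys-database",
}
# Code used by a single deployable lives in that deployable's own project.
HOME.update({frozenset({d}): d for d in DEPLOYABLES})


def home_for(used_by):
    """Project for code used by exactly these deployables, or None if none exists yet."""
    return HOME.get(frozenset(used_by))


def missing_combinations():
    """Combinations of deployables that have no project (created only when needed)."""
    return [
        set(combo)
        for size in range(1, len(DEPLOYABLES) + 1)
        for combo in combinations(DEPLOYABLES, size)
        if frozenset(combo) not in HOME
    ]


def possible_shared_projects(n):
    """Non-empty subsets of n deployables, minus the n single-deployable ones."""
    return 2 ** n - n - 1


if __name__ == "__main__":
    print(home_for({"foosys-backend", "foosys-reports"}))  # foosys-database
    print(missing_combinations())  # only frontend+reports lacks a project
    print(possible_shared_projects(3), possible_shared_projects(10))  # 4 1013
```

missing_combinations is meant as a report, not a to-do list: with 10 deployables there are 1013 possible shared projects, but a new one is only worth creating once real code needs that combination.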
Tests are best understood in this view as another kind of deployable. So I would have a separate foosys-test project (depending on foosys-core) containing common code used by the tests for all three deployable projects, and perhaps a foosys-database-test project (depending on foosys-test and foosys-database) for test helper code that is common between foosys-backend and foosys-reports (e.g. database integration test setup code). Ultimately we might end up with a full parallel hierarchy of -test projects.
- Only move projects into separate git repositories (and, at the same time, separate overall builds) once they have different release lifecycles.
Code in different repositories is necessarily versioned independently, so in some sense this is a vacuous definition. But I think you should move to separate git repositories only when you have to (by analogy with this post: you should only use Hadoop when your data is too big to use anything friendlier). Once your code is in multiple git repositories, you have to update the dependencies between them manually. On a dev machine you can use -SNAPSHOT dependencies and IDE support to work as though the versions were still in sync, but you have to update this manually every time you resync with master, so it adds friction to development. Since you're doing releases and updating the dependency asynchronously, you have to adopt and enforce something like semantic versioning, so that people know when it's safe to update the dependency on foocorp-utils and when it isn't. You have to publish changelogs, and have an early-warning CI build, and a more thorough code review process. All this is because the feedback cycle is much longer: if you break something in a downstream project, you won't find out until they update their dependency on foocorp-utils, months or even years later (yes, years - I have witnessed this, and in an 80-person startup, not a megacorp). So you need process to prevent that, and everything becomes correspondingly less agile.
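Under semantic versioning, the "is it safe to update foocorp-utils?" question reduces to comparing version numbers. A minimal Python sketch, assuming plain MAJOR.MINOR.PATCH release versions (no -SNAPSHOT or other pre-release suffixes) and treating any change under major version zero as unsafe, since the SemVer spec makes no compatibility promise there. The version numbers in the example are hypothetical.

```python
def parse(version):
    """Split a plain MAJOR.MINOR.PATCH string into a comparable tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_safe_upgrade(current, candidate):
    """True if moving from `current` to `candidate` shouldn't break callers under SemVer."""
    cur, cand = parse(current), parse(candidate)
    if cand < cur:
        return False  # a downgrade can remove things callers rely on
    if cur[0] == 0:
        return cand == cur  # 0.y.z: SemVer promises nothing, so only "no change" is safe
    return cand[0] == cur[0]  # same major version: only compatible changes allowed


if __name__ == "__main__":
    print(is_safe_upgrade("1.4.2", "1.5.0"))  # True: new features, same major
    print(is_safe_upgrade("1.4.2", "2.0.0"))  # False: major bump signals breaking changes
```

The check only reflects what the version number claims; it's only as good as the discipline behind it, which is why the changelogs, early-warning CI build and stricter code review are still needed.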
Valid reasons to do this include:
- A full build of your project is taking too long, slowing down integration on the code you're working on - though try to speed it up first.
- Deploying all your deployables is taking too long - though again, try to automate this and speed it up first. There's a real advantage to keeping everything in sync; you don't want to give it up until you absolutely have to.
- Separate teams need to work on the code. If you're not in constant communication with each other then you'll need the process overhead (semantic versioning etc.) anyway, so you may as well get the faster build times. (To be clear, I think every git repository should have a single team that owns and is responsible for it, and when teams split they should split repositories. I have further thoughts on release processes and responsibilities, but this answer is already pretty long).
I would use a team Maven repository, probably Nexus. Actually, I'd recommend this even before you get to the multi-project stage. It's very easy to run (it's just a Java app), and you can proxy your external dependencies through it, which gives you a reliable source for your dependency jars and keeps your builds reproducible even if one of your upstream dependencies disappears.
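The reproducibility point comes from the proxy being a read-through cache: once a jar has been fetched, the team repository keeps its own copy. The Python sketch below is a toy model of that idea, not how Nexus is implemented; it uses the standard Maven repository layout (groupId dots become directories), and the in-memory dictionaries standing in for the upstream and local storage are assumptions for illustration.

```python
def artifact_path(group_id, artifact_id, version, packaging="jar"):
    """Path of an artifact in the standard Maven repository layout."""
    return (f"{group_id.replace('.', '/')}/{artifact_id}/{version}/"
            f"{artifact_id}-{version}.{packaging}")


class ProxyRepository:
    """Toy read-through cache: serve from the local store, else fetch upstream once."""

    def __init__(self, upstream):
        self.upstream = upstream  # path -> bytes; stands in for a remote repository
        self.cache = {}           # path -> bytes; the team repository's own storage

    def fetch(self, path):
        if path not in self.cache:
            if path not in self.upstream:
                raise FileNotFoundError(path)
            self.cache[path] = self.upstream[path]
        return self.cache[path]


if __name__ == "__main__":
    path = artifact_path("org.apache.commons", "commons-lang3", "3.12.0")
    remote = {path: b"...jar bytes..."}
    proxy = ProxyRepository(remote)
    proxy.fetch(path)      # first build: fetched upstream and cached
    del remote[path]       # the upstream artifact disappears
    print(proxy.fetch(path) == b"...jar bytes...")  # True: builds stay reproducible
```

In a real setup you'd point Maven at Nexus with a mirror entry in settings.xml rather than write anything like this; the sketch only shows why a caching proxy keeps old builds working.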
I intend to write up my ways of team working as a blog post, but in the meantime I'm happy to answer any further questions.