
There have been a couple of questions about Hg sub-repo dependencies in the past (here and here), but the accepted answers don't seem to address the problem for me.

A project of mine has 4 dependencies: A, B, C, D. D is dependent on A, B and C; and B and C are dependent on A:

[Figure: dependency graph of A, B, C, D]

I want to use Hg sub-repositories to store them so I can track which version of each one the others rely on. This is because, while I am using A, B, C and D in this project, other projects will require just A and B. Therefore B and C must track which version of A they need independently of D. At the same time, in my application the versions of B and C referenced by a given version of D must always use the same version of A as the one referenced by that version of D (otherwise it will just fall over at runtime). What I really want is to let them reference each other as siblings in the same directory, i.e. D's .hgsub would look like the following, and B's and C's would consist of just the first line:

..\A = https:(central kiln repo)\A
..\B = https:(central kiln repo)\B
..\C = https:(central kiln repo)\C
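As a rough sketch of the problem with this layout: each left-hand path in .hgsub names a directory inside the repository that owns the file, and every path above resolves to a location outside it. The helpers below (parse_hgsub, escapes_repo) are hypothetical; they only mimic that rule, on the assumption that this is why Mercurial refuses the layout, and are not Mercurial's own code or error handling.

```python
import posixpath


def parse_hgsub(text: str) -> dict[str, str]:
    """Parse simple 'local path = source' lines from an .hgsub file.

    Sketch only: assumes the file has no [subpaths] section.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        path, _, source = line.partition("=")
        entries[path.strip()] = source.strip()
    return entries


def escapes_repo(path: str) -> bool:
    """True if a subrepo path resolves outside the repository containing .hgsub."""
    normalized = posixpath.normpath(path.replace("\\", "/"))
    return (
        normalized == ".."
        or normalized.startswith("../")
        or posixpath.isabs(normalized)
    )


wanted = r"""..\A = https:(central kiln repo)\A
..\B = https:(central kiln repo)\B
..\C = https:(central kiln repo)\C"""
for path in parse_hgsub(wanted):
    print(path, escapes_repo(path))  # each line ends in True
```

A nested layout such as `A = ...` passes this check, which is why the workarounds below all place the copies underneath the repository that references them.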

However, this doesn't seem to work. I can see why (it would be easy to give people enough rope to hang themselves with), but it's a shame, as I think it's the neatest solution to my dependencies. I've read a few suggested solutions, which I'll quickly outline along with why they don't work for me:

  1. Include copies as nested sub-directories and reference these as Hg sub-repositories. This yields the following directory structure (I've removed the primary copies of A, B, C, B\A and C\A, as I can accept referencing the copies inside project\D instead):

    • project\ (all main project files)
    • project\D
    • project\D\A
    • project\D\B
    • project\D\B\A
    • project\D\C
    • project\D\C\A

    Problems with this approach:

    • I now have 3 copies of A on disk, all of which could have independent modifications which must be synced and merged before pushing to a central repo.
    • I have to use other mechanisms to ensure that B, C and D are referencing the same version of A (e.g. D could use v1 while D\B could use v2)
  2. A variation: use the above, but make the right-hand side of each .hgsub entry point to the copy in the parent directory (i.e. B and C would have the .hgsub below):

    A = ..\A
    

    Problems with this approach:

    • I still have three copies on disk
    • The first time I clone B or C, it will attempt to recursively pull the referenced version of A from "..\A", which may not exist, presumably causing an error. And if it doesn't exist, nothing indicates where the repo should be found.
    • When I do a recursive push of changes, the changes in D\B\A do not go into the shared central repo; they just get pushed to D\A instead. So if I push twice in a row I can guarantee that all changes will have propagated correctly, but this is quite a fudge.
    • Similarly if I do a (manual) recursive pull, I have to get the order right to get the latest changes (i.e. pull D\A before I pull D\B\A)
  3. Use symlinks to point folder D\B\A to D\A, etc.

    Problems with this approach:

    • Symlinks cannot be encoded in the Hg repo itself, so every time a team member clones the repo they have to re-create the symlinks, manually or with a script. This may be acceptable, but I'd prefer a better solution. Also (personal preference) I find symlinks highly unintuitive.
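For approaches 1 and 2, the "other mechanism" for keeping B, C and D on one version of A could be a check run before pushing. Mercurial records each subrepo's pinned changeset in a .hgsubstate file next to .hgsub, one `<changeset hash> <subrepo path>` per line. This sketch assumes the nested copies live at D, D\B and D\C as in the layout above; the function names and the hashes are made up for illustration.

```python
def pinned_revisions(substates: dict[str, str], dep: str = "A") -> dict[str, str]:
    """For each containing repo, return the changeset of `dep` its .hgsubstate pins."""
    pins = {}
    for repo, text in substates.items():
        for line in text.splitlines():
            node, _, path = line.strip().partition(" ")
            # Treat backslash and forward-slash subrepo paths the same.
            if path.strip().replace("\\", "/") == dep:
                pins[repo] = node
    return pins


def consistent(substates: dict[str, str], dep: str = "A") -> bool:
    """True if every repo that pins `dep` pins the same changeset."""
    return len(set(pinned_revisions(substates, dep).values())) <= 1


# Made-up 40-character hashes standing in for real changeset IDs.
V1, V2 = "1" * 40, "2" * 40
substates = {
    "D": f"{V1} A\n{'b' * 40} B\n{'c' * 40} C\n",
    r"D\B": f"{V1} A\n",
    r"D\C": f"{V2} A\n",  # D\C has drifted to a different A
}
print(consistent(substates))  # False
```

Such a check only detects drift; it does nothing about the three copies on disk or the push and pull ordering problems listed above.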

Are these the best available solutions? Is there a good reason why my initial .hgsub (see top) is a pipe-dream, or is there a way I can request/implement this change?

UPDATED to better explain the wider usage of A, B, C and D.

Tom Carver

2 Answers


Instead of trying to manage your dependencies via Mercurial (or any SCM, for that matter), try using a dependency management tool such as Apache Ivy.

Using an Ivy-based approach, you don't have any sub-repos; you just have projects A, B, C and D. A produces an artifact (e.g. a .jar, .so or .dll), which is published with a version into an artifact repository (basically a place where you keep your build artifacts). Projects B and C can then depend on a specific version of A (declared in an ivy.xml file in each project), which Ivy will retrieve from the artifact repository. Projects B and C also produce artifacts that are published to your repository. Project D depends on B and C, and Ivy can be told to retrieve dependencies transitively, which means it will get the artifacts for B, C and A (because B and C depend on A).
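The transitive retrieval described above can be sketched as follows. This is not Ivy's API or its algorithm: the in-memory REPOSITORY dict stands in for a hypothetical artifact repository, the version numbers are invented, and a real resolver applies a configurable conflict strategy rather than simply failing when B and C ask for different versions of A.

```python
from collections import deque

# Hypothetical artifact repository: (module, version) -> its declared dependencies.
REPOSITORY = {
    ("A", "1.0"): [],
    ("A", "1.1"): [],
    ("B", "2.0"): [("A", "1.0")],
    ("C", "3.0"): [("A", "1.0")],
    ("D", "4.0"): [("A", "1.0"), ("B", "2.0"), ("C", "3.0")],
}


def resolve(root, repository=REPOSITORY):
    """Collect `root` and all its transitive dependencies as {module: version}.

    Raises ValueError if two paths through the graph require different
    versions of the same module (e.g. B wants A 1.0 but C wants A 1.1).
    """
    chosen = {}
    queue = deque([root])
    while queue:
        module, version = queue.popleft()
        if module in chosen:
            if chosen[module] != version:
                raise ValueError(
                    f"conflict on {module}: {chosen[module]} vs {version}"
                )
            continue
        chosen[module] = version
        queue.extend(repository[(module, version)])
    return chosen


print(resolve(("D", "4.0")))
# {'D': '4.0', 'A': '1.0', 'B': '2.0', 'C': '3.0'}
```

Resolving B on its own gives `{'B': '2.0', 'A': '1.0'}`, so B keeps its own record of which A it needs independently of D, while resolving D surfaces any disagreement about A as an explicit error.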

A similar approach can be used with Apache Maven and Gradle (the latter uses Ivy).

The main advantages are that:

  • it makes it very clear which version of each component a project is using (sometimes people forget to check .hgsub, so they don't know they are working with subrepos);
  • it makes it impossible to change a project you depend on from within your own project (as you are working with artifacts, not code);
  • it saves you from having to rebuild the projects you depend on and being unsure of which version you are using;
  • it saves you from having multiple redundant copies of projects that are used by other projects.

EDIT: Similar answer with a slightly different spin at Best Practices for Project Feature Sub-Modules with Mercurial and Eclipse?

Tom Howard
  • +1 for distinguishing built artifacts from source. Trying to manage dependencies through SCM falls apart as soon as you trip over something that the SCM cannot control. – Devon_C_Miller Oct 19 '11 at 19:53
  • This seems to me like the best approach, and the approach I would take if I were starting something from scratch. However, our organization has some fairly poor separation of concerns between dependent projects, as well as some poorly defined interfaces to framework-type projects. This leads to lots of hopping from dependent to dependee to address a single requirement, and also to frequent refactoring (aided by ReSharper). For these reasons, it has suited us better to use subrepos than, say, NuGet packages, for managing in-house dependencies. – mo. Dec 17 '12 at 21:30

You say you want to track which version each of them relies on, but you'd also be happy with a single copy of A shared between B, C and D. These are mutually exclusive: with a single copy of A, any change to A will change the revision recorded in the .hgsubstate of each of B, C and D, so there is no independence in the versioning (all of B, C and D will have to commit after a change to A).
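A toy model of that coupling, assuming B, C and D all share one working copy of A and each records A's revision in its .hgsubstate: as soon as A moves to a new changeset, every parent's recorded revision is stale, so all three need a commit. The function name and the revision labels are hypothetical.

```python
def parents_needing_commit(recorded: dict[str, str], a_current: str) -> list[str]:
    """Return the parent repos whose recorded revision of the shared A is out of date."""
    return sorted(repo for repo, rev in recorded.items() if rev != a_current)


# Each parent last committed while the shared A was at revision "r1" (made-up label).
recorded = {"B": "r1", "C": "r1", "D": "r1"}
print(parents_needing_commit(recorded, "r2"))  # ['B', 'C', 'D']
```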

Having separate copies will be awkward too. If you make a change that affects both B's copy of A and C's copy then attempt to push the whole structure, the changes to (say) B will succeed but the changes to C will fail because they require merging with the changes you just pushed from B, to avoid creating new heads. And that will be a pain.

The way I would do this (and maybe there are better ways) would be to create a D repo with subrepos of A, B and C. Each of B and C would have an untracked A-location file (which you're prompted to fill in via a post-clone hook) telling your build system where to look for its A repository. This has the advantage of working, but you lose the convenience of a system which tracks concurrent versions of {B, C} and A. Again, you could do this manually with an A-version file in each of B and C, written by one hook and read by another, and you could make that work, but I don't think it's possible using the subrepos implementation in hg. My suggestions really boil down to implementing a simplified subrepo system of your own.
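A rough sketch of the build-side half of that scheme, assuming B and C each contain two plain-text files: A-location (untracked, pointing at the A checkout) and A-version (the A changeset they expect). Both file names and both functions are hypothetical; `a_current` would come from something like `hg id -i` run in A's checkout, which this sketch does not invoke.

```python
from pathlib import Path

# Hypothetical file names; nothing in hg defines them.
LOCATION_FILE = "A-location"  # untracked, filled in by a post-clone hook
VERSION_FILE = "A-version"    # records the A changeset this repo expects


def locate_a(repo_root: Path) -> Path:
    """Resolve the A checkout named in repo_root/A-location (relative or absolute)."""
    location = (repo_root / LOCATION_FILE).read_text().strip()
    a_path = (repo_root / location).resolve()
    if not (a_path / ".hg").is_dir():
        raise FileNotFoundError(f"{a_path} does not look like a Mercurial repository")
    return a_path


def version_matches(repo_root: Path, a_current: str) -> bool:
    """Compare the A version recorded in repo_root/A-version with A's current revision."""
    expected = (repo_root / VERSION_FILE).read_text().strip()
    return expected == a_current
```

Because `hg id -i` appends a `+` when the working copy has uncommitted changes, a dirty A would fail this comparison, which is arguably the behaviour you want; A-version should store the hash in the same form that command prints.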

Vadim Kotov
hcarver
  • I've updated my question to better explain - B can (and will) be used independently of D, and so must track a version of A. Just because **in my application** I want B,C and D to use the same version of A (at a given point in time), doesn't mean that is the only way they will be used (otherwise I'd just give up on sub-repos and include the code directly). In the general case, they must be able to be independent; but in this project they must all use the same version of A. – Tom Carver Apr 13 '11 at 08:59
  • That was how I understood the description originally, but if you want to use B and C in these two different ways, I don't think you can include the location of A in the repositories and still expect it to work. Have you considered having separate B2 and C2 repositories which have sub-repos of {B,C} and A, used for development (and hence tracking consistent versions of A)? Note that D would have subrepos of B and C still, not B2 and C2. Because what you're doing isn't directly possible with hg subrepos (as they stand), you're going to have to do something hacky at some point. – hcarver Apr 13 '11 at 11:41