
We are developing an open source project, and we are using Mercurial for source control management. The Mercurial repository for this project is public (we are using Bitbucket).

Now we have a client for whom we need to customize our open source software. These customizations must be kept private, so we probably need to create a new Hg repository for this client; this new repository would be private.

But the problem is that, from time to time, we would need to merge changes (such as new features or bug fixes) from the open repository into our private repository.

What is the best way to achieve this? I read that it is possible to merge two or more Mercurial repositories, but that the history will be lost. Merging could also be painful because of many conflicts. What if we get a few more clients in the future; how should we manage their repositories? Should we use one repository and multiple branches? What if the two project versions start to head in different directions, and the two repositories become increasingly different?

Please share your experience about this.

Thanks in advance!

Buttons840
Ivica
  • I wonder how the simple "two branch" approach suggested in Martin Geisler and Laurens Holst answers compares with the "three branch" approach explained in this question: http://stackoverflow.com/questions/6020936/whats-a-good-way-to-organize-projects-with-shared-dependencies-in-mercurial I posted a bounty hoping for a more detailed comparison of these (and other) techniques. – Buttons840 Jan 04 '13 at 06:41
  • @Buttons840 - I can't find "ThreeBranches" in linked topic. Am I blind? – Lazy Badger Jan 10 '13 at 10:01

4 Answers


What you describe is a standard thing with a distributed version control system: developing in two repositories and keeping one a subset of the other. Start by making a clone for the private development:

hg clone open private

Then go into private and make the new features there. Commit as normal. The private repository will now contain more changesets than the open repository -- namely the new features.

When bugfixes and new features are put into the open repository as part of the normal open source process, then you pull them into the private repository:

cd private
hg pull
hg merge
hg commit -m "Merge changes from open"

That way you keep the invariant: the private repository always contains everything in the open version, plus the private enhancements. If you're working on the private version and discover a bug, remember to check whether the bug exists in the open version too. If so, fix it in the open version first and merge the bugfix into the private version. If you fix a bug in the private version by mistake, use hg transplant to copy the bugfix over to the open version.

There won't be any loss of history. You will have to resolve the merge like normal when you do hg merge, and the conflicts will only be as large as required by your private changes.

The important thing to remember is to never push (or pull) the other way, unless you want to begin releasing some of the private changes into the open source version.
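One way to make an accidental push less likely is to give the private clone an explicit push target, because a fresh clone otherwise pushes to the repository it was cloned from. The helper below is a sketch under assumptions: the function name and the example URL are hypothetical, and it relies on the `default-push` entry in the `[paths]` section of the repository's hgrc.

```shell
# Sketch: make a bare `hg push` in the private clone go to a private server.
# Assumptions: $1 is the path to the private clone, $2 is the URL of a
# private server of your own (the URL in the example call is a placeholder).
set_private_push_target() {
    repo=$1
    target=$2
    # Appending a second [paths] section is fine; Mercurial merges sections.
    printf '\n[paths]\ndefault-push = %s\n' "$target" >> "$repo/.hg/hgrc"
}

# Example call (hypothetical URL):
# set_private_push_target private ssh://hg@example.invalid/client-a
```

This only changes where a plain `hg push` goes. An explicit `hg push /path/to/open` would still reach the open repository, so it does not replace care or a server-side check.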

You can use this setup several times with different clients and you can also push/pull changesets between different private repositories as needed if several clients require the same private enhancement.

Martin Geisler
  • Just IMHO (**very** humble...): "branch per task" (in Open and, if needed, in Private too) with "Push Branch" delivery is a slightly more manageable process – Lazy Badger Jan 10 '13 at 10:00

Well, in principle the basic model is relatively simple: have a separate private repository which is a clone (branch) of the public one, make all private changes there, and then regularly merge the public one into the private one. There are no problems with regard to history preservation; I don’t know where you read that there would be.

However the challenge is to not end up with an unmaintainable merge hell, and this can only be achieved through strict discipline.

The most basic rules of thumb for any long-lived branches are:

  1. Keep the private branch as small as possible. Minimise the number of changes in there and keep them small, so don’t start refactoring huge parts of the code or changing indentation. In a one-way merge situation like this one, any code that you modify has the potential to conflict, even way down the line.

  2. Merge frequently. The more frequent the better. If you don’t, every time you do want to integrate the changes from the public repository you will end up with one super-merge that has a ton of conflicts.

Additionally, you should be disciplined in organising and writing your code to facilitate this scenario. Have clear rules about what goes on which branch, and section off the relevant pieces of code.

Ideally you would model the customised functionality as a plug-in or external library, a separate project even. That may not always be easily achievable; in that case, at least try to write all private modifications as sub-classes of the originals, which you instantiate through factory methods. By making all your changes in independent files that only exist on the private branch, you minimise the risk of conflicts.

Also write automated tests. Lots of them. Otherwise you won’t promptly detect merge problems (which will happen), and the private branch will often be broken.

Finally a tip: make a push hook on the public repository that denies any push containing a changeset that you know is private; this will prevent accidental publication of the private code and potentially save you a lot of headaches.
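A hook like that could look roughly like the script below. It is a sketch, not a finished tool: the blocklist file `private-nodes.txt`, the `PRIVATE_NODES` variable and the script name are assumptions made up for the example, and you would have to maintain the list of private changeset hashes yourself. It is meant to run as a `pretxnchangegroup` hook, where Mercurial sets `HG_NODE` to the first incoming changeset and a non-zero exit status rejects the push.

```shell
#!/bin/sh
# Sketch of a pretxnchangegroup hook for the PUBLIC repository.
# Assumption: the blocklist file (hypothetical name private-nodes.txt) holds
# one full 40-character changeset hash per line for everything that must
# stay private.
#
# Wiring in the public repository's .hg/hgrc (the path is a placeholder):
#   [hooks]
#   pretxnchangegroup.deny_private = /path/to/deny-private.sh

# Reads incoming changeset hashes on stdin.
# Returns 1 if any of them appears in the blocklist file given as $1.
reject_private_nodes() {
    blocklist=$1
    if grep -F -x -f "$blocklist" >/dev/null; then
        echo "push rejected: it contains a private changeset" >&2
        return 1
    fi
    return 0
}

# Only runs when Mercurial invokes the script as a hook.
if [ -n "${HG_NODE:-}" ]; then
    hg log -r "$HG_NODE:tip" --template '{node}\n' |
        reject_private_nodes "${PRIVATE_NODES:-private-nodes.txt}"
fi
```

Keeping the check in a small function that reads hashes from stdin means it can be tried out without a repository, for example by piping a few hashes into `reject_private_nodes`.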

Laurens Holst
  • Wrt. your last question about multiple repositories or branches: in Mercurial, multiple cloned repositories are essentially no different from any other type of branch; I think this was originally even the recommended branching model. So use whatever suits you best. Seeing as you already have multiple repositories, I guess doing this for each customer makes sense. – Laurens Holst Oct 07 '11 at 12:20

Well, some extensions and variations.

  • With Martin's workflow you can use the "Branch Per Task" paradigm: create each task branch in the base project ("Open") and deliver it to "Private" with "push -b --new-branch", which publishes only that branch rather than every change on the mainline. In "Private" the branch must then also be merged into default.

Each additional fork costs "+2 commands, +1 repository" in this case.

  • Branching variation: a single development repository with many named branches (Branch per Target + Branch per Task). This is a small deviation from the first variant: there is only one development repo, which contains the Open and Private named branches (among other short-term branches). Each task is again implemented in a separate branch (as in the first variant), which is merged, without any push, into the targets that need it (Open and Private). The user-visible Open and Private repos then also have to be updated with push -b.

Each additional fork costs "+1 command, +1 branch" in this case.

  • Patch-based model. A single common codebase, with all changes made on top of the "vanilla Open" changesets. With MQ enabled, simply applying the patches in the queue converts Open into Private. If something exists in Open but must not exist in Private, you end up with three-level versioning, Core-Open-Private, and need different patch sets on top of Core. These "different patch sets" can be a) differently named branches for different targets, applied and controlled manually; b) guards; or c) with a reasonably recent Mercurial, separate queues, one per target. Tasks can be developed in branches, as before, or as MQ patches (qfinish'ed later or not).

Each additional fork costs "+1 patch" and possibly "+1 queue" (see above). I prefer a single queue with guards, for simplicity and manageability.

Lazy Badger

A project usually consists of a set of modules. In my experience it is sometimes better to keep some modules in separate source-control repositories, for example a utility module or a core module such as a web framework or DAO (ORM) layer. You can then use branches as they are meant to be used: to support trunk development and maintenance of each released version within the same repository, so that branches can be merged.

So I suggest you redesign the structure of your application modules so that the core (open-source) functions are separated from the commercial (customer-dependent) customization. To manage open-source and commercial releases you then need separate assembly procedures. They can be more or less similar, or the commercial release can even use an open-source release as a set of complete artifacts and extend them.

In fact this is a very interesting problem; I spent a lot of time on it last year. My solution is to have one core repository (open source) with a fully functional Maven build to release it, and a separate repo for each customer that holds only design customization and some customer-specific business logic (just use aliases in the customer's Spring XML to override your "core" Spring services; see BeanDefinitionOverriding). The customer's Maven build is based on the core artifacts and often extends some of them (see for example "overlays" in maven-war-plugin, which let you extend an existing WAR). Working this way you never have a copy of the same class in another branch: you use it or extend it exactly as you use log4j classes in your application. You just extend the open-source release.

Another interesting problem is how to manage config files. I recommend looking at the Maven Remote Resources Plugin instead of the default Maven Resources Plugin. It lets you keep templates of the configuration files and move all customer-specific values into Maven profiles. Also look at the Maven Tiles Plugin: it helped me dramatically simplify the "pom.xml" in the customer's project (I can reuse "tiles" of the Maven build and assembly procedure).

s13o
  • +1 Although this answer is slightly biased towards Java, it is the only one that addresses not the technicalities of the VCS but the application architecture, explicitly designed to reduce this kind of problem as much as possible. – marco.m Sep 29 '16 at 17:49