Setting up a complex stage environment with open source tools

Question

I've searched the series of tubes for a while and haven't found any good answers to this question, usually because the people answering don't understand the use cases, so I am going to be excruciatingly detailed. For example, this question: Create a "label" in subversion indicating what files should be in the next release (the 5 vote answer seems close, but not quite there) and this question: Using Subversion Tags to Deploy to Development/Staging/Testing Server are similar to mine, except the people attempting to answer don't seem to fully understand the subtleties.

I'm setting up a more sophisticated staging environment for a fast-growing project. The current environment consists of a main production branch along with build branches. Builds are not a problem, as we can just tag the head revision of the build branch when it's "done" and merge that back into production. The piece that is more subtle is being able to set up an automated process which tags an arbitrary set of files, at an arbitrary revision for each file, with a label so that you can sync that label out to staging servers. Now, in the SCM world the term "label" is overloaded, so I will explicitly state that in this case I am using label in the Perforce (http://www.perforce.com/perforce/doc.current/manuals/cmdref/label.html) sense of the word, meaning a name for a set of files where the files are at arbitrarily chosen revisions and the set is mutable.

So to give a simple example: Suppose I have files A and B. File A's head revision is revision 13 of that file and file B's head revision is revision 4. Currently in production we have A@10 and B@2. The changes to file A have been QA'd and it has been determined that they're ready to patch out. The first change to file B (revision 3) is ready to patch, but the business side has determined that the head change (revision 4) needs to be worked on a little more, so it shouldn't patch and gets pushed out to a later date. So for patching to production we need to tag B@3 and A@13 for release. So this is where everyone says "ohh well use tags in ". So that's all well and good for the 20110703 nightly patch. But in the stage environment we also want to be able to test the not-necessarily-head-revision files in the state that would best be described as "if we were to tag the branch right now this is what it would look like" throughout the day (week, month, etc). Don't get me wrong, I don't want to do a bunch of coding in the production branch, but sometimes it's necessary.

The one point I've glossed over so far is that there is also a ticketing system where commits are associated with tasks/tickets, and the tasks/tickets are related to a particular release date. So the workflow is that a user creates a task, attaches code to it in the form of changesets (with a one-task-to-many-changesets relationship), the task progresses through an approval process, and eventually it is christened as ready to patch. Then there is a series of automated scripts to determine what file revisions are going to patch out on day X and sync the staging environments (or production) to the appropriate versions of files. The specific script that I'm having trouble with is the one that, given a set of tasks that are ready to patch (and, through them, their changesets), syncs a set of files to the appropriate revisions to emulate what that future production patch will look like. If I were to use Perforce I could accomplish this with labels, which are mutable and basically just hold a collection of (filename, filerevision) values that a user can sync to. But I'm looking to use open source tools, and specifically tools that integrate with Redmine (yes, I still need to build the ticket-to-changeset association layer).

So my questions are:

1. Are there any open-source SCMs that have the concept of a label in them? I've looked a little at Mercurial and the queues extension, but it, once again, seems to solve a similar but not quite the same problem. (Feel free to correct me and say "nope, queues solve this perfectly, just do this...")
2. If there aren't any tools that work exactly this way, any suggestions for how to best set this up? I can certainly write a script that kinda fakes labels and manually syncs each individual file, but that seems bad in so many ways.

Basically, I want actions like a task being moved out a few days, or transitioning from a non-patchable to a patchable state, to automatically update the code on the staging servers so that they reflect the "this is what we're planning on patching" state, without any human intervention after the task changes.

Thanks for the help.

I suspect the reason that you haven't quite found what you're looking for is that modern revision control systems work on the *commit* level rather than the *file* level. The commit level is different because (a) a commit can span multiple files, and (b) a given file has multiple commits that may affect it. Handling release staging on a file by file basis seems like it would be intractable unless almost all your commits are completely disjoint (which doesn't seem likely in practice). Have you considered any other staging schemes? — Greg Hewgill, Jul 03 '11 at 08:26
So the commit-level granularity is fine, since that is actually what is being associated with tasks and therefore being designated as ready for release. But if I commit system-wide revision 6 with files A, B, C and system-wide revision 7 with files D, E, F and want to patch revision 6 without 7 (or 7 without 6), I should be able to do that in an automated way, and without the label concept I reference it seems difficult. The only trouble, and in practice the only time manual work is needed, is if 6 and 7 contain files A, B, C | C, D, E respectively. Then a human typically fixes the file C issue. — umassthrower, Jul 03 '11 at 08:31
The situation where two revisions have a disjoint set of affected files (A, B, C and D, E, F) is trivial to manage with a DVCS like Git or Mercurial (I use Git so I'm most familiar with that). Also, an overlapping set of files (A, B, C and C, D, E) only requires manual intervention if the actual changes in file C overlap between each revision. If they don't overlap, there's no problem and the selection can be handled automatically. — Greg Hewgill, Jul 03 '11 at 08:39
So really "commit-level" vs. "file level" is unrelated, unless what you're suggesting is that SCM architects chose to ignore the fact that these are files we are patching out and assume that any change that is not going to the main branch is immediately rolled back before patching production, in order to enable the simplification of only ever needing to patch head. To which I would suggest that is an unnecessary and unfortunate decision. — umassthrower, Jul 03 '11 at 08:40
It might be worthwhile for you to take a step back and have a look at something like [A Successful Git Branching Model](http://nvie.com/posts/a-successful-git-branching-model/), which describes one way to manage staged releases with Git. — Greg Hewgill, Jul 03 '11 at 08:40
Which is all well and good, but given a set of revisions (file level or entire repo level) that are ready to patch how do I automate the process of updating the source code on a stage server to match that set of revisions? That's the question here. Not whether I can take 2 heads in hg and merge them. — umassthrower, Jul 03 '11 at 08:43
What I would do is (in Git terms), prepare a branch for staging by merging and cherry-picking as necessary, on my own workstation. Then I would push that branch to a repository somewhere, and then pull it down to the staging server. Once it's tested and is ready for production, go to the production server and pull the *same* branch. Due to the flexibility of modern DVCS, there are many different ways to arrange this and you might find a different way that works for you. — Greg Hewgill, Jul 03 '11 at 08:46
Good link, +1 for that. So in this scenario, which is not too far off from what I'm suggesting, what happens when there are overlapping changes in the hotfix line and one gets punted? The suggestion here seems to just be "don't do that" or "revert the later change and patch the head then reapply the change later". Still not easy to automate on the stage server other than saying always sync to the head of the hotfix branch. — umassthrower, Jul 03 '11 at 08:49
OK so I think the process would look like this: 1. grab the set of changesets that are not in a ready-for-patch state 2. grab the head revision of the master branch 3. back out each changeset that is not ready for patch 4. push that to the staging repo 5. export that to the staging server ...sound about right? — umassthrower, Jul 03 '11 at 09:00
In the event that you have two distinct commits that depend on one another and overlap, and you want to select only the second one (for example), then you have to make a project-specific decision about how to deal with that. The answer will always be different and will almost always need human input. In practice though, if commit B depends on commit A, and commit A is not ready, then you're most likely going to delay implementing commit B too. — Greg Hewgill, Jul 03 '11 at 09:01
No, I get the 2 commits that depend on one another, but that question about the hotfix line wasn't related to commits that depend on each other, just ones that were out of order. Given your experience with the newer systems, does the algorithm I mapped out in the other comment involving "unapplying" commits make sense? — umassthrower, Jul 03 '11 at 09:05
No, I'm afraid it doesn't make sense. You're referring to the statement "any change that is not going to the main branch is immediately rolled back before patching production in order to enable the simplification of only ever needing to patch head", right? I don't get that. The [Pro Git](http://progit.org) book is available for free and may help clarify the concepts involved here. — Greg Hewgill, Jul 03 '11 at 09:08
no not that one, the one that says "OK so I think the process would look like this: 1. grab the set of changesets that are not in a ready-for-patch state 2. grab the head revision of the master branch 3. back out each changeset that is not ready for patch 4. push that to the staging repo 5. export that to the staging server" — umassthrower, Jul 03 '11 at 09:10
Oh, that. I would: 1. Grab the previous release branch; 2. Pull each branch or cherry-pick each commit that *is* intended to be in the next staging test; 3. Push to staging repo; 4. Pull that into the staging server. While you *could* do it by backing out the changes that you didn't want, that seems like a bit more work than necessary. If there aren't any changes you don't want (hopefully the common case), a simple merge with an appropriate branch would grab all commits. — Greg Hewgill, Jul 03 '11 at 09:16
OK I think that's probably a workable solution. I'm going to bed, but if you want to submit that as an answer feel free and I'll check this question tomorrow. — umassthrower, Jul 03 '11 at 09:19

score 2 · Accepted Answer · answered Jul 03 '11 at 09:45

[This answer attempts to summarise the discussion in the comments above.]

Modern DVCSs (e.g. Git, Mercurial) manage changes as sequences of commits, rather than sets of files. Because of this different paradigm, it is difficult to think of a "label" that selects particular files from particular revisions. A single commit may touch several files, and several commits may touch the same file (though the changes inside the file may or may not overlap).
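
To make the overlap point concrete, here is a minimal Python sketch that flags files touched by both a commit you want to release and a commit you are holding back. The function name and the `commit_files` mapping are hypothetical (you would have to build the mapping yourself from your SCM's log output). A flagged file is only a candidate for a closer look; as discussed in the comments, it needs manual work only if the changes inside the file actually overlap.

```python
def overlapping_files(commit_files, selected):
    """Return the sorted paths touched by both a selected and an unselected commit.

    commit_files: dict mapping a commit id to the set of paths it touches
                  (hypothetical input, assembled from your SCM's log).
    selected:     set of commit ids that are intended for the release.
    """
    touched_by_selected = set()
    touched_by_held_back = set()
    for commit, paths in commit_files.items():
        if commit in selected:
            touched_by_selected |= set(paths)
        else:
            touched_by_held_back |= set(paths)
    # Only files present on both sides can possibly need a human decision.
    return sorted(touched_by_selected & touched_by_held_back)


# The example from the comments: r6 touches A, B, C and r7 touches C, D, E.
history = {"r6": {"A", "B", "C"}, "r7": {"C", "D", "E"}}
print(overlapping_files(history, selected={"r6"}))
```

With the disjoint case from the comments (A, B, C versus D, E, F) the result is an empty list, which is the situation that can be handled fully automatically.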

To manage a staged release using Git, what you could do is:

1. Grab the previous release branch.
2. Pull each branch or cherry-pick each commit that is intended to be in the next staging test (you can do this on a development workstation).
3. Push to staging repository.
4. Pull that into the staging server.
5. When the staging checks out okay, pull that same branch into production.

In the (hopefully) common case where there aren't any changes you don't want, then step 2 becomes a single-step merge. If you don't want a particular change, then you can cherry-pick the changes you do want.
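
As a rough illustration of the first three steps above, the following Python sketch builds the list of Git commands for a staging branch and, optionally, runs them. Everything named here is an assumption for the sake of the example: the `staging` remote, the branch names, and the idea that the staging branch is a throwaway branch rebuilt from the release branch on every run (which is why it is reset with `checkout -B` and force-pushed). Treat it as a starting point, not a finished deployment script.

```python
import subprocess


def staging_plan(release_branch, staging_branch, commits=(), merge_from=None,
                 remote="staging"):
    """Build (but do not run) the git commands that prepare a staging branch.

    Returns a list of argv lists. `remote` and the branch names are
    hypothetical and depend on how your repositories are laid out.
    """
    # Start the staging branch from the previous release, discarding any
    # earlier staging attempt of the same name.
    plan = [["git", "checkout", "-B", staging_branch, release_branch]]
    if merge_from is not None:
        # Common case: everything on the other branch is wanted.
        plan.append(["git", "merge", "--no-ff", merge_from])
    for commit in commits:
        # Selective case: take only the approved commits, in the given order.
        plan.append(["git", "cherry-pick", "-x", commit])
    # The branch was rebuilt from scratch, so the push has to be forced.
    plan.append(["git", "push", "--force", remote, staging_branch])
    return plan


def run_plan(plan, repo_dir):
    """Run each command in order; stop at the first failure (e.g. a conflict)."""
    for argv in plan:
        subprocess.run(argv, cwd=repo_dir, check=True)


for argv in staging_plan("release", "staging-next", commits=["abc123", "def456"]):
    print(" ".join(argv))
```

Because `run_plan` uses `check=True`, a cherry-pick that hits a conflict raises `subprocess.CalledProcessError` and nothing is pushed, which leaves the repository in a state where a human can resolve the conflict.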

Some helpful resources:

Pro Git - free book on Git
A successful Git branching model - a description (with pictures) of one way to manage multiple changes and staging releases with Git

To supplement this a bit the scripted & staged version will basically implement this algorithm and include language bindings, determining which commits should be applied, and reasonable handling of conflicts. Thanks Greg. — umassthrower, Jul 03 '11 at 15:42
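
For the "determining which commits should be applied" part mentioned in the comment above, a minimal Python sketch might look like the following. The `Task` record and its fields are hypothetical stand-ins for whatever the ticket-to-changeset layer ends up exposing; nothing here talks to Redmine. The ordering (by release date, then task id) is also an assumption, and in a real script you would probably want to apply the commits in the order they appear in the repository history.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Task:
    """Hypothetical view of a ticket: approval state, release date, changesets."""
    task_id: int
    ready: bool
    release_date: date
    changesets: list = field(default_factory=list)  # commit ids, oldest first


def commits_for_release(tasks, patch_day):
    """Commit ids of ready tasks scheduled on or before patch_day, without duplicates."""
    seen = set()
    selected = []
    for task in sorted(tasks, key=lambda t: (t.release_date, t.task_id)):
        if not task.ready or task.release_date > patch_day:
            continue  # not approved yet, or pushed out to a later release
        for commit in task.changesets:
            if commit not in seen:  # a changeset may be attached to two tasks
                seen.add(commit)
                selected.append(commit)
    return selected


tasks = [
    Task(1, True, date(2011, 7, 3), ["a1", "a2"]),
    Task(2, False, date(2011, 7, 3), ["b1"]),  # still in approval
    Task(3, True, date(2011, 7, 10), ["c1"]),  # scheduled for a later patch
]
print(commits_for_release(tasks, date(2011, 7, 3)))
```

Re-running this selection whenever a task changes state or release date, and rebuilding the staging branch from the result, is one way to get the "no human intervention after the task changes" behaviour asked for in the question.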
