15

Is there a better way to version-control web projects that receive small, scattered updates across several customer projects with Git?

I want to use Git for version control of my web projects. The main difference from almost all other proposals is that this is a web project using HTML, JavaScript and some PHP files; there are no central libraries shared by one or more programs, as is usual in typical Linux packages.

All my web projects are for different customers and are based on the same platform files. I would estimate that 80% of the files are identical (call them platform) and 20% are modified to fit each customer's needs. The problem is that I don't know in advance which files will need a customer-specific update; in detail, every customer is different.

Ideally I would keep the platform files in one directory and overlay them with customer-specific files from another directory. So far I have found nothing really good to solve this with Git:

  • git submodule (as proposed here) is typically designed to keep the sources of a vendor-developed library close to the program that links against it. The problem is that the platform and the customer files end up in different directories, so I have to mix them during deployment to create the files for the web server. Furthermore, I have to keep the directory trees in sync manually, which would be a hell of a lot of work with hierarchies 10 directories deep. In general, a lot of postings grumble about the big administrative effort of using submodules; it looks like overkill.
  • git subtree (as proposed here) seems to be simpler than submodule but suffers from the same problem with different directories, so I would also need to keep the directory structure in sync and mix the files during deployment. Furthermore, it is difficult to push platform changes back from a customer repo.
  • GitSlave (as proposed here): I'm not sure whether this can be of benefit to me. It allows keeping several Git repos in sync, so maybe it helps with syncing the directory structure of the platform, but I doubt it.
  • Refactoring so that platform and customer files live in different directories (the result of this discussion): I think this is simply impossible given my customers and the technology used by web projects. For one customer this page needs an update, for another that page. Even when introducing a PHP framework, the customer-specific changes are spread over the whole tree.
  • Checkouts (as also proposed in the last posting of this discussion): this looks very simple and promising, with the drawback that all the customer-specific files are outside of Git (so outside of version control). Furthermore, if a file is updated both in the platform and for the customer, the git pull aborts, so this is not usable.
  • Vendor branches (as recommended here): as I have learned, branches are made to be merged back, and that is not what I intend for my customer-specific patches. These branches would stay open forever and would only receive merges from the platform (main) towards the customer after a platform update. This would also lead to a monolithic repo holding all customers and the platform, which is not the Git way of handling repos.
  • Mix during deployment: a very pragmatic method that keeps the platform files in one repo and the customer files in dedicated repos. During deployment to the web server, it first writes all platform files and then overwrites some of them with the customer-specific files. The mixing happens very late, in the web server's directory. This also has the drawback that the directory structure of each customer has to be kept in sync manually with the platform structure; otherwise the deployment would be too complex.
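
To make the last option concrete, here is a minimal sketch of such an overlay step in Python. The function name `deploy_overlay` and the directory layout are assumptions for illustration only; it copies the platform checkout first and then lets same-named customer files win, skipping any `.git` directories.

```python
import shutil
from pathlib import Path


def deploy_overlay(platform_dir, customer_dir, target_dir):
    """Copy the platform tree, then overwrite it with customer files.

    Returns the relative paths (POSIX style) that came from the customer
    tree, i.e. the files that were overridden or added.
    Hypothetical helper, not part of Git or any deployment tool.
    """
    platform_dir = Path(platform_dir)
    customer_dir = Path(customer_dir)
    target_dir = Path(target_dir)

    # Step 1: write all platform files (without repository metadata).
    shutil.copytree(
        platform_dir,
        target_dir,
        ignore=shutil.ignore_patterns(".git"),
        dirs_exist_ok=True,  # requires Python 3.8+
    )

    # Step 2: overlay the customer files at the same relative paths.
    from_customer = []
    for src in sorted(customer_dir.rglob("*")):
        rel = src.relative_to(customer_dir)
        if ".git" in rel.parts or not src.is_file():
            continue
        dest = target_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        from_customer.append(rel.as_posix())
    return from_customer
```

Because the customer tree only needs to contain the files that differ, missing directories are created on demand; the relative paths still have to match the platform layout, which is exactly the manual sync burden described above.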

What is the best approach here?

Achim
  • FWIW, thank you for at least taking the time to research some of the options instead of expecting everyone else to do all of your work. – Roman Aug 12 '12 at 03:12
  • Thanks sjas for the reformatting and small rephrasing, it's much more readable now! – Achim Aug 12 '12 at 07:33
  • Hi R0MANARMY, by now I've spent several days over several weeks on this problem. I first expected to find a solution in the standard Git books or documentation (I'm also a newbie to Git) but realized that my use case seems to be different from Git's main use case. – Achim Aug 12 '12 at 07:37

2 Answers

5

TL;DR

This is actually an architectural design problem, not a source code management problem. Nevertheless, it's a common and interesting problem, so I'm offering some general advice on how to address your architectural issues.

Not Really a Git Problem

The problem isn't really Git here. The issue is that you haven't adequately differentiated what remains the same vs. what will change between customers. Once you've determined the correct design pattern, the appropriate source control model will become more obvious.

Consider this quote from Russ Olsen:

[Separate] the things that are likely to change from the things that are likely to stay the same. If you can identify which aspects of your system design are likely to change, you can isolate those bits from the more stable parts.

Olsen, Russ (2007-12-10). Design Patterns in Ruby (Kindle Locations 586-588). Pearson Education (USA). Kindle Edition.

Some Refactoring Suggestions

I don't know your application well enough to offer concrete advice, but in general web projects can benefit from a couple of different design patterns. The template, composite, or prototype patterns might all be applicable, but sometimes discussing patterns confuses the issue more than it helps.

In no particular order, here's what I would personally do:

  1. At the view layer, rely heavily on templates. Make heavy use of layouts, includes, or partials, so that you can more easily compose presentation-layer objects.
  2. Make heavy use of customer-specific configuration files (I rather like YAML for this purpose) to allow easier customization without modifying core code.
  3. At the model and controller layers, choose some appropriate structural patterns to allow your objects to behave polymorphically based on your customer-specific configuration files. Duck-typing is your friend here!
  4. Use some introspection based on hostname or domain, enabling polymorphic behavior for each client.
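
As a rough illustration of points 2 to 4, the sketch below picks a customer configuration by hostname and dispatches to a different renderer without touching core code. Everything in it is hypothetical (the hostnames, the `visualization` key, the renderer names), and it uses a plain dictionary where a real setup might load YAML files.

```python
# Hypothetical platform defaults; a customer config only lists differences.
DEFAULTS = {"theme": "default", "visualization": "table"}


def load_customer_config(hostname, config_by_host, defaults=DEFAULTS):
    """Merge the platform defaults with the overrides for this hostname."""
    overrides = config_by_host.get(hostname.lower(), {})
    return {**defaults, **overrides}


def render_table(values):
    rows = "".join(f"<tr><td>{v}</td></tr>" for v in values)
    return f"<table>{rows}</table>"


def render_list(values):
    items = "".join(f"<li>{v}</li>" for v in values)
    return f"<ul>{items}</ul>"


# Duck typing: any callable taking a list of values can be registered here.
RENDERERS = {"table": render_table, "list": render_list}


def render_view(hostname, values, config_by_host):
    """Choose the renderer from configuration instead of editing the view."""
    config = load_customer_config(hostname, config_by_host)
    return RENDERERS[config["visualization"]](values)
```

With `{"customer-a.example": {"visualization": "list"}}` as the per-host configuration, customer A gets the list view while every other host falls back to the platform default, so the difference lives in data, not in a forked file.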

Next Steps with Git

Once you've refactored your application to minimize the changes between customers, you may find you don't even need to keep your code separate at all unless you're trying to hide polymorphic code from each client. If such is the case, you can certainly investigate submodules or separate branches at that point, but without the burden of heavy duplication between branches.

Symlinks are Your Friends, Too

Lastly, if you find that you can isolate changes into a few subdirectories, Git supports symlinks. You could simply have all your varied code in a per-client subdirectory on your development branch, and symlink the files into the right places on your per-client release branches. You can even automate this with some shell scripts or during automated deployments.

This keeps all your development code in one place for easy comparisons and refactoring (e.g. the development branch), but ensures that code that really does need to be different for each release is where it needs to be when you roll it out into production.
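
A minimal sketch of that kind of automation, written in Python instead of shell, might look like the following. The function name `link_customer_files` and the directory layout are assumptions; it replaces each matching file in the target tree with a relative symlink to the per-client copy.

```python
import os
from pathlib import Path


def link_customer_files(customer_dir, tree_dir):
    """Symlink every file under customer_dir into the same relative path
    under tree_dir, replacing an existing file or link of that name.

    Hypothetical helper; returns the linked relative paths (POSIX style).
    """
    customer_dir = Path(customer_dir).resolve()
    tree_dir = Path(tree_dir).resolve()
    linked = []
    for src in sorted(customer_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(customer_dir)
        dest = tree_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        if dest.is_symlink() or dest.exists():
            dest.unlink()  # assumes dest is a file or link, not a directory
        # Use a relative target so the checkout can be moved as a whole.
        dest.symlink_to(os.path.relpath(src, dest.parent))
        linked.append(rel.as_posix())
    return linked
```

Git stores a symlink as a link, not as a copy of the target, so on a release branch the per-client file still has a single source of truth in the customization directory.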

Todd A. Jacobs
  • To explain the application a little bit more: I'm using a PHP framework based on the MVC pattern. Inside the models there are basically no customer-specific changes; they are pure platform. The controllers currently have the access rights for user groups hard-coded (because it's a lot of work to make this flexible via the UI). The views are always a little bit different: small formatting changes, colors, and also different types of visualization. This can in turn affect the controller, which then has to handle different information. But if we dive deeper into this we should open another topic... – Achim Aug 12 '12 at 07:52
  • 1
    I followed your advice CodeGnome, and came up with a script that will symlink all the files in a customization directory in the right places. This way, I will keep a source tree clean, with custom subdirectories inside the git tree and symbolic links to files of the current build, which should not alter my git status. This is [a gist with my code](https://gist.github.com/michelecos/8717b2c2a238a9dc1d97) – mico Feb 25 '16 at 08:23
2

Vendor branches make the most sense due to the nature of how you customize your solution for each vendor. The best way to go about it is to forgo this and develop a multi-tenant application.

Adam Dymitruk
  • The web sites are hosted on small embedded controllers. There are several (dozens) of these controllers installed at each customer, all with the same web site, differing only in configuration and connected mechanics. – Achim Aug 11 '12 at 21:36
  • Another customer has different web sites on his controllers, but I don't have a huge central server that is able to host a multi-tenant application... – Achim Aug 11 '12 at 21:37
  • You can use the same strategy and activate different features via license keys. – Adam Dymitruk Aug 12 '12 at 05:14
  • Assume I supply 10 customers, each with a 20% difference from the platform. In sum that would add up to 200% of additional code on top of the platform, which is simply too expensive to store on every embedded controller. Furthermore, I don't want to put the code of customer A on the controller of customer B, even if I don't activate it. – Achim Aug 12 '12 at 07:43
  • Then make an installer that allows certain things or behaviours to be turned off or on. – Adam Dymitruk Aug 13 '12 at 02:51