24

I can't seem to grok the different solutions I've found and studied for tracking external code. Let alone understand how to apply them to my use case...

Would you guys be so kind to shed some light on this and help me with my specific use case? What would be the best solution for the following, concrete problem? (I'm not gonna attempt to generalize my problem, since I might make wrong assumptions about stuff, especially since I'm so new with all this...)

I'm building a website in Django (a web framework in Python). Now, there are a lot of 3rd party plugins available for use with Django (Django calls them 'apps'), that you can drop in your project. Some of these apps might require a bit of modification to get working like I want them. But if you start making modifications to 3rd party code you introduce the problem of updating that code when newer versions appear AND at the same time keeping your local modifications.

So, the way I would do that in Subversion is by using vendor branches. My repository layout would look like this:

/trunk
  ...
  /apps
    /blog-app
  ...
/tags
  ...
/branches
  ...
/vendor
  /django-apps
    /blog-app
      /1.2
      /1.3
      /current
    /other-app
      /3.2
      /current

In this case /trunk/apps/blog-app would have been svn copy'd from one of the tags in /vendor/django-apps/blog-app. Say that it was v1.2, and that I now want to upgrade my version in trunk to v1.3. As you can see, I have already updated /vendor/django-apps/blog-app/current (using svn_load_dirs) and 'tagged' (svn copy) it as /vendor/django-apps/blog-app/1.3. Now I can update /trunk/apps/blog-app by svn merge'ing the changes between /vendor/django-apps/blog-app/1.2 and /vendor/django-apps/blog-app/1.3 into /trunk/apps/blog-app. This will keep my local changes. (For people unfamiliar with this process, it is described in the Subversion handbook: http://svnbook.red-bean.com/en/1.5/svn.advanced.vendorbr.html)

Now I want to do this whole process in Git. How can I do this?

Let me re-iterate the requirements:

  • I must be able to place the external code in an arbitrary position in the tree
  • I must be able to modify the external code and keep (commit) these modifications in my Git repos
  • I must be able to easily update the external code, should a new version be released, whilst keeping my changes

Extra (for bonus points ;-) ):

  • Preferably I want to do this without something like svn_load_dirs. I think it should be possible to track the apps and their updates straight from their repository (most 3rd party Django apps are kept in Subversion). That would give me the added benefit of being able to view individual commit messages between releases, and of fixing merge conflicts more easily, since I can deal with a lot of small commits instead of the one artificial commit created by svn_load_dirs. I think one would do this with svn:externals in Subversion, but I have never worked with that before...

A solution where a combination of both methods could be used would be even more preferable, since there might be app developers who don't use source control or don't make their repos available publicly. (Meaning both svn_load_dirs-like behavior and tracking straight from a Subversion repository, or from another Git repository.)

I think I would either have to use subtrees, submodules, rebase, branches, ... or a combination of those, but smack me down if I know which one(s) or how to do it :S

I'm eagerly awaiting your responses! Please be as verbose as possible when replying, since I already had a hard time understanding other examples found online.

Thanks in advance

Assaf Lavie
hopla
  • Have you got your "aha"-moment in the meantime? I'm also looking for this and I'm still a little bit unsure how to do it. – acme Dec 09 '10 at 08:27
  • Well, I've certainly learned a lot more about Git since I wrote this question and I also have a better grasp of adding remotes and merging in branches/tags from those remotes. In fact I'm using something like it now at work to keep our Drupal install up to date. – hopla Dec 17 '10 at 14:11
  • As far as submodules are concerned: I haven't really tried them yet, but I get the gist of it now. It still seems a very complicated and error-prone system however. So how do I do my Django installs now? Well, I had a little 'aha' moment in Django too and I've learned a lot of legit methods that can be used to extend apps. And so I just install external apps in virtualenv with pip, which is 'the right way to go' anyway. So I kinda avoided the problem and also got a much better solution in place :) – hopla Dec 17 '10 at 14:16

4 Answers

27

There are two separate problems here:

  1. How do you maintain local forks of remote projects, and
  2. How do you keep a copy of remote projects in your own tree?

Problem 1 is pretty easy by itself. Just do something like:

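# Clone the upstream project and keep it around as the "upstream" remote.
git clone git://example.com/foo.git
cd foo
git remote add upstream git://example.com/foo.git
# Point "origin" at your own fork instead of at upstream.
git remote rm origin
git remote add origin ssh://.../my-forked-foo.git
# Name the branch explicitly: a bare 'git push origin' may refuse here,
# because master no longer has an upstream branch configured.
git push -u origin master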

You can then work on your forked repository normally. When you want to merge in upstream changes, run:

git pull upstream master

As for problem 2, one option is to use submodules. For this, cd into your main project, and run:

git submodule add ssh://.../my-forked-foo.git local/path/for/foo

If I use git submodules, what do I need to know?

You may find git submodules to be a little bit tricky at times. Here are some things to keep in mind:

  1. Always commit the submodule before committing the parent.
  2. Always push the submodule before pushing the parent.
  3. Make sure that the submodule's HEAD points to a branch before committing to it. (If you're a bash user, I recommend using git-completion to put the current branch name in your prompt.)
  4. Always run 'git submodule update' after switching branches or pulling changes.

You can work around (4) to a certain extent by using an alias created by one of my coworkers:

git config --global alias.pull-recursive '!git pull && git submodule update --init'

...and then running:

git pull-recursive
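Rules 1 to 3 above can be sketched against throwaway local repositories. This is only an illustration under assumptions: the names `seed`, `forked-foo.git`, and `parent` are made up, with `forked-foo.git` standing in for my-forked-foo.git. The `protocol.file.allow=always` override is there only because recent Git versions refuse to clone submodules from local paths by default; it should not be needed with a real ssh:// or https:// URL.

```shell
#!/bin/sh
set -e
# Demo-only identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the forked repository, published as a bare repo.
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "foo 1.0" > README
git add README
git commit -q -m "foo 1.0"
git clone -q --bare "$work/seed" "$work/forked-foo.git"

# Parent project with the fork added as a submodule.
git init -q "$work/parent"
cd "$work/parent"
git symbolic-ref HEAD refs/heads/master
echo "DEBUG = True" > settings.py
git add settings.py
git commit -q -m "Initial parent project"
git -c protocol.file.allow=always submodule add "$work/forked-foo.git" local/path/for/foo
git commit -q -m "Add foo as a submodule"

# Rule 3: make sure the submodule is on a branch, not a detached HEAD.
cd local/path/for/foo
git checkout -q master
echo "local change" >> README

# Rules 1 and 2: commit and push inside the submodule first...
git commit -q -am "Local change to foo"
git push -q origin master

# ...then record the new submodule commit in the parent.
cd "$work/parent"
git add local/path/for/foo
git commit -q -m "Update foo submodule"
```

Doing it in the other order would leave the parent pointing at a submodule commit that nobody else can fetch yet.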

If git submodules are so tricky, what are the advantages?

  1. You can check out the main project without checking out the submodules. This is useful when the submodules are huge, and you don't need them on certain platforms.
  2. If you have experienced git users, it's possible to have multiple forks of your submodule, and link them with different forks of your main project.
  3. Someday, somebody might actually fix git submodules to work more gracefully. The deepest parts of the submodule implementation are actually quite good; it's just the upper-level tools that are broken.

git submodules aren't for me. What next?

If you don't want to use git submodules, you might want to look into git merge's subtree strategy. This keeps everything in one repository.
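A minimal sketch of the subtree approach, using throwaway local repositories. Everything named here is an assumption made for illustration: `upstream` stands in for a third-party app, and the remote name `blog-upstream`, the path `apps/blog-app`, and the file `app.py` are invented. The sketch imports the app under a subdirectory, commits a local change, and then merges a newer upstream release. It passes `-X subtree=<path>` to name the directory explicitly, where `-s subtree` would guess it instead. `--allow-unrelated-histories` needs Git 2.9 or later.

```shell
#!/bin/sh
set -e
# Demo-only identity so commits work in a clean environment.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical upstream app, release 1.2.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
printf '%s\n' a b c d e f g > app.py
git add app.py
git commit -q -m "blog-app 1.2"

# Your own project.
git init -q "$work/site"
cd "$work/site"
git symbolic-ref HEAD refs/heads/master
echo "DEBUG = True" > settings.py
git add settings.py
git commit -q -m "Initial site"

# Import upstream under apps/blog-app, keeping its history.
git remote add blog-upstream "$work/upstream"
git fetch -q blog-upstream
git merge -q -s ours --no-commit --allow-unrelated-histories blog-upstream/master
git read-tree --prefix=apps/blog-app/ -u blog-upstream/master
git commit -q -m "Import blog-app 1.2 under apps/blog-app"

# Local modification to the vendored code (first line of the file).
printf '%s\n' LOCAL b c d e f g > apps/blog-app/app.py
git commit -q -am "Local tweak to blog-app"

# Upstream publishes 1.3 (last line of the file changes).
cd "$work/upstream"
printf '%s\n' a b c d e f UPSTREAM > app.py
git commit -q -am "blog-app 1.3"

# Merge the new release into the subdirectory, on top of the local tweak.
cd "$work/site"
git fetch -q blog-upstream
git merge -q -m "Merge blog-app 1.3" -X subtree=apps/blog-app blog-upstream/master
cat apps/blog-app/app.py
```

Because the upstream history is merged rather than copied, `git log blog-upstream/master` still shows the individual upstream commits between releases.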

What if the upstream repository uses Subversion?

This is pretty easy if you know how to use git svn:

git svn clone -s https://example.com/foo
cd foo
git remote add origin ssh://.../my-forked-foo.git
git push origin master

Then set up a local tracking branch in git.

git push origin master:local-fork
git checkout -b local-fork origin/local-fork

Then, to merge from upstream, run:

git svn fetch
git merge trunk

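(I haven't tested this code, but it's more-or-less how we maintain one submodule with an upstream SVN repository. On Git 2.0 and later, git svn puts its remote-tracking branches under origin/ by default, so the merge target may be origin/trunk rather than trunk.)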

Don't use git svn rebase, because it will make it very difficult to use git submodule in the parent project without losing data. Just treat the Subversion branches as read-only mirrors of upstream, and merge from them explicitly.

If you need to access the upstream Subversion repository on another machine, try:

git svn init -s https://example.com/foo
git svn fetch

You should then be able to merge changes from upstream as before.

emk
  • Ryan: I added some untested Subversion examples, based on a submodule I set up a few days ago. If it breaks, let me know and I'll fix it. – emk Mar 24 '09 at 22:06
  • emk: this is a very nice overview, thanks! I'm still not having my 'aha!' moment, but I think I will just have to try things out. Given the fact that I won't be pushing my changes back, I think the subtree merge method would be my best bet? Can I use that in combination with git-svn? – hopla Mar 25 '09 at 10:16
  • hopla: The best way to learn about git is to run 'gitk --all &', run each individual command, and then reload gitk. When teaching people git, I find that the visualization helps a lot. – emk Mar 25 '09 at 11:58
  • hopla: You should be able to use subtree merge with git svn, but I haven't tried it myself. In general, subtree mixes all the branches for all your projects in one repo. Submodules are clunkier, but they keep projects distinct. You may find them easier if you have lots of branches. Good luck! – emk Mar 25 '09 at 12:00
  • I'm also thinking about switching from SVN to Git - isn't it possible to use the same vendor branching mechanism as in SVN? (vendor branch, merging into trunk). – acme Dec 09 '10 at 08:20
3

I've looked around a bit more and stumbled upon Braid. It's a tool that automates vendor branches in Git. It can use Git or SVN repos.

By crawling through the source I found out that it uses the subtree strategy. And seems to make it really simple! Plus, it seems to fulfill all my requirements!

Before I jump in and use it: does anyone here have any experience with Braid? I would like to find out about possible cons if there are any. Also, if you haven't used Braid, but have some expertise in Git, what do you think about it, at first sight?

hopla
1

I use git submodules to track reusable apps in my Django projects, but it is kind of messy in the long run.

It is messy for deployment because you can't get a clean archive of the whole tree (with submodules) using git archive. There are some tricks, but nothing perfect. Besides, the submodule update mechanism is not that good for working with submodule branches.

You might want to take a look at virtualenv and pip, because they have had some recent improvements for working with external repositories.

pip : http://pip.openplans.org/ and working with pip/virtualenv : http://www.b-list.org/weblog/2008/dec/15/pip/
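As a rough sketch of what that can look like: pip accepts version-control URLs in a requirements file, and `-e` (editable) installs check the source out so it can be inspected or changed locally. The URLs, the tag `1.3`, and the package names below are placeholders, not real projects.

```shell
#!/bin/sh
set -e
project=$(mktemp -d)

# Placeholder URLs and names; substitute the real repositories.
cat > "$project/requirements.txt" <<'EOF'
# A tagged release straight from a Git repository
-e git+https://example.com/blog-app.git@1.3#egg=blog-app
# Trunk of an app that is still kept in Subversion
-e svn+https://example.com/svn/other-app/trunk#egg=other-app
EOF

cat "$project/requirements.txt"
```

Inside an activated virtualenv, `pip install -r requirements.txt` would then fetch both apps. Editable checkouts land in the virtualenv's `src/` directory by default.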

Grégoire Cachet
  • How messy does it become? Where does the mess come from? Can you give me a URL to this 'pig' thing? Googling for 'git pig' or 'python pig' gives mixed results (about snakes eating pigs etc). – hopla Mar 24 '09 at 13:38
  • sorry, I made a typo on pig: it is pip. – Grégoire Cachet Mar 24 '09 at 14:28
  • Ok, thanks. Would I be able to make local changes using virtualenv or pip? (and keep those changes when updating the 3rd party code) – hopla Mar 24 '09 at 16:17
0

I think my answer to another question gives a nice solution for exactly the problem described here, without going into the hell of submodules (which I have tried, but which don't even get close to the svn:externals I was used to).

Still, have a look at this answer: Do you version control the invidual apps or the whole project or both?

Before you delete my answer again: I was not aware that I couldn't copy my own answer to another question, even if I am convinced it is useful as an answer. Sorry, but give this answer a try, it really is a nice solution. So I hope I am allowed to refer to my own answer to another question.

Community
michel.iamit