12

I have a closed source project that is built on my open source framework. I want to know how I should structure my workflow. Below is my best guess using git with submodules.

  1. I create a public framework repo on github with submodules that are separate git repos.
  2. I purchase a "micro" account on github ($7) so I can have a private repo.
  3. I create a private repo and clone the public framework repo.

From here I can make changes to:

  1. My private code and push to my private repo on github
  2. The public framework code and push to my private github repo and then send a pull request from the public framework..? Or how would this work?

How do I handle a repo that contains private and public code and submodules? Right now it seems like I just have to maintain two separate codebases to achieve this.

I'm looking for the best answer that can help someone fairly new to git streamline the process of working on a codebase that is half open source and half private. One good thing about it is that each folder is either private or public so there is no worry about having private and public files together somewhere - yet some of the private folders might be in public ones!

Another example I could give would be using zendframework to build your private company site while still being able to do pulls each day (and maybe patch pushes) to the zend repo. And also pulls and pushes of your private site inside the zendframework.

For example, imagine a directory structure like this:

/private_folder
/public
        /public_folder
        /public_folder2
        /private_folder

Perhaps I'm asking too much to handle them all in one joined repo directory. Maybe there is no easy way to do this and I should separate them and do all the public patches in one and then just pull into my private repo. Of course, this means that if I am in the middle of working on some private code - I'll have to leave that repo and go open up the public one and make the patched code change, then go back to the private one, merge, and then continue working on the private code.

Xeoncross
  • You mention Zend above. Does this mean your project is PHP-based? Or is this just an example? (I ask because different languages package up code in different ways and it may affect your workflow.) – David J. Feb 09 '10 at 06:06
  • Yes, it is PHP based - so no *building* required. – Xeoncross Feb 09 '10 at 17:04

6 Answers

6

I recommend not using git submodules, but two separate repositories that are not connected on GitHub.

You could build the relationship between them using symlinks on the checked out copies, which is basic and simple. The symlinks only have to be created once per location (production, development, coworkers).

The advantage is that nobody has to make the extra effort to learn and maintain git submodules, and you avoid the risk and complexity they bring.

It could be done by keeping a working copy of the open source (OS) repo and of the private git repo somewhere on your local machine:

/repos/myproject-os
/repos/myproject-priv

Then you could create the directory structure where the project will actually live and be worked on somewhere else on this machine (not inside the /repos/ tree) and create symlinks for the subdirectories you use:

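# create the parent directories first; ln -s does not create them
mkdir -p /wrk/myproject/base/someother
ln -s /repos/myproject-os/dir1 /wrk/myproject/base/dir1
ln -s /repos/myproject-os/dir2 /wrk/myproject/base/dir2
ln -s /repos/myproject-priv/dir1 /wrk/myproject/base/dir3
ln -s /repos/myproject-priv/dir2 /wrk/myproject/base/someother/dir4
# local-only directories that belong to neither repository
mkdir /wrk/myproject/base/config
mkdir /wrk/myproject/base/tmp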

That way the repository structure always stays clean, you can mix and arrange the directories from both repositories the way you want them, and you also have a space for local configs or temp files that do not go into the repositories.

You would do the git commits and everything else from the /repos/ tree, while your project would run and you would edit the files from the /wrk/ tree. Please note that the .git directory where the git data lives would not be available from the /wrk/ tree, because you only link to subdirectories (or possibly single files from the root directory).
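For a larger number of directories, the layout above could be created by a small script, as suggested in the comments below. This is only a minimal sketch, assuming a POSIX shell: `repo_path`, `work_path` and the `link` helper are hypothetical names, and a scratch directory stands in for /repos and /wrk so the sketch can be run anywhere.

```shell
#!/bin/sh
set -eu

# Hypothetical locations; a scratch directory stands in for /repos and /wrk.
root=$(mktemp -d)
repo_path="$root/repos"
work_path="$root/wrk/myproject/base"

# Stand-ins for the two cloned working copies.
mkdir -p "$repo_path/myproject-os/dir1" "$repo_path/myproject-os/dir2"
mkdir -p "$repo_path/myproject-priv/dir1" "$repo_path/myproject-priv/dir2"

# link <source relative to repo_path> <target relative to work_path>
link() {
    mkdir -p "$(dirname "$work_path/$2")"   # parent must exist before linking
    ln -s "$repo_path/$1" "$work_path/$2"
}

link myproject-os/dir1   dir1
link myproject-os/dir2   dir2
link myproject-priv/dir1 dir3
link myproject-priv/dir2 someother/dir4

# Local-only directories that belong to neither repository.
mkdir -p "$work_path/config" "$work_path/tmp"
```

On a real machine you would set `repo_path` and `work_path` to the actual locations and drop the stand-in `mkdir` lines.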

Part 2: You say you want to make sure that you do not accidentally push private code into the public repository. You could set up an additional git repository between your working OS repository and the GitHub repository. Let's say you put it into /repos/gatekeeper; then your tree looks like this:

/repos/gatekeeper/myproject-os
/repos/myproject-os
/repos/myproject-priv

Every time you push from /repos/myproject-os it goes to /repos/gatekeeper/myproject-os. But from /repos/myproject-priv you push directly to your private github repo.

That way you have the same workflow in both /repos/myproject-os and /repos/myproject-priv and you don't need to worry so much. From time to time when you want to push your changes to the real OS codebase, you go to /repos/gatekeeper/myproject-os and push from there to github.

You could do additional code review before that and look at the diffs, so you are sure that only what you really want goes public.

If you want additional security the /repos/gatekeeper/myproject-os could also be on a different machine or even different location.
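The gatekeeper flow could look roughly like the sketch below. It is not a definitive setup: local bare repositories stand in for the gatekeeper and for the public GitHub repo, and the branch name `main`, the remote name `public`, and the file name are assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits; a real setup would rely on your own git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

root=$(mktemp -d)
public="$root/public.git"                          # stand-in for the public GitHub repo
gatekeeper="$root/repos/gatekeeper/myproject-os"   # bare repo in between
work="$root/repos/myproject-os"                    # day-to-day working copy

mkdir -p "$root/repos/gatekeeper"
git init -q --bare "$public"
git init -q --bare "$gatekeeper"

# The working copy's only remote is the gatekeeper.
git init -q "$work"
cd "$work"
git symbolic-ref HEAD refs/heads/main
git remote add origin "$gatekeeper"
echo "framework code" > framework.txt
git add framework.txt
git commit -q -m "Add framework file"
git push -q origin main

# Nothing has reached the public repository yet.
public_refs_before=$(git --git-dir="$public" for-each-ref)

# Later, from the gatekeeper: review first, then publish deliberately.
cd "$gatekeeper"
git remote add public "$public"
git log --oneline main
git push -q public main
```

Because the working copy has no remote pointing at the public repo, an everyday `git push` cannot publish anything by accident; publishing takes the separate, explicit step in the gatekeeper.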

Sven Larson
  • Nice idea. I'm guessing that for a lot of folders (I have hundreds) you could create a shell script that would create all the folders for you on each machine taking a simple `repo_path` variable into account. – Xeoncross Feb 12 '10 at 19:25
  • I'm going to award you the bounty because what you are saying would work - but I don't think I'll actually do this. Much too hard of a setup for several people in a team and with multiple projects like this. – Xeoncross Feb 13 '10 at 00:48
  • Interesting answer. I'm on Linux myself, but most developers in my company are on Windows, so they don't have the luxury of ln -s. Has anyone tried this on Windows? – Amedee Van Gasse Jul 01 '15 at 14:40
  • @AmedeeVanGasse, on Windows this will not work. Also need cross-platform solution: Linux, Mac, Windows. – Aleksey Kontsevich May 11 '18 at 21:55
  • Symlinks also work on Mac. Windows is the only one who is difficult and doesn't want to work cross platform. And even there you have an undocumented alternative for symlinks. I think they are called junctions, not sure because I don't use Windows. Anyway it can be made cross platform with some effort and good will. – Amedee Van Gasse May 11 '18 at 22:34
  • @AmedeeVanGasse, I think the next answer, with a 'public' and a 'private' branch in your local repository, is the best option. This article explains it in detail: https://24ways.org/2013/keeping-parts-of-your-codebase-private-on-github/ – Aleksey Kontsevich May 12 '18 at 00:17
3

You can have a 'public' and 'private' branch in your local repository. When you push, each branch gets pushed to a separate remote repository (look up the 'git push' syntax). Then, you can freely merge from public to private.

I'm sure there's a way you could merge selected changes from private to public, too, though I'd have to look it up.
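A minimal sketch of the two-branch setup, under stated assumptions: the remote names `public-remote` and `private-remote` and the file names are made up for illustration, and local bare repositories stand in for the two remote repos.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits; a real setup would rely on your own git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

root=$(mktemp -d)
# Local bare repositories stand in for the two remote repos.
git init -q --bare "$root/public-remote.git"
git init -q --bare "$root/private-remote.git"

git init -q "$root/project"
cd "$root/project"
git symbolic-ref HEAD refs/heads/public
git remote add public-remote  "$root/public-remote.git"
git remote add private-remote "$root/private-remote.git"

# The public branch holds only open source files.
echo "framework v1" > framework.txt
git add framework.txt
git commit -q -m "Framework v1"

# The private branch starts from public and adds closed source files.
git checkout -q -b private
echo "secret" > site.txt
git add site.txt
git commit -q -m "Private site code"

# Framework work happens on public, then gets merged into private.
git checkout -q public
echo "framework v2" > framework.txt
git commit -q -am "Framework v2"
git checkout -q private
git merge -q --no-edit public

# Each branch is pushed only to its own remote.
git push -q public-remote  public
git push -q private-remote private
```

After this, the private branch contains both the framework and the site files, while the public remote only ever receives the public branch. For the opposite direction, `git cherry-pick` is one common way to carry a single commit from private to public, provided that commit touches only public files.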

Dietrich Epp
  • With this workflow you can't have a working directory with files from public AND private. When you have checked out the public branch, you only see the public files; checkout the private branch and you only have the private files. – Amedee Van Gasse Jul 01 '15 at 14:33
  • @AmedeeVanGasse: That is incorrect. The private branch has both public and private files. The public branch only has public files. That's what "merge" does. – Dietrich Epp Jul 01 '15 at 19:13
  • That was not obvious from the explanation! – Amedee Van Gasse Jul 01 '15 at 19:14
  • I was hoping that people were already familiar with how "merge" works, when I say "merge from public to private". – Dietrich Epp Jul 01 '15 at 19:18
  • The issue I have with this approach is that you continuously have to switch between public and private, and merge all the time. And as you indicate yourself, getting changes to a public directory in the private branch over to the public branch would be non-trivial. – Amedee Van Gasse Jul 02 '15 at 09:08
  • @AmedeeVanGasse: I think you're imagining a more specific situation than given in the question. This is just one approach, to be sure, and there are cases in which it would be inconvenient. For example, if you imagine that certain directories are public or private, or if you imagine that you want to make private data public. Git does not really have a notion of "private" or "public" to begin with. The alternative—to use multiple repositories—has its own disadvantages. Using submodules requires that public data be below the private data directory root, for example. – Dietrich Epp Jul 03 '15 at 11:11
  • @AmedeeVanGasse: This is one of those questions that doesn't really have a "correct" answer, but will instead collect a number of different approaches to solve the problem, each of which might be the right approach for a specific scenario. – Dietrich Epp Jul 03 '15 at 11:12
  • Interesting approach. As I have very small part of private code and rarely changed - could be very good approach to me, easily merging from public to private. – Aleksey Kontsevich May 11 '18 at 22:03
1

Git submodules allow you to define a configuration (see this question), that is, a reference to one commit of another component (in another repo).

You can develop both codebases (yours and the submodules') within the same repo, but when you are talking about multiple private directories within your public code, that calls for a subtree merge strategy. It will allow you to treat your directories (the private and public ones) as one natural working tree.

And to better manage the push and pull of parts of your global repo to a private one, I would recommend the git subtree script tool.
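As a rough illustration of a subtree merge, here is a sketch that uses only core git commands rather than the git subtree script. The repository names, the `public/` prefix and the branch name `main` are assumptions made for the example; a local repository stands in for the public framework.

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits; a real setup would rely on your own git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

root=$(mktemp -d)

# Stand-in for the public framework repository.
git init -q "$root/framework"
cd "$root/framework"
git symbolic-ref HEAD refs/heads/main
echo "framework v1" > core.txt
git add core.txt
git commit -q -m "Framework v1"

# The private project, with its own unrelated history.
git init -q "$root/site"
cd "$root/site"
git symbolic-ref HEAD refs/heads/main
echo "private" > site.txt
git add site.txt
git commit -q -m "Private site"

# Graft the framework's history in under public/ (subtree merge).
git remote add framework "$root/framework"
git fetch -q framework
git merge -q -s ours --no-commit --allow-unrelated-histories framework/main
git read-tree --prefix=public/ -u framework/main
git commit -q -m "Merge framework as public/"

# Later framework updates are merged into the same subdirectory.
cd "$root/framework"
echo "framework v2" > core.txt
git commit -q -am "Framework v2"
cd "$root/site"
git fetch -q framework
git merge -q --no-edit -X subtree=public framework/main
```

The private repo ends up with the framework files under public/ next to its own files, all in one working tree, and later framework changes can still be merged in.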

VonC
1

To summarize, I recommend this workflow:

  1. keep it simple; have one working copy for each repository (don't use git submodules)
  2. use your language's tools to package up your framework
  3. set up scripts or light tooling to make context switching fast or automatic

I've used git submodules in the past. I don't think they are a good fit for your use case. The big downsides that jump out at me are:

  • It helps to eat your own dog food when you build (or extract) a framework. Do you expect your framework users to also set up git submodules when they use your framework? I'm guessing not.
  • There is some risk of accidentally publishing your private source code into your open source framework.
  • Git submodules have improved quite a bit in the last year or so, but they are still not well understood. Even competent gitters may struggle with submodules.

Here is one sub-question that I will admit is not so clear cut: "Which workflow makes it easier to bounce back and forth between the OSS framework and the private project?"

  • There is a certain allure to using submodules and having both projects in one tree. This will speed you up perhaps in your text editing, but probably will slow you down (or cause more mistakes than usual) when it comes to committing and pushing.

  • There is a certain allure to having the projects separated. The context switch (from one text editor window to another) may help remind you that the OSS project is for public consumption. For example, it may help discipline you not to break backwards compatibility and to keep a good changelog. Committing and pushing will be easy relative to the submodule alternative.

Once you have decided on your working copies, you'll want to figure out your day-to-day workflow. It will depend on your language, of course. (In Ruby, for example, you might package up your OSS framework as a gem, build it, then have your private code depend on it.) Whatever you pick, set up some scripts (or editor shortcuts perhaps) to help you build your libraries (or packages) quickly, perhaps even automatically when files change, so that you can bounce between your framework and project effortlessly.
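As one example of such light tooling, here is a sketch of a helper that runs the same git command in both working copies. The `each_repo` function and the directory names are hypothetical, and two empty repositories in a scratch directory stand in for the real working copies.

```shell
#!/bin/sh
set -eu

# Hypothetical layout: two independent working copies side by side.
root=$(mktemp -d)
framework="$root/framework"
project="$root/project"
git init -q "$framework"
git init -q "$project"

# each_repo <git args...>: run the same git command in both working copies.
each_repo() {
    for repo in "$framework" "$project"; do
        echo "== $(basename "$repo")"
        git -C "$repo" "$@"
    done
}

echo "draft" > "$framework/notes.txt"   # an uncommitted change in the framework
report=$(each_repo status --short)
echo "$report"
```

Something like `each_repo status --short` before leaving for the day gives a quick check that nothing is left uncommitted in either repository.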

David J.
  • I've been reading a lot in the progit.com book lately and I think that you are right - it would be best to keep everything separate, if only for the simple reason that I might accidentally push private code into the public repo. The other reason is that I will have to adopt this method anyway for additional projects that depend on the framework since I'm not going to keep adding more and more sites to one combined repo. – Xeoncross Feb 09 '10 at 17:03
0

There are two approaches here:

  1. You could use branches of the same git repo. In your private repo, create a branch with a reference to your public repo and handle both like that.

  2. If the components used in your private project are sub-projects of your public code, then you should use submodules. Submodule handling is still at a fairly early stage in git as of version 1.6.6, but it could be useful since you are using sub-projects.

It seems to me that the one thing you can't lose track of is which project contributes to which. If you have that clear, then whatever you choose will work. Besides, git is easy.

erick2red
-1

Make the public repo a submodule inside the private one. When pushing, remember you have to push them both. Also remember to check in the submodule itself in the private repo, so it tracks what revisions of the submodule it is using.
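A minimal sketch of that arrangement, with assumed names throughout: a local repository stands in for the public framework, and the `protocol.file.allow` override is only there because that stand-in is a local path (recent git versions refuse local-path submodule clones by default).

```shell
#!/bin/sh
set -eu

# Identity for the throwaway commits; a real setup would rely on your own git config.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

root=$(mktemp -d)

# Stand-in for the public framework repository.
git init -q "$root/framework"
cd "$root/framework"
git symbolic-ref HEAD refs/heads/main
echo "framework v1" > core.txt
git add core.txt
git commit -q -m "Framework v1"

# The private project, with the framework added as a submodule.
git init -q "$root/site"
cd "$root/site"
git symbolic-ref HEAD refs/heads/main
git -c protocol.file.allow=always submodule --quiet add "$root/framework" framework
git commit -q -m "Add framework as a submodule"

# Work on the framework from inside the submodule ...
cd framework
echo "framework v2" > core.txt
git commit -q -am "Framework v2"
cd ..

# ... then check in the submodule itself so the private repo tracks the new revision.
git add framework
git commit -q -m "Point submodule at framework v2"
```

When pushing, the submodule's commits would go to the public repo first and the private repo's commit second, so that the recorded revision always exists on the public side.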

Andrew McGregor