
I have a fairly large project that contains a subdirectory which is itself a project.

I want to manage this subdirectory within the larger project, sharing the same branches, tags, etc., for simplicity's sake. The integrity of the larger project and the ability to track it as a monolithic git repo are very important to me.

But I also want to publish the smaller project (the subdirectory) and accept pull requests from its contributors without exposing them to my larger project. The sub-project requires some ancillary files (like an Eclipse .project) in its parent directory in order to be fully self-contained.

I've looked into submodules, subtree merge, and sparse checkout, but couldn't figure out how to do this. As a stop-gap measure I just ship out a tarball of the subdirectory and ask for patches in return.

Any ideas how to improve this with a git-centric workflow?

Alex R
  • Could you explain in more detail what features you expect from that workflow? E.g., do you expect the common files (like this Eclipse .project) to change often? Do you need to be able to work on this particular subdirectory in the context of the whole project, or would it be enough to develop it as an independent project and just update the version residing inside the main directory? I have a general idea for a solution, but I'm not sure about a couple of nuances. – Frax Mar 21 '15 at 19:03
  • I expect to work in the sub-project myself but also easily incorporate changes made by others. – Alex R Mar 21 '15 at 19:40
  • Another way to explain this... I'm looking for some combination of sparse checkout with multiple origin repositories. My main project would have the big repo as origin, but one subfolder would also have a second remote which is a sparse checkout of another repo. – Alex R Mar 21 '15 at 20:02
  • Migrate to Maven. Native Eclipse files will cause you more pain in the long run than you think. – Thorbjørn Ravn Andersen Mar 21 '15 at 20:18
  • `sparse-checkout` is not going to work for you. This "sparsity" affects only working directory, not the commits. Subtree merge won't be better than submodule. – Frax Mar 21 '15 at 22:28
  • Hmm, so you need to have files in your second repository that aren't in the main one? That complicates things greatly. I don't think you can achieve that without doing rebases/cherry-picks somewhere in your flow, or maybe in several places. That will make your history at least a bit chaotic and won't let you fully benefit from using git. If I were you, I would try hard to get rid of either these files (to use the setup from my answer) or the common files in the main directory (to be able to use a submodule). – Frax Mar 21 '15 at 23:05
  • The best answer is: Don't do this. There is a reason your near-duplicates are not well answered: **This is not what Git is for**. Use whatever code-reuse mechanisms are available to you in your language. Git is not for code-reuse, it's a version control system. You need to build a library, and include that library in your parent project, *not* the entire source tree. That's *bound* to be a leaky system where the child project and parent project wind up needlessly bound together in a bunch of gross ways. At that point, it might as well be one big repository. – user229044 Mar 21 '15 at 23:40
  • Hi @meagar. Agree with you 100% when using a real programming language. Unfortunately this one is in PHP. Not my first choice. Or 2nd or 3rd or 4th. Just something I'm stuck maintaining. – Alex R Mar 22 '15 at 02:28
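Frax's point that sparse checkout only narrows the working directory, not the commits, can be checked in a throwaway repository. The sketch below is an illustration, not something from the thread; it assumes Git 2.25 or later (for `git sparse-checkout`), a POSIX shell, and reuses the toy layout (`common`, `subproj/`, `other/`) from the first answer.

```shell
#!/bin/sh
# Sketch (assumes Git 2.25+): sparse checkout hides paths on disk,
# but every commit still contains the whole tree.
set -eu

# Throwaway sandbox, isolated from the user's own Git config.
WORK=$(mktemp -d)
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Toy layout: a common file, subproj/ and other/.
mkdir "$WORK/bigproj" && cd "$WORK/bigproj"
git init -q
git symbolic-ref HEAD refs/heads/master   # independent of init.defaultBranch
echo "common file" >common
mkdir subproj other
echo "subproj content" >subproj/content
echo "other content" >other/content
git add common subproj other
git commit -q -m 'Initial commit'

# Limit the working tree to subproj/ (cone mode also keeps top-level files).
git sparse-checkout init --cone
git sparse-checkout set subproj

test -e other/content || echo "other/content is not in the working tree"

# Commit a change to the sub-project only...
echo "more subproj content" >>subproj/content
git commit -q -am 'Change subproj'

# ...and the new commit still records other/content.
git ls-tree -r --name-only HEAD
```

So a sparse clone still pushes and pulls full commits, which is why it cannot by itself keep the rest of the big project out of a second repository.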

3 Answers


OK, this is not a perfect setup, but it should be good enough.

The idea is to have one branch in which everything apart from that subdirectory and the common files is removed (I mean they don't exist in that branch at all), and to accept pull requests for that branch (you may create a new repository containing only that branch).

It is a bit convoluted, so I'm going to show it through an example. Let's say your project was created with commands like this:

mkdir bigproj
cd bigproj
git init
echo "common file" >common
mkdir subproj
echo "subproj content" >subproj/content
mkdir other
echo "other content" >other/content
git add common subproj other
git commit -m 'Initial commit'
git commit --allow-empty -m 'Some history'

It contains a common file, a subproj subdirectory with some content, and an other subdirectory with some different content. Tree:

.
├── common
├── other
│   └── content
└── subproj
    └── content

Now let's create a branch containing only common and subproj without history:

git checkout --orphan subproj-branch
git rm -rf .  # clear the index and the working tree
git checkout master -- common subproj  # restore `common` and `subproj` from master into the index and working tree
git commit -m 'Initial commit for subproj-branch'

Resulting tree:

.
├── common
└── subproj
    └── content

Merge this branch back into master to avoid possible false conflicts:

git checkout master
git merge --allow-unrelated-histories subproj-branch  # no conflicts expected; Git 2.9+ requires the flag here
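Since Git 2.9, `git merge` refuses to join two histories with no common ancestor unless `--allow-unrelated-histories` is passed, which is exactly the situation in this step. The following sketch (assuming Git 2.9+ and a POSIX shell; the temporary directory is only for illustration) replays the setup on the toy repository and checks what the answer relies on: the orphan branch starts as a root commit holding only `common` and `subproj`, and after the merge it is an ancestor of master.

```shell
#!/bin/sh
# Sketch (assumes Git 2.9+): orphan branch for the shareable paths,
# then merge it into master so the two branches share history.
set -eu

# Throwaway sandbox, isolated from the user's own Git config.
WORK=$(mktemp -d)
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The toy project from the answer.
mkdir "$WORK/bigproj" && cd "$WORK/bigproj"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "common file" >common
mkdir subproj other
echo "subproj content" >subproj/content
echo "other content" >other/content
git add common subproj other
git commit -q -m 'Initial commit'
git commit -q --allow-empty -m 'Some history'

# Orphan branch containing only common and subproj.
git checkout -q --orphan subproj-branch
git rm -rfq .                                # empty index and working tree
git checkout -q master -- common subproj     # copy the shared paths back
git commit -q -m 'Initial commit for subproj-branch'

# A root commit: --parents prints only the commit's own hash.
git rev-list --parents -n 1 subproj-branch
git ls-tree -r --name-only subproj-branch    # common, subproj/content

# Identical files were added on both sides, so the merge is clean.
git checkout -q master
git merge -q --no-edit --allow-unrelated-histories subproj-branch

# subproj-branch is now an ancestor of master.
git merge-base --is-ancestor subproj-branch master && echo "shared history"
```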

Now we can publish subproj-branch in some dedicated repository:

git remote add subproj-repo <some url>
git push --set-upstream subproj-repo subproj-branch:master  # -f may be needed if the remote already has a master branch.
# And remote branch doesn't have to be named master, of course.

Once the repository is published and we have received some patches there, we can merge them:

git checkout subproj-branch
git pull
git checkout master
git merge subproj-branch
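The whole round trip can be rehearsed locally before involving a real hosting service. In this sketch a bare repository in a temporary directory stands in for `subproj-repo`, and a second clone plays a contributor whose pull request has been accepted; both paths are hypothetical. It assumes Git 2.9+ and a POSIX shell.

```shell
#!/bin/sh
# Sketch (assumes Git 2.9+): publish subproj-branch to a stand-in remote,
# accept a contributor commit there, then pull it back into master.
set -eu

# Throwaway sandbox, isolated from the user's own Git config.
WORK=$(mktemp -d)
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Toy project plus orphan subproj-branch, as in the answer.
mkdir "$WORK/bigproj" && cd "$WORK/bigproj"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "common file" >common
mkdir subproj other
echo "subproj content" >subproj/content
echo "other content" >other/content
git add common subproj other
git commit -q -m 'Initial commit'
git checkout -q --orphan subproj-branch
git rm -rfq .
git checkout -q master -- common subproj
git commit -q -m 'Initial commit for subproj-branch'
git checkout -q master
git merge -q --no-edit --allow-unrelated-histories subproj-branch

# A local bare repository stands in for the public subproj-repo.
git init -q --bare "$WORK/subproj-repo.git"
git remote add subproj-repo "$WORK/subproj-repo.git"
git push -q --set-upstream subproj-repo subproj-branch:master

# A contributor clones it and pushes a change (an accepted pull request).
git clone -q -b master "$WORK/subproj-repo.git" "$WORK/contributor"
cd "$WORK/contributor"
echo "patch from contributor" >>subproj/content
git commit -q -am 'Contributor patch'
git push -q origin master

# Back in the big project: update subproj-branch, then merge into master.
cd "$WORK/bigproj"
git checkout -q subproj-branch
git pull -q --no-rebase
git checkout -q master
git merge -q --no-edit subproj-branch
cat subproj/content
```

The public repository never receives `other/`, while master ends up with both the private files and the contributor's change.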

That is the basic flow, which allows making changes in subproj-repo and incorporating them into the main repository. Going the other way around is a bit more problematic, but possible. There are two possibilities:

  1. Changes touch only subproj and/or common. We can take them as they are:

    git checkout subproj-branch
    git cherry-pick master  # replace master with anything you actually need
    git checkout master
    git merge subproj-branch
    git push subproj-repo subproj-branch:master
    
  2. Changes touch both subproj/common and other files. You can check out the changed paths from master into subproj-branch, commit, and then merge back into master (to avoid false conflicts in the future). This is not perfect, and you may want to adapt this step.

    git checkout subproj-branch
    git checkout master -- common subproj
    git commit -m 'Some changes'
    git checkout master
    git merge subproj-branch
    git push subproj-repo subproj-branch:master
    

The important part here is merging the changes back into master. That may seem pointless, but it keeps the merge base between master and subproj-branch up to date, which prevents some spurious conflicts later.
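Case 2 and the merge back into master can be rehearsed the same way. This sketch (same assumptions: Git 2.9+, a POSIX shell, the answer's toy repository; the push to `subproj-repo` is left out) makes one commit on master that touches both `subproj` and `other`, copies only the shareable paths onto subproj-branch, and then checks that after the merge back, the merge base of the two branches is the new subproj-branch commit.

```shell
#!/bin/sh
# Sketch (assumes Git 2.9+): case 2, then merge back so the merge base moves.
set -eu

# Throwaway sandbox, isolated from the user's own Git config.
WORK=$(mktemp -d)
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Toy project plus orphan subproj-branch, already merged into master.
mkdir "$WORK/bigproj" && cd "$WORK/bigproj"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "common file" >common
mkdir subproj other
echo "subproj content" >subproj/content
echo "other content" >other/content
git add common subproj other
git commit -q -m 'Initial commit'
git checkout -q --orphan subproj-branch
git rm -rfq .
git checkout -q master -- common subproj
git commit -q -m 'Initial commit for subproj-branch'
git checkout -q master
git merge -q --no-edit --allow-unrelated-histories subproj-branch

# One commit on master touching shared and private files.
echo "edited on master" >>subproj/content
echo "edited on master" >>other/content
git commit -q -am 'Mixed change'

# Case 2: copy only the shareable paths onto subproj-branch.
git checkout -q subproj-branch
git checkout -q master -- common subproj
git commit -q -m 'Some changes'

# Merge back so master records the copy as part of its history.
git checkout -q master
git merge -q --no-edit subproj-branch

# The merge base is now the tip of subproj-branch.
test "$(git merge-base master subproj-branch)" = "$(git rev-parse subproj-branch)" \
  && echo "merge base is up to date"
```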

Whoa, that is a long answer. I hope it will be helpful :P

Frax
  • It seems the repository you set as the remote already has a `master` branch. You can either use another branch name or use the `-f` (short for `--force`) option, like `git push -f subproj-repo subproj-branch:master`. This will replace `subproj-repo/master` with `subproj-branch`, no matter what is there now. Be careful, however: this may permanently remove some commits from the repository. Also, double-check there are no spaces around `:`. `git push -f` is one of the few git commands that can be really destructive. – Frax Mar 21 '15 at 21:12
  • I just assumed that the remote repository would be a freshly created one without any branches; in that case the original code works. – Frax Mar 21 '15 at 21:13
  • yep you're right. So I pushed it to a new branch on the 2nd remote. But I'm reading up on this and I'm a bit puzzled why you recommended --orphan on the initial checkout? This would seem to prevent diffs and merges across the two repos – Alex R Mar 21 '15 at 21:15
  • the "checkout -- subfolder" is interesting, it seems similar to sparse checkout but better in some ways – Alex R Mar 21 '15 at 21:16
  • `--orphan` is crucial for this setup, because you want `subproj-branch` to have no history. And there is no problem with merging commits with completely unrelated history (though since Git 2.9, `git merge` requires `--allow-unrelated-histories` for that). It works just as if there were an empty commit as the common base: all content in both branches is treated as added there. Diff is also not a problem, as it just compares directory states; it is completely unaware of history (unless used with `...`). And, of course, there _is_ common history after the first merge :) – Frax Mar 21 '15 at 21:35
  • I'm not sure if I understand sparse checkout well, but it seems to be quite different from `git checkout -- paths`, even if it has similar effects. The idea of sparse checkout is to prevent some parts of the working directory from being affected by checkout, while keeping the other effects of checkout, i.e. moving HEAD and changing the current branch. OTOH, `git checkout -- paths` is solely for checking out individual files. – Frax Mar 21 '15 at 21:48
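To make Frax's distinction concrete: `git checkout <commit> -- <paths>` only copies those paths into the index and working tree, while HEAD stays on the current branch. A minimal sketch in a throwaway repository; the branch name `feature` and the file name `file` are made up for the example.

```shell
#!/bin/sh
# Sketch: "git checkout <commit> -- <paths>" copies files; HEAD stays put.
set -eu

# Throwaway sandbox, isolated from the user's own Git config.
WORK=$(mktemp -d)
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir "$WORK/repo" && cd "$WORK/repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo v1 >file
git add file
git commit -q -m v1

# A hypothetical "feature" branch with a newer version of the file.
git checkout -q -b feature
echo v2 >file
git commit -q -am v2
git checkout -q master

# Take only "file" from feature; the current branch does not change.
git checkout -q feature -- file
git symbolic-ref --short HEAD    # master
cat file                         # v2
git diff --cached --name-only    # file (staged, ready to commit)
```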

This is not exactly a full answer, but working from @Frax's answer I ran into a few things that I didn't understand or that didn't work as I expected, and I needed to modify it as follows:

  • First I used filter-branch to create a new branch that has only the contents of the common subdirectory, with history. (I avoided making this a separate repo because I have a limit on the number of private repos on GitHub.) This simplifies all remaining operations, as most git commands work naturally at branch scope, but I keep getting confused when trying to do things at subtree scope.
  • Even this filtered history is extremely bloated, so I went on to create a branch with no history using Frax's suggested --orphan option. Excellent suggestion.
  • It's important to merge back onto the full branch (not the orphaned branch); this merge direction seems to be backwards in Frax's answer, or I just don't understand what I'm doing. The result is that the commit hash from the new branch can be seen in the history branch.
  • Finally, use git submodule add -b branch to incorporate this new branch into both the original project and the subproject.
  • After just a few minutes of trying to use this, I already ran into limitations in Eclipse's handling of submodules. It looks like the next alternative is to just split up the repos.
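The answer doesn't give the exact filter-branch invocation for the first step. One plausible form, assumed here rather than taken from the author, runs `--subdirectory-filter` on a copy of master so that master itself is not rewritten. The sketch uses the toy repository from the first answer; note that Git's own documentation now discourages filter-branch in favor of the separate git filter-repo tool, so treat this only as an illustration. `FILTER_BRANCH_SQUELCH_WARNING=1` merely suppresses the warning (and pause) that Git 2.24+ prints.

```shell
#!/bin/sh
# Sketch (assumed invocation): extract subproj/ with its history into a
# new branch, leaving master untouched.
set -eu

# Throwaway sandbox, isolated from the user's own Git config.
WORK=$(mktemp -d)
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Toy project with one commit that touches only other/.
mkdir "$WORK/bigproj" && cd "$WORK/bigproj"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "common file" >common
mkdir subproj other
echo "subproj content" >subproj/content
echo "other content" >other/content
git add common subproj other
git commit -q -m 'Initial commit'
echo "second line" >>subproj/content
git commit -q -am 'Edit subproj'
echo "second line" >>other/content
git commit -q -am 'Edit other only'

# Rewrite a copy of master, keeping only subproj/ (moved to the root).
git branch subproj-history master
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --subdirectory-filter subproj -- subproj-history

git ls-tree -r --name-only subproj-history   # content
git log --format=%s subproj-history          # only commits that touched subproj/
```

Commits that never touched `subproj/` (here 'Edit other only') are dropped from the new branch, which is one reason the filtered history can still be large if the subdirectory itself changed often.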

A few more notes

  • This question appears to be very similar: How can I keep a subfolder of a git repo in sync with a subfolder of another git repo (I just offered a 100-point bounty on that question).
  • Using sparse checkout to grab a subdirectory from one repo and commit to the other "almost works", except that my entire 500 MB repository still gets copied behind the scenes, even though my sparse checkout covers only a handful of text files.
  • Using --depth=1 looked promising, except that when pushing to the second repo I got ! [remote rejected] master -> master (shallow update not allowed)
  • Another similar question (unanswered): Child git repository as subset of a main repository
  • @Frax's answer works fine if I push to a fresh new branch on the second remote. But I wanted it to be integrated with the remote master, so that the additional supporting files needed to run the sub-project in stand-alone mode could live in the same branch.
Alex R

I found the answer hidden at How do I merge a sub directory in git?

The key bit of git magic is to use the following to sync up the two common subdirectories:

git read-tree --prefix=MyHugeProprietaryWebApp/public_html/ -u contrib/master:MyOpenSubproject/public_html/

Where

  • MyHugeProprietaryWebApp is the project's top-level directory below the repository root (i.e. this is the project root folder in Eclipse)
  • public_html is the subdirectory containing the code I want to work on with contributors (in my specific case, some PHP layout code). It appears twice in the command because the subdirectory has the same name in both repositories.
  • contrib is the remote for the repository on GitHub (shared with contributors), which I created earlier using git checkout --orphan and git push as suggested by Frax; I have not tested possible alternatives such as --depth=1.
  • MyOpenSubproject is the root folder of the smaller project and contains the Eclipse .project and other ancillary files that make the sub-project self-contained. These ancillary files are NOT shared with the larger project and include extra documentation, tests, etc., that are only relevant to outside contributors and not to the insiders working on the larger project.
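Because `git read-tree --prefix` refuses to overwrite entries that already exist in the index, repeating the sync presumably requires removing the old copy first. Below is a sketch of one full sync run under that assumption, with local repositories standing in for GitHub; the directory names follow this answer, while the file names `index.php`, `header.php` and `secret.php` are hypothetical. This replaces `public_html` wholesale with the contributors' version rather than merging, so local edits to that directory in the big project would be lost.

```shell
#!/bin/sh
# Sketch: one-way sync of public_html from a "contrib" remote via read-tree.
set -eu

# Throwaway sandbox, isolated from the user's own Git config.
WORK=$(mktemp -d)
export HOME="$WORK" GIT_CONFIG_NOSYSTEM=1
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the public repo: sub-project plus its ancillary files.
mkdir -p "$WORK/contrib/MyOpenSubproject/public_html" && cd "$WORK/contrib"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "eclipse project" >MyOpenSubproject/.project
echo "layout v1" >MyOpenSubproject/public_html/index.php
git add MyOpenSubproject
git commit -q -m 'Publish public_html'

# The big proprietary project holding the same files under another path.
mkdir -p "$WORK/big/MyHugeProprietaryWebApp/public_html" && cd "$WORK/big"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "layout v1" >MyHugeProprietaryWebApp/public_html/index.php
echo "private code" >MyHugeProprietaryWebApp/secret.php
git add MyHugeProprietaryWebApp
git commit -q -m 'Big project'

# A contributor's change lands in the public repo.
cd "$WORK/contrib"
echo "layout v2" >MyOpenSubproject/public_html/index.php
echo "header" >MyOpenSubproject/public_html/header.php
git add MyOpenSubproject
git commit -q -m 'Contributor change'

# Sync: fetch, drop the old copy, read the contrib subtree into place.
cd "$WORK/big"
git remote add contrib "$WORK/contrib"
git fetch -q contrib
git rm -r -q MyHugeProprietaryWebApp/public_html
git read-tree --prefix=MyHugeProprietaryWebApp/public_html/ -u \
  contrib/master:MyOpenSubproject/public_html
git commit -q -m 'Sync public_html from contrib'
git ls-tree -r --name-only HEAD
```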

My experience with this approach is limited to a few dry-run tests, but I'm happy with what I see so far. I have not yet experimented with git pull -s subtree (or -X subtree=<path>), which I may need at some point.

Alex R