
I've been attempting to detach a project sub-library into a new repo, but I want to detach the full folder path, not just the target folder down.

When I filter, the resulting folder structure is parented from the target folder rather than the repository root (or an ancestor I specify) which effectively breaks package structure as classes no longer correctly match folders:

Original repo:

+- repo/master
    +- lib1
    +- lib2
    +- lib3
        +- ClassA
        +- ClassB
        +- ClassC

Git command:

git subtree split -P lib3 -b new-branch

Resulting structure:

+- repo/new-branch
    +- ClassA
    +- ClassB
    +- ClassC

Required structure:

+- repo/new-branch
    +- lib3
        +- ClassA
        +- ClassB
        +- ClassC

Is there any way to do this? I've been reading about rebasing to somehow add the root "core" folder as if it was there before the split, but my git-fu is not strong enough, sadly :(

The other thing I tried is filter-branch to remove other directories, but I could only seem to remove one root-level folder, then git started complaining when I tried others.

Thanks :)

Dave Stewart
  • Is there any reason why you can't just create a new branch from `master`? This is what your diagram appears to be saying. – Tim Biegeleisen Oct 05 '15 at 03:04
  • I edited the question to make things clearer, but essentially I want to take only the commits for that folder to create the new repo. Am I barking up the wrong tree with subtree split? – Dave Stewart Oct 05 '15 at 03:07
  • I would look into using _submodules_ if you want a section within a repo to behave like a separate repo, but then again this is just me. – Tim Biegeleisen Oct 05 '15 at 03:08
  • That is the plan, but the first step is extracting only the commits for this library (vs the whole project) then in future projects, using the extracted folder & files as a submodule. – Dave Stewart Oct 05 '15 at 03:12

2 Answers


git subtree split doesn't appear to offer an option for what you want (but it sounds useful so maybe you could contribute one to the project!)

So, there are two ways to do this, depending on what you want.

1) Export a single directory (simpler option)

This takes advantage of the fact you want to move to another repo, so we can extract the subtree, and then relocate it in separate steps.

  1. Use git subtree split to extract the files you want to an intermediate branch in your repository (you have already done this).

     git subtree split -P lib3 -b new-branch
    
  2. Create a new, empty repository:

     git init lib3-repo
     cd lib3-repo
     git commit --allow-empty -m 'Initial commit'
    
  3. Add the contents of the intermediate branch as a subtree (here, repo stands for the path or URL of the original repository):

     git subtree add -P lib3 repo new-branch
    

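This reinserts the missing directory level: git subtree add records a merge commit whose tree holds the imported files under lib3/. The imported commits themselves are not rewritten, so in the older history the files still sit at the top level.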

Every time you want to exchange history between the two repos you'll have to go through the intermediate branch (i.e. subtree split, then subtree pull), but it ought to work.
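As a rough end-to-end check of method 1, the sketch below builds a throwaway repository and runs the three steps against it. The temporary paths, file names, and the demo identity are all made up for illustration, and it assumes the git subtree command is installed (it ships in Git's contrib area, so some installations lack it). It also shows that the lib3/ prefix exists at HEAD but not in the imported commit.

```shell
set -e
# Throwaway identity so commits work in a clean environment (demo assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)

# Stand-in for the original repository.
git init -q "$tmp/repo"
cd "$tmp/repo"
mkdir lib1 lib3
echo a > lib1/a.txt
echo A > lib3/ClassA
git add .
git commit -q -m 'Add lib1 and lib3'
echo B > lib3/ClassB
git add .
git commit -q -m 'Add ClassB'

# Step 1: extract lib3 to an intermediate branch.
git subtree split -P lib3 -b new-branch > /dev/null

# Step 2: new repository with one (empty) commit.
git init -q "$tmp/lib3-repo"
cd "$tmp/lib3-repo"
git commit -q --allow-empty -m 'Initial commit'

# Step 3: add the intermediate branch under the lib3/ prefix.
git subtree add -P lib3 "$tmp/repo" new-branch > /dev/null

# Files at HEAD (the merge commit) versus files in the imported commit,
# which is the second parent of that merge.
head_files=$(git ls-tree -r --name-only HEAD)
imported_files=$(git ls-tree -r --name-only 'HEAD^2')
printf 'HEAD:\n%s\nimported commit:\n%s\n' "$head_files" "$imported_files"
```

If the history itself needs to show lib3/ in every commit, this is the point where method 1 falls short and a history-rewriting tool is needed instead.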

2) Export any set of files (more complex)

To keep multiple, specific subtrees, you'll need git filter-branch.

There are lots of ways to pick and choose which commits and files to keep or discard, but this recipe uses --index-filter to select files without having any access to the contents of the files.

The following keeps all files in the "lib3" and "src/core" directories, without editing their locations in any way:

git checkout -b new-branch
git filter-branch --index-filter \
    'git ls-files \
       | grep -v "^lib3/\|^src/core/" \
       | xargs --no-run-if-empty git rm --cached' \
    HEAD

The filter code is a shell-script that edits the Git index (we're using --index-filter, remember).

git ls-files is similar to ls, except that it lists the files recorded in the Git index (recursively, with their paths) rather than the files in the working tree.

grep -v <pattern> gives everything that does not match the pattern, and \| in the pattern is an alternative, so we get the list of files to delete.

xargs --no-run-if-empty passes the filenames from the pipe to a command as arguments, and skips the command entirely if the input is empty. It is a GNU xargs option; -r is the short form.

git rm --cached deletes files from the index.
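Before rewriting any history, it can help to dry-run the selection on its own. The sketch below feeds a made-up file list (a stand-in for what git ls-files might print for one commit) through the same kind of grep -v filter and prints what would be removed. It uses one -e per pattern rather than \|, which selects the same lines but does not rely on GNU grep's alternation syntax.

```shell
# Hypothetical sample of what `git ls-files` might print for one commit.
paths='lib1/a.txt
lib2/b.txt
lib3/ClassA
src/core/c.txt
src/other/d.txt'

# Same selection as the filter: keep only the lines that match neither pattern.
to_remove=$(printf '%s\n' "$paths" | grep -v -e '^lib3/' -e '^src/core/')

printf 'Would be removed from the index:\n%s\n' "$to_remove"
```

In a real repository, replacing the sample list with the output of git ls-files gives the same preview for the current commit.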

This creates a branch (new-branch) that has the filtered files you want. You can import them into another repo using a normal pull command:

git init new-repo
cd new-repo

git remote add origin /path/to/old-repo 
git pull origin new-branch
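
To see the whole of method 2 in one place, here is a minimal sketch that runs it against a throwaway repository. The temporary paths, file names, and demo identity are invented for illustration. It uses xargs -r (the short form of --no-run-if-empty) and one -e per grep pattern, and it sets FILTER_BRANCH_SQUELCH_WARNING, which newer Git versions read to skip filter-branch's deprecation notice.

```shell
set -e
# Throwaway identity so commits work in a clean environment (demo assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Skips filter-branch's deprecation notice and its delay on newer Git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

tmp=$(mktemp -d)

# Stand-in for the old repository.
git init -q "$tmp/old-repo"
cd "$tmp/old-repo"
mkdir -p lib1 lib3 src/core
echo a > lib1/a.txt
echo A > lib3/ClassA
echo c > src/core/c.txt
git add .
git commit -q -m 'Add lib1, lib3 and src/core'
echo B > lib3/ClassB
git add .
git commit -q -m 'Add ClassB'

# Filter a copy of the branch, keeping only lib3/ and src/core/.
git checkout -q -b new-branch
git filter-branch --index-filter \
    'git ls-files \
       | grep -v -e "^lib3/" -e "^src/core/" \
       | xargs -r git rm -q --cached' \
    HEAD > /dev/null

# Import the filtered branch into a fresh repository.
git init -q "$tmp/new-repo"
cd "$tmp/new-repo"
git remote add origin "$tmp/old-repo"
git pull -q origin new-branch

# Inspect the result: files at HEAD, files in the oldest commit, commit count.
new_files=$(git ls-tree -r --name-only HEAD)
oldest=$(git rev-list --max-parents=0 HEAD)
oldest_files=$(git ls-tree -r --name-only "$oldest")
commit_count=$(git rev-list --count HEAD)
printf 'HEAD:\n%s\noldest commit:\n%s\n' "$new_files" "$oldest_files"
```

Unlike method 1, the oldest commit here already has its files under lib3/ and src/core/, because every commit's index was filtered in place rather than re-parented.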
ams
  • Yeah, some kind of flag like `--root` would be really useful! OK, trying that now... – Dave Stewart Oct 05 '15 at 11:53
  • OK, thanks - that kinda worked... I have all the files in now, and there is a root folder "core" but the entire commit history is missing the "core" folder. What I want is for core to always have been there. Is this possible? – Dave Stewart Oct 05 '15 at 12:14
  • No, I don't believe it's possible to do that with `subtree`. You'd have to experiment with `git filter-branch` and/or `git fast-export` to do anything that exports multiple subtrees. – ams Oct 05 '15 at 12:25
  • Thanks @ams - I had a crack and it deleted a lot of folders, but not the right ones (think my understanding of the piping and my subsequent regex was off) but I'll have another crack tonight. Really appreciate your dogged determination! – Dave Stewart Oct 05 '15 at 14:54
  • I just realized the `-s` was a mistake; it's gone now. Otherwise, `grep -v ` means delete all lines with that pattern (`-v` reverses the usual function of grep). The pattern is supposed to delete everything that is not in "lib3/" or "core/". – ams Oct 05 '15 at 15:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/91408/discussion-between-dave-stewart-and-ams). – Dave Stewart Oct 05 '15 at 16:56
  • I like the solution that uses `filter-branch`. It's a bit more complicated but it keeps the commit logs in pristine condition. – Jake Jan 22 '17 at 06:46
  • It would be worth mentioning for method 1) that the new repository can't be empty, or git will complain. So add at least one commit before adding the subtree. – HappyCactus Mar 16 '17 at 18:40
  • Thanks @HappyCactus for the tip about method 1): if you start with an empty repo, `git subtree add` will give you an error: `fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.` In that case do: `git commit --allow-empty -m 'Initial commit'` – yktoo Mar 05 '19 at 09:13
  • Should that semicolon in the `filter-branch` bit actually be there? Seems like it would make `HEAD` be a separate command. – Zhuge Mar 22 '19 at 21:21
  • Note that `subtree split -P` will only accept a directory specifier, not an individual file. Specifying a file will emit `assertion failed: test blob = tree -o blob = commit` errors. – Quolonel Questions Aug 25 '20 at 12:36
  • `git co -b` means `git checkout -b` I guess? Not everyone uses alias. – zypA13510 Dec 04 '20 at 07:49
  • Very useful, it helped a lot. In option 1 I had to create one commit and create the master branch; only after that was I able to use it. Maybe you would like to edit that in. – Amit Andharia Mar 09 '22 at 13:36

If you can accept a dependency on Python 3, this script is very flexible and powerful, making it the best tool for this and many other repo tasks: https://github.com/newren/git-filter-repo

Example, given your original repo and the new repo-split you want to create:

git clone repo repo-split
cd repo-split
git-filter-repo --analyze # Not required; just cool to see.
git-filter-repo --path lib3 # Can have multiple.
# And if you did want to change it, you can add --path-rename lib3/:stuff/ 
CrazyPyro