What I want is similar to this question. However, I want the directory that is split into a separate repo to remain a subdirectory in that repo:

I have this:

foo/
  .git/
  bar/
  baz/
  qux/

And I want to split it into two completely independent repositories:

foo/
  .git/
  bar/
  baz/

quux/
  .git/
  qux/  # Note: still a subdirectory

How to do this in git?

I could use the method from this answer if there is some way to move all the new repo's contents into a subdirectory, throughout history.
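
For reference, here is a sketch of the two-step approach this describes. First, `--subdirectory-filter` splits the directory out. Then the "move the tree into a subdirectory" index filter from the git-filter-branch documentation moves each commit's contents back under that directory. The sketch assumes GNU sed (for `\t`), a directory name without `|`, `&` or `\`, and git 2.24 or later for `FILTER_BRANCH_SQUELCH_WARNING`. The function name `split_subdir_keep_prefix` is hypothetical.

```bash
# Hypothetical helper: clone SRC into DEST, then rewrite DEST so that it only
# contains DIR, still as a subdirectory, throughout history.
# Assumes GNU sed ("\t" in the pattern) and a DIR without "|", "&" or "\".
split_subdir_keep_prefix() {
    local src=$1 dest=$2 dir=$3
    git clone --quiet --no-hardlinks "$src" "$dest" || return 1
    (
        cd "$dest" || exit 1
        # do not keep a link back to the original repository
        git remote remove origin || exit 1
        export FILTER_BRANCH_SQUELCH_WARNING=1 KEEP_DIR="$dir"
        # Step 1: keep only DIR and make it the project root.
        git filter-branch --subdirectory-filter "$dir" -- --all || exit 1
        # Step 2: move every path back under DIR/ (index filter adapted from
        # the git-filter-branch documentation); -f overwrites the
        # refs/original/ backup left by step 1.
        git filter-branch -f --index-filter '
            git ls-files -s | sed "s|\t\"*|&$KEEP_DIR/|" |
                GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
            mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
        ' -- --all
    )
}
```

For the layout above, this would be called as `split_subdir_keep_prefix foo quux qux`. qux/ would still have to be removed from foo itself.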

Thomas

7 Answers

You could indeed use the subdirectory filter followed by an index filter to put the contents back into a subdirectory, but why bother, when you could just use the index filter by itself?

Here's an example from the man page:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD

This just removes one filename; what you want to do is remove everything but a given subdirectory. If you want to be cautious, you could explicitly list each path to remove, but if you'd rather go all-in, you can do something like this:

git filter-branch --index-filter 'git ls-tree -z --name-only --full-tree $GIT_COMMIT | grep -zv "^directory-to-keep$" | xargs -0 git rm --cached -r' -- --all

I expect there's probably a more elegant way; if anyone has something please suggest it!

A few notes on that command:

  • filter-branch internally sets GIT_COMMIT to the current commit SHA1
  • I wouldn't have expected --full-tree to be necessary, but apparently filter-branch runs the index-filter from the .git-rewrite/t directory instead of the top level of the repo.
  • grep is probably overkill, but I don't think it's a speed issue.
  • --all applies this to all refs; I figure you really do want that. (The -- separates it from the filter-branch options.)
  • -z and -0 tell ls-tree, grep, and xargs to use NUL termination to handle spaces in filenames.

Edit, much later: Thomas helpfully suggested a way to remove the now-empty commits, but it's now out of date. Look at the edit history if you've got an old version of git, but with modern git, all you need to do is tack on this option:

--prune-empty

That'll remove all commits which are empty after the application of the index filter.
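
Putting the pieces together, here is a sketch of the same index filter combined with `--prune-empty`. It is wrapped in a hypothetical function, `keep_only_toplevel`, that rewrites the repository in the current directory. It goes beyond the answer in two ways. First, it assumes GNU `grep`/`xargs` (`-z`, `-0`, and `xargs -r`), so that `git rm` is not run without paths for commits that already contain only the kept directory. Second, it uses `grep -xF`, so the name is matched literally rather than as a regex.

```bash
# Hypothetical helper: in every commit reachable from any ref, remove all
# top-level entries except "$1", then drop commits that no longer change
# anything. Assumes GNU grep and GNU xargs.
keep_only_toplevel() {
    # KEEP_DIR is passed through the environment to the index filter,
    # which filter-branch evaluates in its own shell.
    KEEP_DIR=$1 FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --prune-empty --index-filter '
            git ls-tree -z --name-only --full-tree "$GIT_COMMIT" |
                grep -zvxF -- "$KEEP_DIR" |
                xargs -0 -r git rm --cached -r --quiet --
        ' -- --all
}
```

Run it from the top of a fresh clone, e.g. `keep_only_toplevel qux`.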

Cascabel (edited by idbrii)
  • Apart from the nested single quotes (that I took the liberty to replace), this worked almost perfectly. The only problem was that empty commits to now nonexistent directories remained in the log. I removed these using `git filter-branch -f --commit-filter 'if [ z$1 = z\`git rev-parse $3^{tree}\` ]; then skip_commit "$@"; else git commit-tree "$@"; fi' "$@"` that I found at http://github.com/jwiegley/git-scripts/blob/master/git-remove-empty-commits – Thomas May 10 '10 at 18:43
  • @Thomas: Thanks for fixing my careless mistake! Also, you should be able to use the commit filter in the same command as the index filter. The filters are run in the order shown in the documentation; commit-filter is naturally after the filters which modify the contents of the commit. You probably also want to use `--remap-to-ancestor`, which will cause refs pointing to skipped commits to be moved to the nearest ancestor instead of excluding them. – Cascabel May 10 '10 at 19:01
  • @Jefromi: the `index-filter` argument should be more easily expressible as `git rm -r -f --cached --ignore-unmatch $(ls !(directory-to-keep))`, see my answers http://stackoverflow.com/a/8079852/396967 and http://stackoverflow.com/a/7849648/396967 – kynan Dec 04 '11 at 14:20
  • @kynan: That doesn't behave correctly with hidden files, nor is it as easily extensible. – Cascabel Dec 04 '11 at 16:54
  • @Jefromi: Hidden files are always excluded by the extglob. It does however work for simple cases such as the one the OP asked for. – kynan Dec 05 '11 at 00:18
  • If your filenames have spaces, then you can add `| tr "\n" "\0"` between `ls-tree` and `| grep` (to make newlines into NUL), change `grep -v` to `grep -zv` and change `xargs` to `xargs -0` (to make grep and xargs expect NUL as a separator). – idbrii Mar 03 '12 at 15:08
  • Thanks so much pydave. Your comment saved me hours! – Rotsiser Mho Aug 21 '12 at 20:06
  • @pydave That doesn't help if the filenames contain newlines. The proper solution is to use `-z` with `ls-tree` rather than `| tr "\n" "\0"` so the entire pipeline from start to finish has no ambiguity. (Since `NUL` and `/` are the only two characters not allowed in a filename on POSIX-compliant filesystems.) – ssokolow Jul 04 '14 at 07:23
  • @ssokolow: That's more elegant. Updated command in answer to use nul termination. – idbrii Jul 04 '14 at 18:34
  • If you've got a short list of files, instead of removing everything else just `git read-tree --empty; git reset $GIT_COMMIT -- $your $files $here` – jthill May 04 '16 at 20:17
  • What if some files _were git-moved_? The index-filter command will reach those old commits and either fail to match or, worse, remove the wrong file...? How can this be done safely? – PlasmaBinturong Jan 07 '20 at 15:56
I wanted to do a similar thing, but since the list of files that I wanted to keep was pretty long, it didn't make sense to do this using countless greps. I wrote a script that reads the list of files from a file:

#!/bin/bash

# usage:
# git filter-branch --prune-empty --index-filter \
# 'this-script file-with-list-of-files-to-be-kept' -- --all

if [ -z "$1" ]; then
    echo "Too few arguments."
    echo "Please specify an absolute path to the file"
    echo "which contains the list of files that should"
    echo "remain in the repository after filtering."
    exit 1
fi

# save a list of files present in the commit
# which is currently being modified.
git ls-tree -r --name-only --full-tree "$GIT_COMMIT" > files.txt

# delete from the list all files that should be kept.
# "|| [ -n ... ]" also reads a last line that has no trailing newline;
# empty lines are skipped, because grep -v "" would empty the whole list.
while IFS= read -r string || [ -n "$string" ]; do
    [ -z "$string" ] && continue
    grep -v -- "$string" files.txt > files.txt.temp
    mv -f files.txt.temp files.txt
done < "$1"

# remove unwanted files (i.e. everything that remained in the list).
# warning: 'git rm' will exit with non-zero status if it gets
# an invalid (non-existent) filename OR if it gets no arguments.
# If something exits with non-zero status, filter-branch will abort.
# That's why we have to check carefully what is passed to git rm.
if [ -s files.txt ]; then
    # enclose filenames in "" in case they contain spaces
    sed -e 's/^/"/' -e 's/$/"/' files.txt |
        xargs git rm --cached --quiet
fi

Quite surprisingly, this turned out to be much more work than I initially expected, so I decided to post it here.
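
Here is a variant of the same idea, sketched as a hypothetical function, `filter_keep_listed`. The list file still holds one grep pattern per line for the paths to keep. The difference is that each commit is filtered with a single `grep -f` pass over `git ls-files -z`, instead of one `grep` and one temp-file rewrite per pattern. It assumes GNU grep and xargs. Whether this makes a noticeable difference on a large history is untested, since filter-branch still processes every commit.

```bash
# Hypothetical helper: rewrite every ref in the current repo so that only
# paths matching one of the grep patterns in "$1" survive.
# Assumes GNU grep and GNU xargs.
filter_keep_listed() {
    local patterns status
    patterns=$(mktemp) || return 1
    # Drop empty lines: an empty pattern matches every path,
    # so nothing would be removed.
    grep -v '^$' "$1" > "$patterns"
    if [ ! -s "$patterns" ]; then
        echo "No patterns found in $1" >&2
        rm -f "$patterns"
        return 1
    fi
    KEEP_LIST=$patterns FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --prune-empty --index-filter '
            git ls-files -z |
                grep -zv -f "$KEEP_LIST" |
                xargs -0 -r git rm --cached --quiet --
        ' -- --all
    status=$?
    rm -f "$patterns"
    return $status
}
```

Example: with a file containing the single pattern `^qux/`, running `filter_keep_listed /path/to/keep-list.txt` keeps only qux/.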

Jan Warchoł
  • Thanks a lot for sharing! That worked for me on a test repo. I also added `if [ "$(cat $1)" == "" ]; then echo "No content in exclude file" exit 1 fi` to check if provided file is there. Also it seems one need to provide a full path to the excluding file. – Denis Feb 20 '14 at 20:35
  • p.s. also, exclude file should have the last line empty/rubbish. – Denis Feb 20 '14 at 20:42
  • I like the idea of picking and choosing which files to keep ... but as designed, this will take in excess of 20 hours to run on a repo with 30K commits... – Linas Jun 25 '20 at 04:05
Use git-filter-repo. It is not part of git (as of version 2.25) and requires Python 3 (>= 3.5) and git 2.22.0 or later.

git clone originalRepo new_repoA
git clone originalRepo new_repoB

cd new_repoA
git filter-repo --path bar --path baz
cd ..

cd new_repoB
git filter-repo --path qux
cd ..

For my repo, which contained ~12000 commits, git-filter-branch took more than 24 hours and git-filter-repo took less than a minute.

Kishore A

This is what I ended up doing to solve this issue when I had it myself:

git filter-branch --index-filter \
'git ls-tree --name-only --full-tree $GIT_COMMIT | \
 grep -v "^directory-to-keep$" | \
 sed -e "s/^/\"/g" -e "s/$/\"/g" | \
 xargs git rm --cached -r -f --ignore-unmatch \
' \
--prune-empty -- --all

The solution is based on Jefromi’s answer and on Detach (move) subdirectory into separate Git repository plus many comments here on SO.

The reason Jefromi’s solution did not work for me was that I had files and folders in my repo whose names contained special characters (mostly spaces). Additionally, git rm complained about unmatched files (resolved with --ignore-unmatch).

You can make the filtering work even when the directory is not in the repo’s root or has been moved around:

grep --invert-match "^.*directory-to-keep$"

And finally, you can use this to filter out a fixed subset of files or directories:

egrep --invert-match "^(.*file-or-directory-to-keep-1$|.*file-or-directory-to-keep-2$|…)"

To clean up afterwards you can use these commands:

$ git reset --hard
$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now

JanX2

A cleaner method:

git filter-branch --index-filter '
                git read-tree --empty
                git reset $GIT_COMMIT path/to/dir
        ' \
        -- --all -- path/to/dir

Or, to stick with just core commands, substitute git read-tree --prefix=path/to/dir/ $GIT_COMMIT:path/to/dir for the reset.

Specifying path/to/dir in the rev-list arguments does the pruning early. With a filter this cheap it doesn't matter much, but it's good to avoid the wasted effort anyway.
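
Here is a sketch of the core-commands variant, wrapped in a hypothetical function, `keep_dir_read_tree`, that rewrites the repository in the current directory. Two parts are additions, not part of the answer. The first is `--prune-empty`. The second is the `git cat-file -e` check, which is needed because a commit that deletes the directory is still selected by the path limiting, and `$GIT_COMMIT:path` would not resolve there.

```bash
# Hypothetical helper: every commit touching "$1" (e.g. path/to/dir, no
# trailing slash) is rewritten to contain only that directory.
keep_dir_read_tree() {
    KEEP_DIR=$1 FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --prune-empty --index-filter '
            # start from an empty index, then copy only the kept directory
            # back in; skip the copy if this commit no longer has it
            git read-tree --empty &&
            if git cat-file -e "$GIT_COMMIT:$KEEP_DIR" 2>/dev/null; then
                git read-tree --prefix="$KEEP_DIR/" "$GIT_COMMIT:$KEEP_DIR"
            fi
        ' -- --all -- "$1"
}
```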

jthill

If you just want to split out a single directory into a separate git repository:

git filter-branch has a --subdirectory-filter option, which is much simpler than the previously mentioned solutions:

git filter-branch --subdirectory-filter foodir -- --all

Additionally, it changes paths: the contents of the directory are moved to the top level of the new repo, instead of the other content simply being filtered out and removed.

Hubbitus

I used git-filter-repo with filename-callback.

stephen@B450-AORUS-M:~/source/linux$ git filter-repo --force --filename-callback '
  if b"it87.c" in filename:
    # Keep the filename and do not rename it
    return filename
  else:
    # Returning None removes the file
    return None
  '
warning: Tag points to object of unexpected type tree, skipping.
warning: Tag points to object of unexpected type tree, skipping.
Parsed 935794 commits
warning: Omitting tag 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c,
since tags of trees (or tags of tags of trees, etc.) are not supported.
warning: Omitting tag 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c,
since tags of trees (or tags of tags of trees, etc.) are not supported.
Parsed 937142 commits
New history written in 177.03 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at a57e6edb85a3 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157
Enumerating objects: 20210, done.
Counting objects: 100% (20210/20210), done.
Delta compression using up to 12 threads
Compressing objects: 100% (17718/17718), done.
Writing objects: 100% (20210/20210), done.
Total 20210 (delta 1841), reused 20038 (delta 1669), pack-reused 0
Completely finished after 179.76 seconds.

It didn't remove empty merge commits, probably due to a bunch of tags that were associated with one side of the tree.

I tried the top-voted answer; it didn't seem to remove anything and took a long time.

Rewrite 3e80e1395bd4f410b79dc0f17113f5b6b409c7d8 (329/937142) (8 seconds passed, remaining 22779 predicted)

22779 seconds = 6.3275 hours

Stephen