
Possible Duplicate:
How to split a git repository while preserving subdirectories?

At one point I added my code to an existing git repo, and have committed to it quite a lot since, while the other developer has committed to the other existing files in the repo. Now I want to split my code off into its own repo, but preserve all the change history for my particular files.

Reading through what others have done to split code out, I looked at `git filter-branch` with `--index-filter` or `--tree-filter` and `rm` commands for the files I don't care about. I don't want to use `--subdirectory-filter`, because it is not appropriate for the subdirectory holding my code to become the top-level directory (we also shared one subdirectory). To complicate matters, some files from the original repository have moved around over time, and some were created and later deleted. This makes designing an rm list a bit... challenging.

I'm looking for a way to filter everything /except/ a list of files/directories. Does anybody know of a way to do this?

jkeating
  • Thanks. Got it done with `git filter-branch --prune-empty --index-filter 'git ls-tree -r --name-only --full-tree $GIT_COMMIT | grep -v "^src/pyfedpkg$" |grep -v "^src/fedpkg" |grep -v "^git-changelog" | xargs git rm --cached -r' -- --all` – jkeating May 14 '11 at 03:29
  • Not the same as http://stackoverflow.com/questions/2797191/how-to-split-a-git-repository-while-preserving-subdirectories (IMHO) since it is asking for a particular set of sparse files to be preserved. – rogerdpack Sep 19 '16 at 18:37
  • This is not a duplicate! The question/answer provided there (2797191) only applies to files under a single directory. This question is asking for a set of files, not necessarily grouped under one directory. – jxy Feb 07 '17 at 03:58

1 Answer


Just to close the loop on this so it appears as answered.

By using `--index-filter` or `--tree-filter` with inverted logic, you can indeed remove everything that doesn't match a narrow set of file and directory names: pipe `git ls-tree` through one or more `grep -v` calls, then into `xargs git rm`. Here is the command I used to split out my particular files:

```shell
git filter-branch \
    --prune-empty \
    --index-filter '
        git ls-tree -z -r --name-only --full-tree $GIT_COMMIT \
        | grep -z -v "^src/pyfedpkg$" \
        | grep -z -v "^src/fedpkg" \
        | grep -z -v "^git-changelog" \
        | xargs -0 -r git rm --cached -r
    ' \
    -- \
    --all
```
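
To see what the inverted `grep -v` chain actually selects, the pipeline can be tried on a plain list of paths before involving git at all. This is only a sketch: the `list_paths` function and the extra paths in it are hypothetical stand-ins for the `git ls-tree -z -r --name-only` output, and it assumes GNU `grep` for the `-z` option.

```shell
# Hypothetical path list standing in for `git ls-tree -z -r --name-only` output.
# Each path is terminated by a NUL byte, as `ls-tree -z` would emit it.
list_paths() {
    printf '%s\0' \
        'src/pyfedpkg' \
        'src/fedpkg/__init__.py' \
        'src/other tool/main.py' \
        'git-changelog' \
        'README'
}

# Same grep chain as in the answer: every path to KEEP is filtered out,
# so what survives is the list that would be handed to `git rm`.
to_remove=$(
    list_paths \
    | grep -z -v '^src/pyfedpkg$' \
    | grep -z -v '^src/fedpkg' \
    | grep -z -v '^git-changelog' \
    | tr '\0' '\n'
)

printf '%s\n' "$to_remove"
```

With these sample paths, only `src/other tool/main.py` and `README` remain in the removal list, and the path containing a space comes through as a single entry because the records are NUL-separated.
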
AdrieanKhisbe
jkeating
  • When a file is brought into the tree in a commit all by itself, the `grep | xargs git rm` part will result in a non-zero exit code and [--index-filter will fail](https://github.com/git/git/commit/8c1ce0f46b85d40f215084eed7313896300082df). I had to augment xargs with the -r or --no-run-if-empty option (GNU extension). I suggest augmenting the answer as such. – lkraav Jul 20 '13 at 07:35
  • This fails for me with "pathspec 'SomeDir/SomeSubDir' did not match any files." SomeSubDir is only the first part of a subdirectory name that contains a space. So it would seem this solution does not work on repos with subdirectories that contain spaces. Any possible workaround? – Mark Edington May 24 '14 at 17:08
  • I created a bash script with `sed 's/ /\\ /g'` and inserted that after the last grep. I used a script to avoid issues with the single quotes. That helped, but I'm still left with empty commits. – Mark Edington May 24 '14 at 19:08
  • You'll want to use `-0` with `xargs` and `-z` with `git ls-tree` and `grep` if you want it to work reliably with all possible filenames. – ssokolow Jul 02 '14 at 12:23
  • For whatever reason, I got rid of empty commits by repeatedly saying `git filter-branch -f --prune-empty -- --all`. – Torsten Bronger Aug 29 '14 at 14:22
  • Adding `-q` to the `git rm` command also keeps the output from being flooded with `rm '...'` log lines. – Simon Sapin Nov 27 '14 at 11:44
  • Wow, there's a "one liner" for everything in git :) – enobayram Dec 05 '14 at 06:26
  • @jkeating brilliant - I was so impressed I just created a [`git splits`](https://github.com/simpliwp/git-splits) extension that incorporates this into git. – AndrewD Feb 12 '15 at 01:54
  • Thanks, I [successfully used this command](https://github.com/cognoma/machine-learning/issues/64#issuecomment-255856647 "GitHub Issue: Should cognoml reside in a separate repo?") to split out a directory and two file names. – Daniel Himmelstein Oct 25 '16 at 15:30
  • While it works on some repositories, I got `fatal: ambiguous argument 'refs/remotes/origin/master^0': unknown revision or path not in the working tree.` on another. – Hubbitus Jun 26 '18 at 09:23
  • I had to add `--ignore-unmatch` to `git rm` to avoid a crash on a file that was added midway through the history. (`git rm` will throw an error by default if told to remove a file that does not exist.) – mb7744 Jul 31 '19 at 21:04
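
Folding the comment suggestions (`-q` and `--ignore-unmatch` on `git rm`, NUL-separated paths throughout) into the answer's command, a rehearsal on a throwaway repository might look like the sketch below. Everything about the demo repository (its name, files, and commit messages) is hypothetical, GNU `grep` and `xargs` are assumed, and `FILTER_BRANCH_SQUELCH_WARNING` is only honored by recent Git versions. Since `filter-branch` rewrites history, trying it on a scratch copy first seems prudent.

```shell
#!/bin/sh
set -e

# Throwaway repository with hypothetical contents; no real repo is touched.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example"
git config user.email "example@example.invalid"

# First commit mixes a file to keep with one to drop (note the space in the path).
mkdir -p src/fedpkg "other dir"
echo keep > src/fedpkg/__init__.py
echo drop > "other dir/notes.txt"
git add -A
git commit -q -m "mixed commit"

# Second commit touches only unwanted files, so it should be pruned as empty.
echo drop > unrelated.txt
git add -A
git commit -q -m "commit touching only unwanted files"

# Silence filter-branch's startup warning and delay on recent Git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Keep only paths under src/fedpkg; remove everything else from every commit.
git filter-branch --prune-empty --index-filter '
    git ls-tree -z -r --name-only --full-tree $GIT_COMMIT \
    | grep -z -v "^src/fedpkg" \
    | xargs -0 -r git rm -q --cached -r --ignore-unmatch
' -- --all >/dev/null

remaining=$(git ls-files)
commit_count=$(git rev-list --count HEAD)
printf 'files: %s\ncommits: %s\n' "$remaining" "$commit_count"
```

In this demo the second commit changes nothing that survives the filter, so `--prune-empty` drops it and a single commit containing `src/fedpkg/__init__.py` is left.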