52

I'm trying to split a subproject off of my git repository. However, unlike in "Detach (move) subdirectory into separate Git repository", I don't have it in its own subdirectory (and moving it into one and then doing the above only yields the history after the move).

I've cloned the branch from which I want to split off the subproject into its own repository and removed everything that isn't used by the subproject, so basically I could use this as the repository of my subproject.

Now I want to get rid of the history of all files that are no longer in this repository so as to only keep the file history for the files that made it into the offspring.

I think it must be possible with git filter-branch, but I can't figure out how.

Many thanks in advance

Community
Niklas Schnelle
  • 1
    See also [New repo with copied history of only currently tracked files](http://stackoverflow.com/questions/17901588/new-repo-with-copied-history-of-only-currently-tracked-files). –  Jul 27 '13 at 22:11
  • have you found a solution for this? I am having exactly the same problem now. – Felix Cen Apr 05 '18 at 18:13

3 Answers

14

Here are some instructions to do what you want.

This will remove file_to_remove:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_to_remove' --prune-empty -- --all
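
If several files or whole directories have to go, they can be removed in the same pass rather than with one run per path. A minimal sketch, assuming paths without whitespace; `purge_paths` and the `PURGE_PATHS` variable are hypothetical names introduced here, and `-r` lets `git rm` recurse into directories:

```shell
# purge_paths (hypothetical helper): remove the given files/directories
# from every commit on every branch in a single filter-branch pass.
# Assumption: none of the paths contain whitespace.
purge_paths() {
    # The path list travels through the environment so the index filter
    # (which filter-branch evaluates with sh) can expand and word-split it.
    PURGE_PATHS="$*" FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
        --index-filter 'git rm -r -q --cached --ignore-unmatch -- $PURGE_PATHS' \
        --prune-empty -- --all
}
```

Called as `purge_paths file_to_remove dir_to_remove` inside a throwaway clone, this rewrites the history once for all listed paths.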
Neil Forrester
    The thing is, I want to keep only the files (and their history) that are in the working directory and have git forget about all the others. It would be quite cumbersome to first find all deleted files and remove them with the above command; that's why, even though I had found it, it's not of much use. – Niklas Schnelle May 21 '12 at 08:45
  • Note that you can use `git rm -r` for entire directories, deleting recursively. – Oyvind Dec 16 '19 at 13:09
  • @Oyvind Using `git rm -r` only deletes a file/directory from the working directory, and doesn't delete any of the history of the file/directory. It only adds the deletion to the top of the history. – David Maness Sep 16 '20 at 15:00
8

OK, I'm now trying the following technique and will report back on whether it worked, since it seems to take quite a long time to run. In zsh or bash, ON A CLONED repository:

git log --diff-filter=D --name-only --format= <start_commit>..HEAD | sed '/^$/d' | sort -u > deleted.txt

to get all deleted files

# Read the list on file descriptor 3 so the commands inside the loop
# cannot consume it through stdin
while IFS= read -r del <&3
do
    git filter-branch --index-filter "git rm --cached --ignore-unmatch -- '$del'" --prune-empty -- --all
    # The following seems to be necessary every time
    # because otherwise git won't overwrite refs/original
    git reset --hard
    git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
    git reflog expire --expire=now --all
    git gc --aggressive --prune=now
done 3< deleted.txt

This might be extremely dangerous for your data, so only try it on clones.
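
Since each `git filter-branch` run rewrites the whole history, a single pass that drops everything not tracked in the current HEAD may be considerably faster than the loop above. The following is a sketch under assumptions, not a recipe for every repository: `keep_only_tracked` is a hypothetical helper name, paths are assumed to contain no whitespace or special characters, `xargs -r` is assumed to be available (GNU xargs), and files that were renamed keep only the history recorded under their current name.

```shell
# keep_only_tracked (hypothetical helper): rewrite all branches so that
# only paths tracked in the current HEAD survive. Run in a throwaway clone.
keep_only_tracked() {
    keep_list=$(mktemp) || return 1
    # Paths to keep: everything tracked right now, one per line.
    git ls-files > "$keep_list"
    # For each commit, list its index and unstage every path that is not
    # in the keep list (grep: -v invert, -x whole line, -F fixed string,
    # -f patterns from file). xargs -r skips git rm when nothing is left.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --prune-empty \
        --index-filter "git ls-files | grep -vxFf '$keep_list' | xargs -r git rm -q --cached --ignore-unmatch --" \
        -- --all
    status=$?
    rm -f "$keep_list"
    return $status
}
```

The old history is still reachable through `refs/original` and the reflog afterwards, so the cleanup commands from the loop (deleting `refs/original`, expiring the reflog, `git gc`) only need to run once at the end.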

Dawson Toth
Niklas Schnelle
  • 1
    What did you end up finding? – Thorbjørn Ravn Andersen May 22 '13 at 09:15
  • 4
    The reason it appears to run so slow for you is because you're running the `git filter-branch` command ***once for each file***, along with a bunch of other commands (`git gc` is not a cheap nor fast command to run) instead of running it ***once for all files***, so it's probably extremely inefficient. See the comments at [New repo with copied history of only currently tracked files](http://stackoverflow.com/questions/17901588/new-repo-with-copied-history-of-only-currently-tracked-files). –  Jul 27 '13 at 22:12
  • Will pushing to github or gitlab clean-up the remote repository? – oxygen Oct 31 '17 at 10:20
1

Your friend is git filter-repo. It's available, e.g., in the package repositories of recent Ubuntu LTS releases:

sudo apt install git-filter-repo

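Try this in a fresh clone (git filter-repo refuses to run elsewhere unless given --force). Using git ls-files instead of ls lists only tracked files, including dotfiles and files in subdirectories:

git ls-files > /tmp/files.list
git filter-repo --paths-from-file /tmp/files.list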
j24