I have a testcase shell script.

I have a bloated repository, call it "oldrepo", that I'm trying to refactor. The repository contains several directories of large files that I'm not interested in keeping. I want them completely removed from history. I know how to remove specific files using filter-branch, but I don't know how to remove everything EXCEPT for a specific directory or set of directories without going through and removing every file individually.

I thought I could fool git into doing this by creating a new repo and only merging the files that I want to keep:

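mkdir newrepo
cd newrepo
git init; touch README; git add .; git commit -m "initial"
git remote add oldrepo /path/to/oldrepo
git fetch oldrepo
# OLDREPO-COMMIT and SUBDIR1-TREE are placeholders: the oldrepo commit being merged and the tree object of the directory to keep
git merge -s ours --no-commit OLDREPO-COMMIT
git read-tree --prefix=subdir1 -u SUBDIR1-TREE
git commit -m "merged subdir1"
git remote rm oldrepo
git prune --verbose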

Unfortunately, the prune command prunes nothing. I was hoping that it would prune every object that had never been a child of SUBDIR1-TREE. Is there a way to remove all history outside of a directory without individually removing each offending file?

Dave

1 Answer

This is also a job for filter-branch. Instead of removing selected files or directories as described in the other question, you can empty the index completely with git read-tree --empty (Git v1.7.4 or newer; on an older version of Git, run git read-tree with no arguments) and then restore only the files and directories you want to keep with git reset:

git filter-branch -f --index-filter 'git read-tree --empty && git reset -q "${GIT_COMMIT}" -- first_directory_to_keep second_directory_to_keep'
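
To see what the command above does, here is a minimal sketch that builds a throwaway repository and runs the same index filter on it. The directory names keep_a, keep_b and bloat, the file names, and the demo identity are made up for illustration; FILTER_BRANCH_SQUELCH_WARNING is only there because newer Git versions otherwise pause to print a deprecation notice for filter-branch.

```shell
#!/bin/sh
# Throwaway demo: all directory and file names below are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause on newer Git

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Three commits: one touching everything, one touching only the unwanted
# directory, one touching only a directory we want to keep.
mkdir keep_a keep_b bloat
echo a > keep_a/a.txt
echo b > keep_b/b.txt
echo big > bloat/big.bin
git add .
git commit -q -m "first"

echo more > bloat/more.bin
git add .
git commit -q -m "bloat only"

echo a2 >> keep_a/a.txt
git add .
git commit -q -m "touch keep_a"

# For every commit: empty the index, then restore only the kept paths
# from the original commit (GIT_COMMIT is set by filter-branch).
git filter-branch -f --index-filter \
  'git read-tree --empty && git reset -q "${GIT_COMMIT}" -- keep_a keep_b'

# Collect every path that still appears in the rewritten history of HEAD.
paths=$(git log --pretty=format: --name-only HEAD | sort -u | sed '/^$/d')
printf '%s\n' "$paths"
```

In this sketch the "bloat only" commit survives as an empty commit, because filter-branch keeps commits that no longer change anything unless it is given --prune-empty. The old commits are also still reachable through refs/original at this point, so nothing has been freed on disk yet.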
Richard Hansen
  • Thanks! I've confirmed this works exactly as you say. Then, to prune the objects, I run: `git remote rm origin; rm -Rf .git/refs/original; rm -Rf .git/logs/; git gc --prune=now` I'm still trying to verify whether everything that should be gone is really gone. – Dave Jun 17 '11 at 23:21
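
One way to check that the unwanted objects are really gone is to record the ID of a blob you expect to disappear and ask Git for it before and after the cleanup. The sketch below does that on a throwaway repository. It is an illustration under assumptions, not the commenter's exact procedure: the file names are made up, and it swaps the rm -Rf steps for git update-ref -d and git reflog expire, which should also cover backup refs that have ended up in .git/packed-refs.

```shell
#!/bin/sh
# Throwaway demo: all directory and file names below are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation pause on newer Git

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

mkdir keep bloat
echo small > keep/small.txt
echo "pretend this is huge" > bloat/big.bin
git add .
git commit -q -m "first"

# Remember the blob ID of the unwanted file so we can look for it later.
big_blob=$(git rev-parse HEAD:bloat/big.bin)

git filter-branch -f --index-filter \
  'git read-tree --empty && git reset -q "${GIT_COMMIT}" -- keep'

# Right after the rewrite the blob still exists: refs/original and the
# reflogs keep the old commits reachable.
if git cat-file -e "$big_blob" 2>/dev/null; then before=present; else before=gone; fi

# Delete the backup refs, expire all reflog entries, then prune.
git for-each-ref --format='%(refname)' refs/original |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$big_blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before cleanup: $before, after cleanup: $after"
```

On a real repository, tags, other branches, or remotes that still point at the old history would keep those objects alive as well, so they need to be rewritten or removed before the prune can free anything.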