
I'm migrating a repository from svn to git.

In this last step, I want to remove tons of files that aren't needed from the history.

I'm trying the following command:

git filter-branch --prune-empty --index-filter \
  "for file in $(cat files); do git rm -rf --cached --ignore-unmatch ${file}; done" -f

But it says that the argument list is too long.

I could rewrite it like this:

for file in $(cat files); do
  git filter-branch --prune-empty --index-filter \
    "git rm -rf --cached --ignore-unmatch ${file}" -f
done

But that runs filter-branch once per file, and the history is long, so it would take far too much time.

Is there a faster way to remove lots of files with filter-branch?
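The "argument list is too long" error most likely comes from the double quotes in the first command: the calling shell runs `$(cat files)` before filter-branch even starts, so the whole list gets pasted into one huge `--index-filter` argument, and the unquoted `${file}` also splits names that contain spaces. A minimal single-pass sketch, assuming the list has one path per line and no path contains a newline: single-quote the filter so the list is read per commit, hand it the list by absolute path (filter-branch evaluates filters from a temporary directory under `.git-rewrite/` by default, which may also explain "file not found" errors), and let `xargs` batch the paths into as few `git rm` calls as possible. The demo repository, the `files` location and the `LIST` variable are made up for illustration.

```shell
#!/bin/sh
# Throwaway demo: build a small repo, then drop a list of paths from its history
# with a single filter-branch pass. All file and variable names are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the 10 s warning pause (Git 2.24+)

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
mkdir -p vendor/jboss
echo keep > keep.txt
echo jar > vendor/jboss/server.jar
echo old > "old notes.txt"               # a path containing a space
git add . && git commit -q -m "import"
echo more >> keep.txt
git commit -q -am "edit keep.txt"
echo jar2 > vendor/jboss/server.jar
git commit -q -am "touch only junk"      # becomes empty, so --prune-empty drops it

# One path (file or directory) per line, kept outside the repository.
printf '%s\n' 'vendor/jboss' 'old notes.txt' > "$work/files"

# Absolute path: filter-branch evaluates the filter from its own temporary
# directory, so a relative "files" would not be found there.
LIST="$work/files"
export LIST

# Single quotes: the calling shell expands nothing. For each commit the filter
# reads $LIST itself, and xargs packs the paths into as few `git rm` calls as
# the system's argument-length limit allows.
git filter-branch -f --prune-empty --index-filter \
  'tr "\n" "\0" < "$LIST" | xargs -0 git rm -r -f -q --cached --ignore-unmatch --' \
  HEAD

git ls-tree -r --name-only HEAD          # only keep.txt remains
```

On a real repository you would skip the demo setup, point `LIST` at your own list, and typically pass `-- --all` instead of `HEAD` to rewrite every branch. filter-branch keeps the old history under `refs/original/` until you delete those refs.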

Roberto Tyley
caarlos0
  • can you consider splitting git during svn 2 git; I am basically asking for repository refactoring – forvaidya Aug 01 '13 at 12:16
  • possible duplicate of [New repo with copied history of only currently tracked files](http://stackoverflow.com/questions/17901588/new-repo-with-copied-history-of-only-currently-tracked-files) –  Aug 01 '13 at 12:21
  • I did that, but the repo is still too big. My coworkers used to commit binaries to SVN, like JBoss, the JDK and other things... a real mess. – caarlos0 Aug 01 '13 at 12:22
  • @caarlos0 Did you read the answers in there about ways to use `filter-branch` to remove a lot of files? Have you tried them? (There's more than one method). Which ones did you try? Did you see any error messages or other indications of why they might have failed? –  Aug 01 '13 at 12:25
  • I've tried several ways... none worked; I got errors like "file not found" and weird syntax errors... Anyway, perhaps I will just wait for my `for` loop to finish. – caarlos0 Aug 01 '13 at 13:04
  • Could the problem be that your string is being expanded too early since you quoted it using double quotes? Could your file names need quoting? – jpmc26 Nov 20 '15 at 19:53

1 Answer


I'd recommend using The BFG, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.

You mentioned in your comment that the problem files are generally big binaries, and The BFG has a specific option for handling this. You should carefully follow the BFG's usage instructions, but the core part is just this:

$ java -jar bfg.jar  --strip-blobs-bigger-than 10M  my-repo.git

Any files over 10MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive
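The BFG's own usage instructions run `git reflog expire --expire=now --all` before `git gc --prune=now --aggressive`, because gc keeps any object that a reflog entry still points to. A freshly made `--mirror` clone is bare and typically has no reflogs, but a repository you have been working in does. The throwaway demo below, with made-up file and branch names, shows a dropped commit's blob surviving `gc` alone and disappearing once the reflogs are expired. `--aggressive` is left out because it changes how tightly the remaining objects are delta-compressed, not which ones are pruned.

```shell
#!/bin/sh
# Throwaway demo: why reflogs must be expired before `git gc --prune=now`
# can drop a blob that no branch points to any more. Names are made up.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo keep > keep.txt
git add keep.txt
git commit -q -m "keep"

# Commit a pretend big binary on a side branch, then throw the branch away.
git checkout -q -b junk
echo "pretend this is a JDK" > jdk.bin
git add jdk.bin
git commit -q -m "junk"
blob=$(git rev-parse HEAD:jdk.bin)
git checkout -q -
git branch -q -D junk

# gc alone: the HEAD reflog still mentions the junk commit, so the blob survives.
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after_gc=present; else after_gc=gone; fi

# Expire every reflog entry first, then gc: the blob is now unreachable and pruned.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then after_expire=present; else after_expire=gone; fi

echo "gc only: $after_gc; reflog expire + gc: $after_expire"
```

If the history was rewritten with filter-branch instead, the `refs/original/` backup refs also have to be deleted before this sequence, since they still point at the old commits.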

The BFG is typically 10-720x faster than running git-filter-branch, and generally easier to use.
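To see in advance which blobs a 10M threshold would catch, the history can be listed by blob size with plain Git plumbing (`git rev-list --objects --all` piped into `git cat-file --batch-check`). A sketch, assuming a POSIX shell with `awk` and `sort`; the function name `list_big_blobs` is made up, and reading 10M as 10 MiB (10485760 bytes) is an assumption about how The BFG interprets the suffix. Blobs that are in the latest commit show up here too, even though The BFG leaves those alone by default.

```shell
#!/bin/sh
# list_big_blobs THRESHOLD_BYTES  (hypothetical helper name)
# Print "blob <id> <size> <path>" for every blob reachable from any ref whose
# size exceeds THRESHOLD_BYTES, largest first. Run it inside the repository.
list_big_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v limit="$1" '$1 == "blob" && $3 > limit' |
    sort -k3,3nr
}

# 10 MiB, assumed to roughly match the 10M threshold in the BFG command above.
list_big_blobs 10485760
```

The path printed for each blob is the first one `rev-list` encountered for it, so a file that was renamed or copied appears under just one of its names.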

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Community
Roberto Tyley
  • I ended up waiting... but since this is the only answer, I've accepted it. Thanks – caarlos0 Aug 01 '13 at 16:59
  • This is useless for a large number of very small files. Also, is `--aggressive` a good idea here? See [the woes of “git gc –aggressive” (and how git deltas work)](https://metalinguist.wordpress.com/2007/12/06/the-woes-of-git-gc-aggressive-and-how-git-deltas-work/). – jpmc26 Nov 20 '15 at 19:51
  • I didn't have access to `bfg`, so I ended up using the [github method](https://help.github.com/articles/removing-sensitive-data-from-a-repository/), which is almost identical to the original question. – Trevor Boyd Smith Jun 21 '18 at 18:54