I learned about "tree" and "index" from this article: Learning Git Internals by Example

But when it comes to the "git filter-branch" command, I don't know what the difference is between "--tree-filter" and "--index-filter".

GoTop

1 Answer

The short version is that --tree-filter checks out each commit into a temporary directory, runs your filter command, and builds a new commit from whatever is now in the temporary directory; while --index-filter copies each commit into the index, runs your filter command, and builds a new commit from whatever is now in the index.
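To make the two modes concrete, here is a minimal sketch that removes one file from every commit both ways. The throwaway repository, the `make_repo` helper, and the file names `keep.txt` and `secrets.txt` are made up for illustration; only the two `git filter-branch` invocations matter. Note how the tree filter uses an ordinary file command, while the index filter has to use a Git command that edits the index directly.

```shell
#!/bin/sh
set -e
# Silence the "use git filter-repo instead" notice that newer Git versions print.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical helper: builds a two-commit throwaway repo and prints its path.
make_repo() {
    dir=$(mktemp -d)
    git init -q "$dir"
    (
        cd "$dir"
        git config user.email "demo@example.com"
        git config user.name "Demo"
        echo "hello" > keep.txt
        echo "hunter2" > secrets.txt
        git add keep.txt secrets.txt
        git commit -q -m "first"
        echo "more" >> keep.txt
        git commit -q -am "second"
    )
    echo "$dir"
}

# Tree filter: each commit is checked out, so a plain `rm` is enough.
tree_repo=$(make_repo)
(cd "$tree_repo" &&
    git filter-branch --tree-filter 'rm -f secrets.txt' -- --all >/dev/null 2>&1)

# Index filter: nothing is checked out, so the file must be removed from the index.
index_repo=$(make_repo)
(cd "$index_repo" &&
    git filter-branch --index-filter \
        'git rm --cached --ignore-unmatch -q secrets.txt' -- --all >/dev/null 2>&1)

# List the files left in the rewritten tip commit of each repo.
tree_files=$(git -C "$tree_repo" ls-tree -r --name-only HEAD)
index_files=$(git -C "$index_repo" ls-tree -r --name-only HEAD)
echo "tree filter result:  $tree_files"
echo "index filter result: $index_files"

rm -rf "$tree_repo" "$index_repo"
```

Both runs should leave the same rewritten history; on a repository this small the timing difference is not noticeable, it only shows up with many commits or large trees.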

Copying a commit to the index is much¹ faster than checking out the commit. Building a commit from the index is faster than building a commit from a directory. As a result, using the index filter is much faster than using the tree filter. It's not as easy to script for, though.

¹ The exact speed difference depends on your temporary directory: an in-memory file system is faster than an on-SSD file system, which is faster than one on spinning media, so you gain more if you're using spinning media than if you can point the tree filter to an in-memory file system. But even then the index filter is still faster.

On actual disks, I've seen about a factor of 100 or so (hence an index filter that takes 2 minutes translates to a tree filter that takes 3+ hours).

torek
  • Amazing extra research there @torek, can't learn this stuff by just reading the docs! – timhc22 Feb 24 '22 at 02:51
  • @timhc22: I'm approximating from some historical values here. I did a lot of repository surgery back in the mid-200x or early 201x-es. Had to use real spinning media for a while, then had a ramdisk file system to use for some of it later... – torek Feb 24 '22 at 10:54