Your theory is mostly right, just a bit off at the very end. git filter-branch doesn't use the standard index, nor your work-tree. When it invokes each of your filters, it's running in a temporary directory whose location in the file system depends on whether you supplied a -d directory option.[1] With no usable work-tree, all index-filter git rm operations should use --cached.[2] Filter-branch also creates a temporary index within the temporary directory. This temporary index is in place for each step of the filtering: it's used to extract the commit for the index filter, and then to make the new commit after the index-filter modifies it.
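
As a rough illustration of the temporary directory and temporary index described above, the following sketch runs an index filter that does nothing except record where it is running. It assumes a throwaway repository created with mktemp and the default temporary directory (no -d option). FILTER_LOG is a hypothetical variable introduced only for this example, and FILTER_BRANCH_SQUELCH_WARNING merely silences the start-up warning that newer Git versions print.

```shell
#!/bin/sh
set -e

# Throwaway repository, so nothing real gets rewritten.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
echo data > a.ext
git add a.ext
git commit -q -m "add a.ext"

# Hypothetical log file, kept outside the repository and exported so
# that the filter (which runs inside filter-branch's shell) can see it.
FILTER_LOG=$(mktemp)
export FILTER_LOG

# The filter changes nothing; it only records its working directory
# and the index file that filter-branch handed to it.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter '
    pwd >> "$FILTER_LOG"
    echo "$GIT_INDEX_FILE" >> "$FILTER_LOG"
' HEAD >/dev/null 2>&1

cat "$FILTER_LOG"
```

The first logged line should be a path ending in .git-rewrite/t, and the second should name an index file under .git-rewrite rather than your repository's own .git/index.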
Having finished all the filtering operations, the last few steps of filter-branch are:
1. Update all the to-be-modified references: the filtered branches, and any tags if you used --tag-name-filter. For instance, master might have identified commit a123456... before. If new commit b789abc... is the one that should now be used instead, Git must rewrite refs/heads/master so that it now identifies b789abc... rather than a123456....
2. If you're in a non-bare repository (filter-branch can be used on bare repositories too), effectively git checkout the new commit instead of the old one. This updates your index and work-tree, so that your index and work-tree remain "clean".[3] The actual operation is a git read-tree:
if [ "$(is_bare_repository)" = false ]; then
    git read-tree -u -m HEAD || exit
fi
It's this last step, the effective check-out, that removes your *.ext files.
Your current index shows them as tracked, which they are; they're in your current commit, which is (say) a123456.... Your new target commit b789abc... lacks the *.ext files, because you filtered them away with your --index-filter. So to move from the current commit to the new commit, the correct operation is to remove the files.
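
The effect can be reproduced without filter-branch at all. The sketch below is a simplified stand-in, not filter-branch's actual code path: in a throwaway repository it builds a replacement commit without a.ext by hand (using a temporary index, much as filter-branch does), repoints the branch, and then runs the same git read-tree -u -m HEAD. The file names and the temporary index path are assumptions made for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository with one tracked *.ext file and one other file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
echo data > a.ext
echo keep > keep.txt
git add a.ext keep.txt
git commit -q -m "old commit with a.ext"
old=$(git rev-parse HEAD)
branch=$(git symbolic-ref HEAD)

# Build a replacement commit that lacks a.ext, using a temporary index
# so that the real index and the work-tree are left alone for now.
GIT_INDEX_FILE="$repo/.git/tmp-index"
export GIT_INDEX_FILE
git read-tree "$old"
git rm -q --cached a.ext
tree=$(git write-tree)
unset GIT_INDEX_FILE
new=$(git commit-tree -m "new commit without a.ext" "$tree")

# Step 1 of the wrap-up: make the branch identify the new commit.
git update-ref "$branch" "$new" "$old"

# Step 2 of the wrap-up: the effective check-out. The real index still
# lists a.ext as tracked and HEAD no longer has it, so a.ext is removed.
git read-tree -u -m HEAD

ls
```

After this runs, ls should list only keep.txt: the read-tree step deleted a.ext from the work-tree because it was tracked in the index but absent from the new HEAD.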
To fix the problem:
Run git rm --cached '*.ext' and commit the result before you run the filter-branch command with the --index-filter option. That way your current commit (which will no longer be a123456...) won't have the files, and your index won't have the files either. They'll be untracked with respect to both the current commit and the rewritten one, so the final read-tree step will leave them untouched.
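
A minimal sketch of that sequence, again in a throwaway repository with made-up file names, might look like this. The --ignore-unmatch flag is added so that the filter does not fail on commits that contain no *.ext files, and FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print.

```shell
#!/bin/sh
set -e

# Throwaway repository with one tracked *.ext file and one other file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
echo data > a.ext
echo keep > keep.txt
git add a.ext keep.txt
git commit -q -m "add files"

# First: stop tracking the files, and commit that change.
# The files themselves stay in the work-tree, now untracked.
git rm -q --cached '*.ext'
git commit -q -m "stop tracking *.ext"

# Then: rewrite history to drop the files from every earlier commit.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch "*.ext"' HEAD >/dev/null 2>&1

ls
```

Here a.ext should still be present in the work-tree afterwards, while git log HEAD -- a.ext shows no commits on the rewritten branch that touch it. The original commits remain reachable through the refs/original/ backup refs until you delete those.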
[1] Filter-branch starts by setting $tempdir from your -d option, or the default, which is .git-rewrite. It then creates $tempdir/t, creating $tempdir too if needed; changes location into it; and sets the Git work-tree to . (the current directory). This $tempdir/t directory remains empty unless you use the --tree-filter option or, in any of your filters, attempt to manipulate the current directory in some way.
[2] Without --cached, git rm tries to remove files from $GIT_WORK_TREE, i.e., $tempdir/t. Filter-branch's own state files are in ../ from here, i.e., in $tempdir, so they should be safe in general, but it's still not a great idea to try to remove things that shouldn't be there anyway: if nothing else, it's wasted compute time. Using --cached keeps git rm from even attempting it.
[3] "Clean" here is a vaguely-defined state that you probably recognize anyway. The precise definition, if you want it, is in git-sh-setup.sh, specifically in the function require_clean_work_tree. Filter-branch invokes this before beginning the filtering, if run on a non-bare repository.