
Question

What difference does the --cached flag make when running git rm via git filter-branch --index-filter?

git filter-branch --index-filter 'git rm --cached *.ext'  
git filter-branch --index-filter 'git rm *.ext'

Issue

When I run git rm --cached *.ext, all *.ext files are untracked but not removed from the working directory (this is the purpose of the --cached flag, as I understand it).
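A minimal sketch of that everyday behaviour in a throwaway repository. The file names a.ext, b.ext and keep.txt are hypothetical, and the demo identity is supplied through Git's standard author/committer environment variables so no global config is needed:

```shell
#!/bin/sh
# Sketch: git rm --cached vs. plain git rm in an ordinary repository.
# a.ext, b.ext and keep.txt are hypothetical file names.
set -eu
# Throwaway identity so the demo commit works without global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > a.ext
echo b > b.ext
echo k > keep.txt
git add . && git commit -q -m initial

git rm -q --cached a.ext   # removed from the index only; a.ext stays on disk
git rm -q b.ext            # removed from the index AND deleted from the work tree

ls                         # work tree: a.ext and keep.txt
git ls-files               # index: keep.txt only
```

Afterwards a.ext is still on disk but no longer tracked, while b.ext is gone from both the index and the work tree.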

But when I run that command retroactively, like this:

git filter-branch --index-filter 'git rm --cached *.ext'

all historical *.ext files are untracked AND the existing *.ext files ARE removed from the working directory.

Why?

What's the difference without --cached?

Theory

Since the command is run on every commit, with no "working directory" for each run, it also treats HEAD (the current working directory) as a standalone commit without a working directory.

goofology

1 Answer


Your theory is mostly right, just a bit off at the very end. git filter-branch uses neither your standard index nor your work-tree. When it invokes each of your filters, it runs them in a temporary directory whose location in the file system depends on whether you supplied a -d directory option.[1] With no usable work-tree, every git rm in an index filter should use --cached.[2] Filter-branch also creates a temporary index within the temporary directory. This temporary index is used at every step of the filtering: each commit is read into it before the index filter runs, and the new commit is built from it after the index filter has modified it.
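One way to see this for yourself is a throwaway index filter that records its own surroundings. This is a sketch under assumptions: PROBE_LOG is a hypothetical variable name chosen for the demo, the log is written outside the repository so it survives filter-branch's cleanup, FILTER_BRANCH_SQUELCH_WARNING only matters on Git 2.24 or later, and --ignore-unmatch is added so the git rm does not fail on a commit with no matching files. If the description above holds, the log should show a working directory ending in .git-rewrite/t, a work-tree of ".", an index file under .git-rewrite, and zero entries in the directory.

```shell
#!/bin/sh
# Sketch: let an --index-filter report where it runs and which index it uses.
# PROBE_LOG, a.ext and keep.txt are hypothetical names for this demo only.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # Git 2.24+: skip the deprecation warning and pause

repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > a.ext
echo k > keep.txt
git add . && git commit -q -m initial

# The log lives outside the repository, so filter-branch's cleanup leaves it alone.
export PROBE_LOG="$repo.probe.log"

git filter-branch --index-filter '
    {
        echo "pwd=$(pwd)"
        echo "GIT_WORK_TREE=$GIT_WORK_TREE"
        echo "GIT_INDEX_FILE=$GIT_INDEX_FILE"
        echo "entries=$(ls -A | wc -l | tr -d " ")"
    } >> "$PROBE_LOG"
    git rm -q --cached --ignore-unmatch "*.ext"
'

cat "$PROBE_LOG"
```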

Having finished all the filtering operations, the last few steps of filter-branch are:

  1. Update all the to-be-modified references: the filtered branches, and any tags if you used --tag-name-filter. For instance, master might have identified commit a123456... before. If new commit b789abc... is the one that should now be used instead of a123456, Git must rewrite refs/heads/master so that it now identifies commit b789abc....

  2. If you're in a non-bare repository (filter-branch can be used on bare repositories), effectively git checkout the new commit instead of the old one. This updates your index and work-tree, so that your index and work-tree remain "clean".[3] The actual operation is a git read-tree:

    if [ "$(is_bare_repository)" = false ]; then
            git read-tree -u -m HEAD || exit
    fi
    

It's this last step, the effective-check-out, that removes your *.ext files.

Your current index shows them as tracked, which they are; they're in your current commit, which is (say) a123456.... Your new target commit b789abc... lacks the *.ext files, because you filtered them away with your --index-filter. So to move from the current commit to the new commit, the correct operation is to remove the files.
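The last two steps can be reproduced by hand, without filter-branch, to watch the removal happen. In this sketch the filtered commit is built with plumbing commands in a scratch index (its path, like the file names, is hypothetical), git update-ref plays the part of step 1, and the same git read-tree -u -m HEAD that filter-branch runs plays step 2:

```shell
#!/bin/sh
# Sketch: reproduce filter-branch's last two steps by hand.
# a.ext, keep.txt and the scratch index path are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > a.ext
echo k > keep.txt
git add . && git commit -q -m initial          # plays the role of a123456...

# Build a filtered copy of that commit in a scratch index, the way an
# index filter would, so the real index and work tree are not touched yet.
scratch="$repo/.git/scratch-index"
GIT_INDEX_FILE=$scratch git read-tree HEAD
GIT_INDEX_FILE=$scratch git rm -q --cached a.ext
tree=$(GIT_INDEX_FILE=$scratch git write-tree)
rm -f "$scratch"
new=$(git commit-tree "$tree" -m 'initial, filtered')   # plays b789abc...

# Step 1: point the current branch at the new commit.
git update-ref HEAD "$new"
# Step 2: the effective checkout filter-branch performs in a non-bare repo.
# a.ext is in the real index but not in the new HEAD, so it is deleted.
git read-tree -u -m HEAD

ls   # only keep.txt remains in the work tree
```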

To fix the problem:

Before running the filter-branch command with the --index-filter option, run git rm --cached '*.ext' and commit. That way your current commit (which will no longer be a123456...) won't have the files, and neither will your index, so they'll simply be untracked files, absent from both the current commit and the new one, and will be left untouched.
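A minimal end-to-end sketch of this fix in a throwaway repository. The file names are hypothetical, and --ignore-unmatch is an addition of this sketch so the filter does not fail on any commit that has no *.ext files:

```shell
#!/bin/sh
# Sketch of the fix: untrack the files in a new commit first, then filter
# them out of history. a.ext and keep.txt are hypothetical file names.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # Git 2.24+: skip the deprecation warning and pause

repo=$(mktemp -d)
cd "$repo"
git init -q
echo a > a.ext
echo k > keep.txt
git add . && git commit -q -m initial

# 1. Stop tracking the files in a new commit; they stay on disk.
git rm -q --cached '*.ext'
git commit -q -m 'Stop tracking *.ext'

# 2. Rewrite history. The final read-tree now moves from a commit without
#    *.ext to another commit without *.ext, so a.ext is left alone.
git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch "*.ext"'

git status --short   # a.ext is reported as untracked
```

At the end, a.ext should still be on disk as an untracked file, and no commit on the rewritten branch should contain it. The original history stays reachable through the backup refs under refs/original/ until you delete them.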


[1] Filter-branch starts by setting $tempdir from your -d option, or from the default, .git-rewrite. It then creates $tempdir/t (creating $tempdir too, if needed), changes into it, and sets the Git work-tree to "." (that is, $tempdir/t itself). This $tempdir/t directory remains empty unless you use the --tree-filter option or one of your filters manipulates the current directory in some way.

[2] Without --cached, git rm tries to remove files from $GIT_WORK_TREE, i.e., $tempdir/t. Filter-branch's own state files are one level up, in $tempdir, so they should be safe in general, but it's still not a great idea to try to remove things that shouldn't be there anyway: if nothing else, it wastes compute time. Using --cached keeps git rm from even attempting it.

[3] "Clean" here is a vaguely defined state that you probably recognize anyway. The precise definition, if you want it, is in git-sh-setup.sh, specifically in the function require_clean_work_tree. Filter-branch invokes this function before it begins filtering, if run on a non-bare repository.
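To answer the original "what's the difference" question concretely, this sketch runs the same index filter with and without --cached on two identical throwaway repositories. Going by the explanation above, both runs should produce the same rewritten commit (keep.txt only), and in both the final read-tree removes a.ext from the real work tree; the variant without --cached merely attempts, without effect, to delete files from the empty $tempdir/t. File names are hypothetical, and --ignore-unmatch is added so neither filter fails on a non-matching commit:

```shell
#!/bin/sh
# Sketch: same --index-filter with and without --cached, on two identical
# throwaway repositories. a.ext and keep.txt are hypothetical file names.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # Git 2.24+: skip the deprecation warning and pause

# Build a one-commit repository, run filter-branch with the index filter
# given as $1, and print the repository's path.
filtered_repo() {
    dir=$(mktemp -d)
    (
        cd "$dir"
        git init -q
        echo a > a.ext
        echo k > keep.txt
        git add . && git commit -q -m initial
        git filter-branch --index-filter "$1" >/dev/null 2>&1
    )
    echo "$dir"
}

with_cached=$(filtered_repo 'git rm -q --cached --ignore-unmatch "*.ext"')
without_cached=$(filtered_repo 'git rm -q --ignore-unmatch "*.ext"')

for r in "$with_cached" "$without_cached"; do
    echo "== $r"
    git -C "$r" ls-tree --name-only HEAD   # files in the rewritten commit
    ls "$r"                                 # files left in the real work tree
done
```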

torek
  • Thank you for this excellent answer! It seems that in my case (I don't want an additional commit to remove historical files, but I also want to keep the files in the working tree), I should wait until I'm ready to make another substantive commit, run `git rm --cached '*.ext'` before finally committing, and then run the `git filter-branch --index-filter ...` command? – goofology Aug 08 '19 at 23:43
  • You *can* wait, but that makes it easy to forget. I'd suggest instead: go ahead and make the extra commit. Do the filter-branch and make sure you're happy with the results. Then, if Git copied the extra commit you made, remove the copy with `git reset --hard HEAD^`. Note that `git filter-branch` has a `--prune-empty` option that skips copying a commit whose filtered tree is identical to its parent's, which is useful for these particular cases as long as you don't have deliberate "empty" commits that you want to preserve. – torek Aug 09 '19 at 00:05
  • Does tree-filter behavior differ with regards to --cached? It seems the flag doesn't accomplish anything in that case. – goofology Aug 12 '19 at 19:12
  • [Here is another question I've asked](https://stackoverflow.com/q/57418769), if you're interested in taking a look. – goofology Aug 12 '19 at 19:17
  • `tree-filter` is very different, because it throws away any changes you make to the index and builds a new commit strictly from whatever you leave in the temporary directory. – torek Aug 12 '19 at 20:01
  • I saw that earlier, and really don't want to wade into that. I think you'll find that this gets messy no matter how you do it: it turns out people don't always want their `.gitignore` files being obeyed. – torek Aug 12 '19 at 20:27
  • Understood, thanks VERY MUCH! You helped me come up with my answer to that question, specifically committing immediately after `git rm --cached`, and then using `--prune-empty` on the following `git filter-branch` command. Incredibly helpful! – goofology Aug 16 '19 at 03:05
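torek's point about --tree-filter can be checked with one more sketch. With a tree filter, filter-branch rebuilds the index from whatever files are left in the temporary directory, so an index-only `git rm --cached` should be undone, while a plain `rm` actually drops the file. File names are hypothetical, and FILTER_BRANCH_SQUELCH_WARNING only matters on Git 2.24 or later:

```shell
#!/bin/sh
# Sketch: --tree-filter builds each new commit from the files left in the
# temporary directory. a.ext and keep.txt are hypothetical file names.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # Git 2.24+: skip the deprecation warning and pause

# Build a one-commit repository, run filter-branch with the tree filter
# given as $1, and print the repository's path.
filtered_repo() {
    dir=$(mktemp -d)
    (
        cd "$dir"
        git init -q
        echo a > a.ext
        echo k > keep.txt
        git add . && git commit -q -m initial
        git filter-branch --tree-filter "$1" >/dev/null 2>&1
    )
    echo "$dir"
}

# Index-only change: a.ext is still in the temp directory, so it is re-added.
index_only=$(filtered_repo 'git rm -q --cached --ignore-unmatch a.ext')
# Real deletion: a.ext is gone from the temp directory, so it is dropped.
deleted=$(filtered_repo 'rm -f a.ext')

git -C "$index_only" ls-tree --name-only HEAD   # a.ext and keep.txt
git -C "$deleted" ls-tree --name-only HEAD      # keep.txt only
```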