I have a file, "npz, species_coex.npz", that was added to my Git repository by mistake. After realizing my mistake, I removed it with git rm. I have now found out that Git still knows about it. That is usually fine, but I want Git to forget about it completely, as if it had never been added in the first place.

I've read about the filter-branch command but would prefer not to use it because of all the warnings about it. If that is not possible, please tell me.

I've read this, which recommends:

$ git filter-branch --tree-filter 'rm -f "npz, species_coex.npz" ' HEAD

I get the error:

fatal: ambiguous argument 'npz, species_coex.npz': unknown revision or path not in the working tree

I'm not sure why this problem occurs. Is it because of the blank space in the name (I guess not, as I put it in quotes), or because the file is not in the current HEAD? How can I tell Git where this file is to be found?

And is there a way to do this without filter-branch? I only added the file once and then removed it, so its history is quite simple.

Jürg W. Spaak
  • The `--tree-filter` is extremely slow but the quoted recommendation should work. It appears that you actually used `git rm` and not just `rm` in your `filter-branch`. For a much faster way to remove unwanted files, consider using "the BFG" (search for that name); but note that it has the same problem as any filter-branch: you must copy the repository, or at least all parts at and after the "bad" file, to new commits that are no longer compatible with the original commits, and *everyone* must then change to accommodate this. – torek May 18 '17 at 08:58

1 Answer

The "problem" with filter-branch is the same as with any command that modifies the history of already pushed commits. If someone else has already fetched the commit and has a branch based on it, they will have to fix their history manually (and so will everyone else in that situation), as described in the git rebase help under the heading RECOVERING FROM UPSTREAM REBASE.

If you want to purge the file from the history, e.g. because it contains confidential information such as passwords, you have no choice but to modify the history, no matter which tool you use for this, be it git rebase -i, git filter-branch, or the tool called BFG.

With filter-branch you should not use --tree-filter here, as it checks out a full worktree for each commit. That is only necessary if you want to add or change files. If you only want to delete files, use --index-filter instead, which operates on the index alone; no worktree is checked out, so a plain rm has nothing to act on. Your filter command will then be something like --index-filter 'git rm --cached --ignore-unmatch "npz, species_coex.npz"'.
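To make the --index-filter variant concrete, here is a minimal sketch that runs it end to end in a throwaway repository. The temporary directory, the extra file keep.txt, the user settings, and the commit messages are assumptions made up for the demo; only the filter command itself comes from the text above.

```shell
#!/bin/sh
# Sketch only: everything happens in a scratch repository, never in real work.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"

# Commit 1 adds the unwanted file next to a file we want to keep.
echo data > "npz, species_coex.npz"
echo keep > keep.txt
git add .
git commit -q -m "add files"

# Commit 2 removes the file from the tip; commit 1 still contains it.
git rm -q "npz, species_coex.npz"
git commit -q -m "remove npz"

# Rewrite every commit reachable from HEAD, dropping the file from each index.
# FILTER_BRANCH_SQUELCH_WARNING=1 skips the warning pause in newer Git versions.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm --cached --ignore-unmatch "npz, species_coex.npz"' HEAD

# Count how often a path containing "npz" still shows up in the rewritten history.
hits=$(git log --pretty=format: --name-only HEAD | grep -c npz || true)
echo "paths mentioning npz after rewrite: $hits"
```

Note that filter-branch keeps a backup of the old branch under refs/original/, so the old commits (and the file) remain in the object store until that backup and the reflog entries are removed and the repository is garbage-collected.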

The error you got implies that your filter command used git rm ... rather than rm ..., but without --ignore-unmatch. That option tells Git not to fail when the file to delete does not exist, similar to what -f does (among other things) for the normal rm utility.
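The effect of --ignore-unmatch can be checked in isolation. The following sketch, again in a scratch repository with made-up contents, runs git rm --cached on a path that is not in the index, once without and once with the option, and records whether each call succeeded.

```shell
#!/bin/sh
# Sketch only: a scratch repository in which the target file was never added.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"
echo keep > keep.txt
git add keep.txt
git commit -q -m "initial"

# Without --ignore-unmatch, a path that matches nothing makes git rm fail.
if git rm -q --cached "npz, species_coex.npz" 2>/dev/null; then
  plain=ok
else
  plain=failed
fi

# With --ignore-unmatch, the same call exits successfully and does nothing.
if git rm -q --cached --ignore-unmatch "npz, species_coex.npz"; then
  lenient=ok
else
  lenient=failed
fi

echo "without --ignore-unmatch: $plain, with --ignore-unmatch: $lenient"
```

This matters inside filter-branch because the filter runs for every commit, including those in which the file does not exist, and a failing filter aborts the rewrite.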

If you added the file only a few commits back, an interactive rebase might be easier and faster, though. Run git rebase -i <the commit before the one that added the file>, then in the editor change pick to edit on the line for the commit that added the file, save, and quit the editor. When Git stops, delete the file from the current commit with git rm 'npz, species_coex.npz' && git commit --amend -C HEAD, and continue the rebase with git rebase --continue. Once Git has finished, you should have a new version of your history without the file.
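The rebase steps can be sketched as a script for the case where the file was added in the very first commit, which is where git rebase -i --root replaces the upstream commit. The scratch repository, keep.txt, and the commit messages are assumptions for the demo. Instead of editing the todo list by hand, the sketch sets GIT_SEQUENCE_EDITOR to a sed call that turns the first pick into edit; that shortcut is a convenience of this example, not part of the steps described above.

```shell
#!/bin/sh
# Sketch only: scripted version of the interactive rebase in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"   # hypothetical identity for the demo
git config user.name "Demo"

# Root commit adds the unwanted file plus a file to keep.
echo data > "npz, species_coex.npz"
echo "keep v1" > keep.txt
git add .
git commit -q -m "add files"

# A later commit removes the file and also changes keep.txt.
git rm -q "npz, species_coex.npz"
echo "keep v2" > keep.txt
git add keep.txt
git commit -q -m "remove npz, update keep"

# Start the rebase at the root; sed marks the first todo entry as "edit".
GIT_SEQUENCE_EDITOR='sed -i.bak "1s/^pick/edit/"' git rebase -i --root

# Git is now stopped at the root commit: drop the file and amend in place.
git rm -q 'npz, species_coex.npz' && git commit -q --amend -C HEAD

# Replay the remaining commits on top of the amended root.
GIT_EDITOR=true git rebase --continue

# Count how often a path containing "npz" still shows up in the history.
hits=$(git log --pretty=format: --name-only HEAD | grep -c npz || true)
echo "paths mentioning npz after rebase: $hits"
```

If the commit that added the file contains nothing else, the amend would leave an empty commit, and git commit needs --allow-empty to accept that; the demo avoids the case by keeping a second file in the root commit.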

Vampire
  • Thanks for the help, but nothing really helped: if I enter git filter-branch --index-filter 'git rm ...', Git says fatal: ambiguous argument 'rm': unknown revision or path not in the working tree. So I tried with the normal rm, but then it just prints usage: git filter-branch ... and explains how to use the filter-branch command. The second option doesn't work for me, or rather I don't know what to do, because the commit that added this file is the very first commit, so I wasn't sure what I should check out. – Jürg W. Spaak May 18 '17 at 12:29
  • Thanks, now everything worked (just had to do git rebase --root) – Jürg W. Spaak May 18 '17 at 12:55
  • Yeah, if it's the first commit that should be changed, then `rebase` needs `--root` instead of an upstream commit. But `filter-branch` should have worked too. Not with `rm` of course, because in an `--index-filter` there is no worktree, so you cannot delete the file with `rm`, as I explained in the answer. But with `git rm --cached --ignore-unmatch "npz, species_coex.npz"` it should have worked. I'd need to see the full command to tell you what went wrong. – Vampire May 18 '17 at 13:12
  • As I have already solved the problem, I'm not going to try to reproduce the error, because I'm still afraid of losing everything ;) However, I'm pretty sure that the command was: git filter-branch --index-filter 'git rm --cached --ignore-unmatch "npz, coex_species.npz" '. Because it didn't run, I also tried git filter-branch --tree-filter 'git rm --cached --ignore-unmatch "npz, coex_species.npz" ' – Jürg W. Spaak May 18 '17 at 13:49
  • Hm, very strange. Works fine for me. Doesn't change anything of course as I don't have such a file, but no error occurs. – Vampire May 18 '17 at 13:58