I'm really new to using git, and made the mistake of also pushing my data file (one big .RData file) to my online repository on GitLab. Now my maximum size limit is reached and I can't push any more. So I would like to remove the data file. I found git's filter-branch command. However, the problem is this: in the very early commits the file was called datafile_early.RData; after a few commits that file was deleted and replaced by datafile_later.RData. (I'm also working with others on that repository.)

So how do I purge datafile_early.RData from the history? I tried git filter-branch -f --tree-filter 'rm datafile_early.RData'. It started removing the file from the first commits, but then failed because it could not find the file in the later commits:

Rewrite a9c05c45dd0c2dacb7ba79cf829fb76a3fb70da3 (4/22) (22 seconds passed, remaining 99 predicted)  rm: datafile_early.RData: No such file or directory
tree filter failed: rm datafile_early.RData

What other options do I have?

Pweide
  • How about writing your filter script in such a way that it won't fail if the file is missing? Either by explicitly checking if the file is present before you try to delete it, or by somehow making your script return a zero exit code regardless. Perhaps some clever use of a wildcard works, because I believe `rm` will only fail if you ask it to remove a specific file that is missing, but not if you ask it to remove all files matching a pattern, and none matched. – Lasse V. Karlsen Mar 23 '20 at 12:35
  • Does this answer your question? [How can I delete a file from a Git repository?](https://stackoverflow.com/questions/2047465/how-can-i-delete-a-file-from-a-git-repository) – Ofek Hod Mar 23 '20 at 14:52
  • Thank you for your replies. The link led to another page that would have answered my question: https://stackoverflow.com/questions/872565/remove-sensitive-files-and-their-commits-from-git-history I just came across it. But thanks to @torek's answer I could already solve the problem. – Pweide Mar 24 '20 at 16:01

1 Answer

If using git filter-branch:

  • --tree-filter is very slow; use --index-filter if at all possible.
  • Set up each filter so that it does not report a failure status.

The second point is the one Lasse V. Karlsen mentioned in a comment: you'd probably want your tree filter command to read rm -f datafile_early.RData datafile_later.RData to remove whichever of these files exist, and then succeed even if it removed nothing.
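As a minimal sketch of why the -f flag matters here, the following script compares the exit status of plain rm and rm -f when the named files are missing. The scratch directory and the variable names (workdir, plain_status, force_status) are assumptions made up for this illustration and are not part of the original repository.

```shell
#!/bin/sh
# Work in a throwaway directory (hypothetical) that contains neither data file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Plain rm reports failure (non-zero status) when the file does not exist.
rm datafile_early.RData 2>/dev/null
plain_status=$?

# rm -f treats missing files as success, so a filter built on it does not abort.
rm -f datafile_early.RData datafile_later.RData
force_status=$?

echo "rm status:    $plain_status"
echo "rm -f status: $force_status"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

git filter-branch stops as soon as a filter command returns a non-zero status, which is what produced the "tree filter failed" message in the question.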

To address the first point, note that a tree filter consisting of rm commands can be replaced with an index filter consisting of git rm --cached commands. In this case the appropriate matching command would be:

git rm --cached --ignore-unmatch datafile_early.RData datafile_later.RData

The entire git filter-branch command is therefore probably:

git filter-branch \
  --index-filter \
  'git rm --cached --ignore-unmatch datafile_early.RData datafile_later.RData' \
  --tag-name-filter cat -- --all

(optionally, remove the backslash-newline sequences to make this all one line) which should run in considerably less time than the --tree-filter variant.
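To try the command safely before touching the real repository, one option is to rehearse it in a throwaway repository. The sketch below is an illustration under stated assumptions: the file analysis.R, the commit messages, the demo identity, and the variable names (repo, remaining) are all hypothetical, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning that recent git versions print before running filter-branch.

```shell
#!/bin/sh
set -e
# Assumed demo identity, so commits work without any global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repository (hypothetical) that mimics the history in the question.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Early commit: the first data file plus an ordinary script.
echo data > datafile_early.RData
echo 'x <- 1' > analysis.R
git add .
git commit -q -m "add early data"

# Later commit: the early file is deleted and replaced by the later one.
git rm -q datafile_early.RData
echo data2 > datafile_later.RData
git add .
git commit -q -m "switch to later data"

# Rewrite every ref; --ignore-unmatch keeps the filter from failing
# on commits where one of the files is absent.
git filter-branch -f --index-filter \
  'git rm --cached --ignore-unmatch datafile_early.RData datafile_later.RData' \
  --tag-name-filter cat -- --all >/dev/null 2>&1

# Count commits on the rewritten branches that still touch either file.
# (--branches is used rather than --all, because --all would also include
# the refs/original/ backup refs that filter-branch leaves behind.)
remaining=$(git rev-list --count --branches -- datafile_early.RData datafile_later.RData)
echo "commits still touching the data files: $remaining"

# Clean up the throwaway repository.
cd / && rm -rf "$repo"
```

If the count is zero in the rehearsal, the same filter should behave the same way on the real history, though the real run will take longer in proportion to the number of commits.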

torek