0

I stupidly checked a useless 200+ MB file into my repo a month ago, and now all my co-workers (internationally) have the large file. I am wondering if there is a good way of purging this large file for everyone transparently using git hooks.

I can use either bfg or git filter-branch in the git hook to remove the big file, but it seems that a force push is required afterward, which is risky to the repo. Has anyone here done this before? What do the entire workflow and configuration look like?

Is there a better idea than using a git hook?

Thanks in advance!

fast tooth
  • Possible duplicate of [How to remove/delete a large file from commit history in Git repository?](http://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository) – René Höhle Jul 20 '16 at 14:13
  • @stony I know how to use bfg or filter-branch to remove a file in my local git. I am asking how to purge the file EVERYWHERE without collaborative effort from everyone. – fast tooth Jul 20 '16 at 15:41
  • It isn't really nebulously "risky to the repo." It's just that rewriting history causes all sorts of problems if it's been pulled into other repos. – jpmc26 Jul 20 '16 at 16:54

3 Answers

2

The simple answer is no, there is no way to rewrite history without a collaborative effort from everyone. Server side hooks will not modify their local clones. Regardless of whether you could automate anything with client side hooks, you shouldn't. You would have to handle every possible case of updating unpushed local changes and extra branches and so on.

The bottom line is everyone will have to update their local repository to an entirely new tree (at least new starting with the commit where the file was introduced and everything following). A 200 MB file is annoying, but it's probably not as annoying as everyone having to rewrite their local history to remove it. If you can't individually walk each team member through it, there's no security ramification, and it's not actually causing blocking problems (like the repo now exceeds your host's size limit and prevents more pushes), you should probably just commit a file delete (so it won't check out on disk anymore) and leave the history alone.
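As a rough sketch of the "commit a file delete and leave the history alone" option, the script below builds a throwaway repository, commits a stand-in large file, and then removes it with an ordinary commit. The file name big.bin, the identity, and the 1 MiB size are placeholders chosen for illustration, not details from the question.

```shell
#!/bin/sh
# Sketch only: throwaway repo, placeholder file name (big.bin) and identity.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Simulate the mistake: commit a large file (1 MiB here instead of 200 MB).
dd if=/dev/zero of=big.bin bs=1024 count=1024 2>/dev/null
git add big.bin
git commit -q -m "Add big.bin by mistake"

# The fix: a normal commit that deletes the file. No force push is involved.
git rm -q big.bin
git commit -q -m "Remove big.bin"

# The file no longer appears in the working tree...
test ! -e big.bin
# ...but both the adding and the removing commit remain in history,
# so every clone still carries the blob.
history_hits=$(git rev-list --count --all -- big.bin)
echo "commits touching big.bin: $history_hits"
```

Everyone picks this up with a normal pull. The trade-off is that the repository does not get any smaller, because the old blob is still reachable from the earlier commit.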

If the commit that introduced it is very recent, you might consider creating a new branch, but only if you can get everyone to switch over to it seamlessly. That doesn't sound like the case from your question.

Another alternative would be if you can get everyone to clone a new repo after modifying the history. But this would require everyone to port their changes over to a new copy of the repo.
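One possible way to port unpushed work into a fresh clone is to export the local commits as patches and replay them. The sketch below fakes both clones locally; the directory names, file names, and commit messages are all made up for the demonstration, and a real team member would run only the format-patch and am steps against their actual clones.

```shell
#!/bin/sh
# Sketch only: both "clones" are simulated in a temp directory.
set -e
work=$(mktemp -d)
old="$work/old-clone"
new="$work/fresh-clone"

# Old clone: history still contains big.bin, plus one unpushed local commit.
git init -q "$old"
cd "$old"
git config user.name "Example"
git config user.email "example@example.com"
echo "keep me" > notes.txt
dd if=/dev/zero of=big.bin bs=1024 count=64 2>/dev/null
git add notes.txt big.bin
git commit -q -m "Initial commit (with big.bin)"
echo "local work" >> notes.txt
git commit -q -am "Local work not yet pushed"

# Stand-in for a fresh clone of the rewritten repo: same content, no big.bin.
git init -q "$new"
cd "$new"
git config user.name "Example"
git config user.email "example@example.com"
echo "keep me" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Export the unpushed commit from the old clone as a patch...
git -C "$old" format-patch -1 HEAD --stdout > "$work/local-work.patch"
# ...and replay it on top of the rewritten history.
git am -q "$work/local-work.patch"

ported_subject=$(git log -1 --format=%s)
echo "ported: $ported_subject"
```

This works because patches apply by content rather than by commit ID, so the rewritten commit hashes do not matter as long as the files being patched are the same.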

jpmc26
  • Yes, I could leave it alone and maybe I should. But I've had enough of being dubbed the guy who checked in the 200 MB file ... Thanks for this answer, your analysis is accurate. Unless someone has a solution, I will choose yours as the answer. – fast tooth Jul 20 '16 at 21:30
  • @fasttooth Everyone makes mistakes, and it sounds like you've learned a bit of a lesson. Hopefully, your team will have a short laugh and move on. If so, laugh about it yourself; crack a joke at your expense and wait for it to pass. If it keeps up, you can head over to [Workplace.se] to find out how to address that problem. =) – jpmc26 Jul 20 '16 at 21:38
  • thanks jpmc26, my coworkers are cool about it. Just friendly mocking – fast tooth Jul 21 '16 at 01:58
1

Changing history comes with a great deal of pain. Here are a few ways you could delete the file from your history using git filter-branch.

Note: Everyone will need to update their work to reflect your revised history.

git filter-branch --tree-filter "rm -f yourfilename.ext" -- --all

In the above, replace yourfilename.ext with your file, e.g. tutorial.mp4. This will basically go through each commit in your repo and delete the file yourfilename.ext.

-- --all makes sure the command is applied to every ref (all branches and tags), not just the current branch.

Alternatively, you can run a similar command with --index-filter instead. Rather than checking out each commit into the working directory, it basically does the heavy lifting in the staging area (it runs the command against the index without checking out the commit's contents). This method can be faster.

git filter-branch --index-filter "git rm --cached --ignore-unmatch filename.ext" -- --all

As previously stated, just make sure to replace filename.ext with your filename + extension.
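To see the --index-filter variant end to end, here is a sketch that runs it in a throwaway repository and then checks that no commit references the file any more. The repository contents and the identity are invented for the demo. The cleanup steps (deleting refs/original, expiring the reflog, running gc) are one common way to let Git actually drop the old blob locally, not something the commands above do by themselves.

```shell
#!/bin/sh
# Sketch only: runs against a throwaway repo, never your real one.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "keep me" > notes.txt
dd if=/dev/zero of=filename.ext bs=1024 count=64 2>/dev/null
git add notes.txt filename.ext
git commit -q -m "Add notes and, by mistake, filename.ext"
echo "more" >> notes.txt
git commit -q -am "Update notes"

# Newer Git versions print a warning and pause before filter-branch starts;
# this variable skips that (older versions simply ignore it).
export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --index-filter \
  "git rm --cached --ignore-unmatch filename.ext" -- --all >/dev/null

# filter-branch keeps backups of the old refs under refs/original/.
# Delete them and expire the reflog so the old blob becomes unreachable.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now

remaining=$(git rev-list --count --all -- filename.ext)
kept=$(git rev-list --count --all -- notes.txt)
echo "commits still touching filename.ext: $remaining"
echo "commits touching notes.txt: $kept"
```

On a shared repository this would still have to be followed by a force push, and every collaborator would then need to move onto the rewritten history, for example with git fetch followed by git reset --hard origin/<branch> (which discards their local commits on that branch).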

Hope it helps

e.doroskevic
  • Hi, this definitely helps. I know how to use filter-branch or bfg to purge a file. I am asking more about how to purge the file from everyone's copy transparently. Some of the users are not comfortable with git yet, and all it takes is one person's merge to bring the file back into the repo. – fast tooth Jul 20 '16 at 15:43
  • @fasttooth That's the problem with changing history overall: everyone will basically need to update (*clone*) your revised repository and abandon the old version. – e.doroskevic Jul 20 '16 at 15:50
1

If you want to go the route of the hook, there are server side hooks described at https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks. You might try to tap into these to examine the commits pushed and verify that the old large file is not present in there. Look at the .git/hooks/update.sample file. Chacon has a good writeup in his book as well, see https://git-scm.com/book/en/v2/Customizing-Git-An-Example-Git-Enforced-Policy.

Given what everyone else is saying, it seems like you can't guarantee that people don't screw you up and repush the file (or some other large file) and so a hook would be the only insurance against this getting back into your repo.
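As an illustration of what such a check could look like, here is a sketch of the core logic for a server-side update hook that rejects pushes introducing oversized blobs. The 1 MiB limit and the function names large_blobs and check_push are made up for this example. In a real update hook, Git passes the ref name, old commit, and new commit as the three arguments, and a newly created ref (all-zero old commit) would need extra handling that is left out here. The second half just exercises the check against a throwaway repository.

```shell
#!/bin/sh
# Sketch only: hypothetical limit and helper names, demo repo at the bottom.
set -e
MAX_BYTES=1048576   # hypothetical 1 MiB limit

# Print "blob <size> <path>" for every blob in the given revision range
# whose size exceeds MAX_BYTES.
large_blobs() {
  git rev-list --objects "$1" |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v max="$MAX_BYTES" '$1 == "blob" && $2 > max'
}

# Return non-zero if the commits between old and new add an oversized blob.
check_push() {
  if [ -n "$(large_blobs "$1..$2")" ]; then
    echo "rejected: push contains a file larger than $MAX_BYTES bytes" >&2
    return 1
  fi
}

# --- Demo against a throwaway repository ---
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "hi" > small.txt
git add small.txt
git commit -q -m "Small file"
small_commit=$(git rev-parse HEAD)

dd if=/dev/zero of=big.bin bs=1024 count=2048 2>/dev/null
git add big.bin
git commit -q -m "Oops, 2 MiB file"
big_commit=$(git rev-parse HEAD)

echo "more" >> small.txt
git commit -q -am "Small change"
ok_commit=$(git rev-parse HEAD)

if check_push "$small_commit" "$big_commit"; then big_result=accepted; else big_result=rejected; fi
if check_push "$big_commit" "$ok_commit"; then ok_result=accepted; else ok_result=rejected; fi
echo "push with big.bin: $big_result"
echo "push without new large files: $ok_result"
```

Checking by size rather than by file name also covers the "or some other large file" case. A check for the specific old file would more likely compare blob IDs against the known offending one.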

David Neiss