I'm working with a repo that includes generated PDFs. Obviously this isn't a great idea, so we'd like to remove them from our repository history.[1] I've tested out BFG Repo-Cleaner and the results are excellent. It's now 12 times faster to clone my fork on my machine.[2] But there's one problem that's holding me back from making the change today:[3]
"At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo."
In theory we can tell everyone in our organization to remove their copies of the repo, but there are surely people outside of the organization who have copies that we don't know about. (For that matter, I probably have copies I don't remember making.) So we want to prevent people from pushing PDF file history after it's been cleaned out.
One solution might be some sort of pre-push hook that blocks pushes when Git history still includes PDFs after we remove that history. But what should we check for? Is there some other way to avoid getting all this history back from someone who hasn't heard that they should re-clone the repository?
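To make the idea concrete, here is a minimal sketch of what such a check could look like as a Python pre-push hook. It is only an illustration under stated assumptions: it assumes the cleaned history contains no paths ending in `.pdf` at all (so it would also flag LFS pointer files that keep a `.pdf` name), and it scans the full history of every ref being pushed rather than only the new commits. The function names `pdf_paths` and `main` are made up for this example.

```python
#!/usr/bin/env python3
"""Sketch of a pre-push hook that refuses to push history containing PDFs."""
import subprocess
import sys


def pdf_paths(rev_list_output):
    """Return the paths ending in .pdf found in `git rev-list --objects` output."""
    found = []
    for line in rev_list_output.splitlines():
        # Each line is "<hash>" for a commit or "<hash> <path>" for a tree or blob.
        _obj, _, path = line.partition(" ")
        if path.lower().endswith(".pdf"):
            found.append(path)
    return found


def main(stdin=None):
    """Read the ref lines Git passes to pre-push; return 1 to block the push."""
    stdin = sys.stdin if stdin is None else stdin
    for line in stdin:
        # Git sends: <local ref> <local hash> <remote ref> <remote hash>
        parts = line.split()
        if len(parts) != 4:
            continue
        _local_ref, local_hash, _remote_ref, _remote_hash = parts
        if set(local_hash) == {"0"}:
            continue  # an all-zero local hash means a remote ref is being deleted
        # Assumption: scanning everything reachable from the pushed tip is acceptable.
        result = subprocess.run(
            ["git", "rev-list", "--objects", local_hash],
            capture_output=True, text=True, check=True,
        )
        bad = pdf_paths(result.stdout)
        if bad:
            print("Push blocked: history still contains PDFs, e.g. " + bad[0],
                  file=sys.stderr)
            return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved as `.git/hooks/pre-push` and made executable, a script like this would run before each push from that clone. Note that a pre-push hook only runs in clones where it has been installed, so it cannot by itself stop someone with an old clone; the same path check could instead run on the server side in a pre-receive hook, where the hosting setup allows one.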
Footnotes:
[1] At the moment we're moving them to LFS, but I'd like to get to a point where we aren't tracking generated binary files at all.
[2] Yeah. This is why tracking large files in Git isn't a great option. It's even worse when a new copy is created with each push.
[3] Other than that it's not a good idea to make a big change over the weekend.