I'm working with a repo that includes generated PDFs. Obviously this isn't a great idea, so we'd like to remove them from our repository history.[1] I've tested out BFG Repo-Cleaner and the results are excellent. It's now 12 times faster to clone my fork on my machine.[2] But there's one problem that's holding me back from making the change today:[3]

> At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo.

In theory we can tell everyone in our organization to remove their copies of the repo, but there are surely people outside of the organization who have copies that we don't know about. (For that matter, I probably have copies I don't remember making.) So we want to prevent people from pushing PDF file history after it's been cleaned out.

One solution might be some sort of pre-push hook that blocks pushes when Git history still includes PDFs after we remove that history. But what should we check for? Is there some other way to avoid getting all this history back from someone who hasn't heard that they should re-clone the repository?


Footnotes:

  1. At the moment we're moving them to LFS, but I'd like to get to a point where we aren't tracking generated binary files at all.

  2. Yeah. This is why tracking large files in Git isn't a great option. Even worse when a new copy is created with each push.

  3. Other than that it's not a good idea to make a big change over the weekend.

Jon 'links in bio' Ericson
  • Note that setting up a pre-push hook is an active thing. That is, it's something where, upon stumbling over some existing repository that doesn't have a pre-push hook yet, you have to *notice* this and go: hey, wait, I'd like to add a pre-push hook before I accidentally push commits I don't want to push. You're approximately equally likely to remember not to push as you are to add the pre-push hook. – torek Jul 25 '21 at 01:40
  • @torek: Right. Essentially [I can't force people to install a local hook](https://stackoverflow.com/questions/3703159/git-remote-shared-pre-commit-hook/3703207#3703207) any more than I can force them to re-clone the central repository. – Jon 'links in bio' Ericson Jul 25 '21 at 21:37

1 Answer

To start, it's not really that hard to remove files from Git history. In the worst case, you re-run the BFG process. In the best case, you do nothing, and everyone either follows the instructions or never tries to push dirty history.

Ideally, use a pre-receive hook to block bad pushes on the server side. This appears to be possible with GitHub Enterprise. If that's not an option, a pre-push hook installed through Husky should do the trick. My approach is to look for a particular commit that we know should no longer exist:

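#!/bin/sh
# Refuse to push if this clone still holds a commit that the cleanup removed.

bad_commit=5c11439d7ade68daa9a3cb72271814ea8575e4f4

# cat-file -e exits 0 only when the object exists in this repository.
if git cat-file -e "${bad_commit}^{commit}" 2>/dev/null; then
    echo "Looks like you have a copy of the repository with a bad commit." >&2
    echo "Please save your work to a temporary location and delete this repository." >&2
    echo "If you create a fresh clone, that should fix this problem." >&2
    exit 1
fi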

It doesn't much matter which commit you check, as long as it's one that will be removed by the BFG step. When you remove the PDFs from Git history, immediately add the hook (either pre-receive or a Husky pre-push) to make sure new pushes don't include the unwanted commit.
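One way to pick a commit to check, as a rough sketch: list the commits that added a PDF and take the last one Git prints, which is roughly the oldest. The function name `first_pdf_commit` is made up for illustration. This assumes it runs in a clone that still has the old history, and that every commit that touched a PDF ends up with a different hash once the history is rewritten.

```shell
#!/bin/sh
# Hypothetical helper: print the hash of (roughly) the oldest commit,
# reachable from any ref, that added a file ending in .pdf.
# Run it in a clone that still has the OLD, pre-cleanup history.
first_pdf_commit() {
    # --diff-filter=A keeps only commits that added a matching path.
    # git log prints newest first, so the last line is the oldest match.
    git log --all --diff-filter=A --format=%H -- '*.pdf' | tail -n 1
}
```

Running `first_pdf_commit` before the cleanup prints a hash that can be pasted into the hook.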

I believe it will also be necessary to push the Husky hook to all branches to make sure everyone has it. This should not be necessary with the server-side pre-receive hook.
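For the server-side route, a pre-receive hook could look roughly like the sketch below. Git feeds the hook one line per updated ref on standard input (old value, new value, ref name), so the check can ask whether the known-bad commit is an ancestor of each pushed tip. The function name `reject_bad_history` and the messages are illustrative, and how the script gets installed depends on the hosting platform.

```shell
#!/bin/sh
# Sketch of a server-side pre-receive check. Assumes the hash below
# belongs to a commit that the cleanup removed from history.
bad_commit=5c11439d7ade68daa9a3cb72271814ea8575e4f4

# Reads "<old-sha> <new-sha> <ref-name>" lines from stdin, the format Git
# passes to pre-receive. Returns 1 if any pushed tip still has $bad_commit
# in its history, 0 otherwise.
reject_bad_history() {
    while read -r old new ref; do
        # A new value made only of zeros means the ref is being deleted.
        case "$new" in
            *[!0]*) ;;
            *) continue ;;
        esac
        # Exits 0 only when $bad_commit is an ancestor of the pushed tip.
        # If the object is unknown here, the command fails and we move on.
        if git merge-base --is-ancestor "$bad_commit" "$new" 2>/dev/null; then
            echo "Rejected $ref: it still contains pre-cleanup history." >&2
            echo "Please make a fresh clone and reapply your work there." >&2
            return 1
        fi
    done
    return 0
}

# In a real hook file, the last line would call the function:
# reject_bad_history
```

Checking ancestry rather than mere existence means the push is judged by what it would actually add to the ref, even if the server has not yet pruned the old objects.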

Jon 'links in bio' Ericson