
Our Git repository has grown to an unwieldy size because binary files, images, etc. were accidentally committed to it. There are no large files in the repo's current tree, but there are large files in its history.

My plan is to remove these large files from our git history, and I have found a number of good resources and SO answers for doing so (https://rtyley.github.io/bfg-repo-cleaner/, How to remove/delete a large file from commit history in Git repository?, https://help.github.com/articles/removing-files-from-a-repository-s-history/).

My primary issue is that we have a number of contributors to our repo (hosted on Bitbucket) and I'm worried that once I rip the large files out of the history, our contributors will push the history with the large files back up into the remote repo.

Specifically, the BFG Repo Cleaner documentation states:

At this point, you're ready for everyone to ditch their old copies of the repo and do fresh clones of the nice, new pristine data. It's best to delete all old clones, as they'll have dirty history that you don't want to risk pushing back into your newly cleaned repo.

So, my question is twofold:

  1. Is there a way to ensure that pushes from old clones of the repo won't re-introduce the large files?
  2. If not, is there a way to keep old clones from pushing and thus require all contributors to start with a fresh clone?

Thank you!

1 Answer


For Git in general, you would use a pre-receive hook on the origin repo. The hook runs whenever a push is received, and if it doesn't like the contents of the push, it rejects the push. So you could write a script that looks for large objects, for certain file types, or for whatever you think will most effectively enforce your requirements. See the Git hooks documentation (https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks).
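As a rough illustration of the "looks for large objects" idea, here is a minimal sketch of such a hook in Python. It assumes a server where you can install your own hooks, a 5 MiB limit (`MAX_BYTES`, an arbitrary choice), and a SHA-1 repository; the function names are made up for this example. It relies on the fact that Git feeds pre-receive one `<old> <new> <ref>` line per updated ref on stdin, and uses `git rev-list --objects <new> --not --all` to approximate "objects this push would add".

```python
#!/usr/bin/env python3
"""Sketch of a pre-receive hook that rejects pushes introducing large blobs."""
import subprocess
import sys

MAX_BYTES = 5 * 1024 * 1024  # assumed size limit, pick your own
ZERO = "0" * 40              # all-zero ID marks a ref deletion (SHA-1 repos)


def parse_updates(lines):
    """Parse the '<old> <new> <ref>' lines Git writes to the hook's stdin."""
    return [tuple(line.split()) for line in lines if line.strip()]


def oversized(batch_check_output, limit=MAX_BYTES):
    """Pick blobs over `limit` out of 'git cat-file --batch-check' output."""
    found = []
    for line in batch_check_output.splitlines():
        parts = line.split()  # default format: '<id> <type> <size>'
        if len(parts) == 3 and parts[1] == "blob" and int(parts[2]) > limit:
            found.append((parts[0], int(parts[2])))
    return found


def new_objects(new):
    """IDs reachable from `new` but not from any ref the server already has."""
    out = subprocess.run(
        ["git", "rev-list", "--objects", new, "--not", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.split()[0] for line in out.splitlines() if line]


def main(stdin=sys.stdin):
    for old, new, ref in parse_updates(stdin):
        if new == ZERO:  # deleting a ref adds no objects
            continue
        ids = new_objects(new)
        if not ids:
            continue
        out = subprocess.run(
            ["git", "cat-file", "--batch-check"],
            input="\n".join(ids) + "\n",
            check=True, capture_output=True, text=True,
        ).stdout
        bad = oversized(out)
        for obj_id, size in bad:
            print(f"rejected {ref}: blob {obj_id} is {size} bytes", file=sys.stderr)
        if bad:
            return 1  # a non-zero exit status makes Git refuse the whole push
    return 0

# When installed as hooks/pre-receive, end the file with: sys.exit(main())
```

Because the check walks every object the push would add, it also catches a large file that was committed and later deleted, not just files present in the pushed tip.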

For Bitbucket, from https://confluence.atlassian.com/bitbucketserverkb/how-to-create-a-simple-hook-in-bitbucket-server-779171711.html:

Bitbucket Server has two primary ways you can create a hook.

The recommended way is to create a Plugin Using our Java plugin development framework. It is also possible, although strongly discouraged, to create a server side Git hook in your Bitbucket Server instance's file system.

The page goes on to explain more about their recommended way for setting up a hook.

Mark Adelsberger
  • Thank you for the idea! A couple questions: 1. I don't think this is available for Bitbucket Cloud (we don't use Bitbucket Server): https://bitbucket.org/site/master/issues/10471/git-server-side-pre-receive-hook-bb-11418. Any thoughts on that? 2. Would there be a way to set up a server side hook that checks to see if there are any large files in the _history_ (ie, not checking the HEAD, but actually checking the history)? The reason that I ask is because our worry is that people will push dirty history with the large files that we've already removed. – Dylan Fried May 25 '18 at 12:55
  • As I'm not a Bitbucket user, I can't speak to the different features of cloud vs server. To your other question: yes, you can examine any element of the data in a push to decide whether to accept it, which could include history, not just the HEAD. – Mark Adelsberger May 25 '18 at 14:49
  • Gotcha, that makes a ton of sense. Thanks @mark-adelsberger! – Dylan Fried May 28 '18 at 12:11
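
To make the "checking the history" point from the comments concrete, one possible approach (a sketch, not something the answer prescribes) is to record the commit IDs of the old, pre-rewrite history and have a pre-receive hook refuse any push that contains one of them. The file name `banned-commits.txt` and the helper names below are hypothetical, and this again assumes a server that allows custom hooks, which the comments suggest Bitbucket Cloud does not.

```python
import subprocess


def load_banned(path="banned-commits.txt"):
    """Read one pre-rewrite commit ID per line (hypothetical file you maintain)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def pushed_commits(new):
    """Commits reachable from `new` that no existing server ref already contains."""
    out = subprocess.run(
        ["git", "rev-list", new, "--not", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def stale_commits(pushed, banned):
    """Return the pushed commit IDs that belong to the old, dirty history."""
    return sorted(set(pushed) & set(banned))
```

Inside a hook, you would call `stale_commits(pushed_commits(new), load_banned())` for each updated ref and exit non-zero if the result is non-empty. Contributors pushing from an old clone would then be rejected until they re-clone or rebase onto the cleaned history.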