We have a certain binary file in our git repository. Usually it's around 2MB in size.

One of our developers accidentally committed this file bundled with all of its dependencies, which bumped up the file to around 40MB.

Of course we committed a fixed version, but the main repository still has that useless chunk of 40MB of binary data we do not need. I can guarantee we will never need that file's history for that specific commit (or for any other commit for that matter - it's a compiled binary, we have the source versioned anyway).

How can I remove that blob of data to restore the repo size? A simple `git gc` doesn't suffice, and I think I need some lower-level hacking I am not familiar with.

Yuval Adam
  • Yes. Obviously we couldn't care less about the disk space. But this repo needs to be deployed to remote servers. We can't have that 40MB overhead. – Yuval Adam Jul 17 '11 at 15:32
  • @Yuval, you're always deploying the whole repo? Why? Wouldn't it be better if you either deployed just the current version or used `git pull` to deploy just the changes (this would mean transferring those 40MB *once*)? – svick Jul 17 '11 at 15:38
  • Even so, this is useful to know, and it will keep the overall size of the repo down if done religiously. 40MB here, 40MB there, will easily add up to a few GB. – Arafangion Jul 17 '11 at 15:45

2 Answers

If you can create the file from the source code, it most likely doesn't belong in the repository at all.

If you want to remove that version of the file from the repository, you have to rewrite the commits that contain it, ideally using `git rebase -i`. The problem is that this rewrites history, and you really shouldn't do that for commits that are already public (that is, shared between multiple users). See Recovering from upstream rebase for how to make this work if you really want to.
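As a rough illustration, here is a minimal sketch of such a rewrite in a throwaway repository. It assumes the bloated commit and the fix are the two most recent commits, so the fix can simply be squashed into the bad commit. The file name `app.bin`, the commit messages, and the small stand-in file sizes are all hypothetical; `GIT_SEQUENCE_EDITOR` is only used here to do non-interactively what you would otherwise do by hand in the `git rebase -i` todo list.

```shell
#!/bin/sh
# Sketch only: squash the "fix" commit into the commit that introduced the
# bloated binary, so no commit in the rewritten history references the big blob.
# Repo location, app.bin and the commit messages are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'normal build\n' > app.bin
git add app.bin
git commit -q -m "Add binary"

# Stand-in for the accidentally bundled build (much smaller than 40MB).
head -c 200000 /dev/urandom > app.bin
git commit -q -am "Update binary (accidentally bundled)"
big_blob=$(git rev-parse HEAD:app.bin)

printf 'normal build v2\n' > app.bin
git commit -q -am "Fix binary"

# Same as running `git rebase -i HEAD~2` and changing the second todo line
# from "pick" to "fixup": the fix is folded into the bad commit.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/fixup/'" git rebase -q -i HEAD~2

git log --oneline
```

If other commits sit between the bad commit and the fix, the todo list needs reordering and conflicts are possible, so treat this as a starting point rather than a recipe.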

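After you do that rebase, the old version of the file will stay in the repository's object store for a while (the reflog still refers to the old commits), but garbage collection will remove it eventually. And it won't be transmitted at all if you use `git clone` or `git pull` from a remote, because only objects reachable from its refs are sent.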
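If you don't want to wait for that cleanup, the following sketch shows one way to force it in a throwaway repository. It expires the reflog and then prunes unreachable objects right away. The repository, the file name `app.bin`, and the use of `git commit --amend` as the history rewrite are assumptions made to keep the example short. Note that expiring the reflog also throws away your safety net for recovering the old commits.

```shell
#!/bin/sh
# Sketch only: show that an unreachable blob survives until the reflog is
# expired and gc prunes it. Repo location and app.bin are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Stand-in for the accidentally bundled build.
head -c 200000 /dev/urandom > app.bin
git add app.bin
git commit -q -m "Add binary (accidentally bundled)"
big_blob=$(git rev-parse HEAD:app.bin)

# Rewrite the tip commit so it contains the normal-sized file instead.
printf 'normal build\n' > app.bin
git commit -q -a --amend -m "Add binary"

# The old blob is no longer reachable from any branch, but it is still on disk.
if git cat-file -e "$big_blob" 2>/dev/null; then before=present; else before=gone; fi

# Drop the reflog entries that still point at the old commit, then prune.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$big_blob" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"
git count-objects -vH
```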

svick
  • I think the other answer (with its comments) leaves it pretty unclear that this requires history rewriting. You must make it as if you never committed that version of the file in the first place. (I'm setting judgment about whether the file should be committed at all aside here.) – Cascabel Jul 18 '11 at 00:21

If you check out the repository, the file will arrive in your working copy; then use `git rm` to get it out. Or, to make it look like it was never added, check this out: Completely remove file from all Git repository commit history

Sheena
  • No can do, this file can't be removed from the repo – Yuval Adam Jul 17 '11 at 15:34
  • Yuval: You either want it removed from the repo - or you don't. CHOOSE! – Arafangion Jul 17 '11 at 15:46
  • (Incidentally, you could check out a prior copy instead.) – Arafangion Jul 17 '11 at 15:59
  • @Arafangion - I want to remove a certain blob of binary data, not the entire file. Yes, this is a weird low-level operation, but one that I am sure is possible in git. – Yuval Adam Jul 17 '11 at 16:15
  • @Yuval: The trick is to realise that git does not distinguish. Your "file" is in no way related to that blob except that it shares the same sha1. If you remove all references to that blob, then as far as git is concerned, it does not exist (anymore). If you change the file, you will have a new blob. The previous change will refer to the previous blob, the new change will refer to the new blob. – Arafangion Jul 17 '11 at 16:17
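
To make that last point concrete, here is a small sketch in a throwaway repository (the file name `app.bin` and its contents are made up). Each committed version of the file is stored as its own blob, and a blob's id is derived purely from its content.

```shell
#!/bin/sh
# Sketch only: two versions of one path are two unrelated blobs.
# Repo location and app.bin are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'version 1\n' > app.bin
git add app.bin
git commit -q -m "v1"

printf 'version 2\n' > app.bin
git commit -q -am "v2"

# Each commit's tree points at its own blob for the same path.
old_blob=$(git rev-parse HEAD~1:app.bin)
new_blob=$(git rev-parse HEAD:app.bin)
echo "v1 blob: $old_blob"
echo "v2 blob: $new_blob"

# The id depends only on the bytes: hashing the same content yields the same id.
same=$(printf 'version 2\n' | git hash-object --stdin)
echo "hash of identical content: $same"
```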