
We are working on the following issue:

We are working with about 1.2 GB of data. When we switched to another data format, we forgot to add the new format to the .gitignore. After we noticed that all the files had been added, we removed them and committed, but it was too late: now we can no longer push to the server because the push is too big for our bandwidth!

We already tried:

sudo git filter-branch --tree-filter "rm -f *.nc" HEAD 

But this is not working! The push still tries to send all ~3,000 objects. How can we proceed? We are really in trouble, since we cannot exchange our work with each other until this is fixed.

suspectus
varantir
  • Linking [this](http://stackoverflow.com/a/8741530/113848) and [this](http://stackoverflow.com/a/3459399/113848) for reference, though there seems to be something else happening here. – legoscia Sep 25 '14 at 13:19
  • Can you make an ASCII drawing of your commit history and the problem? Then it would be much easier to understand the problem and to answer. – ryenus Sep 25 '14 at 13:19
  • Can't you just make a commit that removes that file (`git rm BIGFILE`) and push it? – IProblemFactory Sep 25 '14 at 13:35
  • @ProblemFactory That leaves BIGFILE in older commits, saving space in a checkout of the newer commits but still occupying space in the repository itself. – chepner Sep 25 '14 at 15:47
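chepner's point can be checked on a throwaway repository. The sketch below (POSIX shell; the repository, the file name big.nc and the commit identity are all made up) commits a file, removes it with git rm in a second commit, and then lists the objects reachable from all refs. The removed file's blob is still listed, which is why a plain follow-up commit does not reduce what a push has to send.

```shell
#!/bin/sh
# Throwaway repository with one made-up file, big.nc.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

printf 'pretend this is 1.2 GB of data\n' > big.nc
git add big.nc
git commit -q -m "Accidentally add big.nc"

git rm -q big.nc
git commit -q -m "Remove big.nc"

# Nothing is tracked in the latest commit any more...
git ls-tree -r --name-only HEAD
# ...but the blob is still reachable through the older commit,
# so pushing this branch still has to transfer it.
git rev-list --objects --all | grep 'big\.nc'
```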

2 Answers


Yes, git filter-branch is a way to go.

But since you're going to change the history anyway, you can simply squeeze out the problematic part of it and then cherry-pick the later good commits, just like I recently explained here.

Then use git gc --prune to remove the useless huge commits, and git push -f to overwrite the history on the server side.
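A minimal sketch of this approach on throwaway repositories, assuming the simplest case: the bad commits contain nothing but the .nc files (otherwise their other changes would be lost), and the server only has the commit from before the mistake. server.git is a local bare repository standing in for the real server; all file names and commit messages are made up. The reflog still references the dropped commits, so it is expired before git gc is asked to prune.

```shell
#!/bin/sh
# Sketch of "squeeze out, cherry-pick, gc, force-push" on throwaway repos.
# server.git stands in for the real server; all names are made up.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git init -q client
cd client
git config user.name demo
git config user.email demo@example.com
git remote add origin ../server.git
branch=$(git symbolic-ref --short HEAD)

echo 'print("analysis")' > analysis.py
git add analysis.py && git commit -q -m "Good: analysis script"
git push -q origin "$branch"            # the server has this much
good=$(git rev-parse HEAD)

echo 'huge data' > data.nc
git add data.nc && git commit -q -m "Bad: data.nc committed by mistake"
git rm -q data.nc && git commit -q -m "Remove data.nc"
echo 'print("plots")' > plots.py
git add plots.py && git commit -q -m "Good: plotting script"
later=$(git rev-parse HEAD)

# Squeeze out the bad part: go back to the last good commit...
git reset -q --hard "$good"
# ...and replay only the later good commit on top of it.
git cherry-pick "$later" > /dev/null

# The reflog still points at the dropped commits, so expire it before pruning.
git reflog expire --expire=now --all
git gc -q --prune=now

# Overwrite the branch on the server (-f matters once bad commits were pushed).
git push -q -f origin "$branch"
```

Only the cherry-picked commit is new on top of what the server already had, so the push no longer carries data.nc.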

Community
ryenus
  • Based on his description, this sounds likely to lose data: that would only work if the bad commits added *.nc files and nothing else, which is probably not the case. Additionally, you would need to trigger gc on the server. – Andrew C Sep 25 '14 at 14:50
  • @AndrewC, I think this depends on which part of the history the OP chooses to rework and how the `squeeze` is done: if only the wrongly added files are left out, then everything is good. Also, the user can create tags or rely on `git reflog` to go back and restart the whole process. – ryenus Sep 26 '14 at 05:08
  • It depends on whether or not people *only* committed build output. Normally people don't do that. Sure, it can happen, but it's not typical. – Andrew C Sep 26 '14 at 18:20

First decide how many problematic commits you are dealing with. Filter-branch is powerful, but it is also confusing to use and has a bizarre syntax. Personally, if there are fewer than 10 problematic commits I would use rebase; if there are more than 10, I would use filter-branch.
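One way to count the problematic commits is to ask git log for every commit that touches a .nc file. The sketch below builds a throwaway repository with made-up files and applies the 10-commit rule of thumb from this answer; in a real repository only the last two commands matter. The pattern is quoted so that git, not the shell, matches it, which also covers .nc files in subdirectories.

```shell
#!/bin/sh
# Hypothetical throwaway repo; the file names are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo 'code v1' > model.py
echo 'data 1' > a.nc
git add . && git commit -q -m "Model plus a.nc"
echo 'code v2' > model.py
git commit -q -am "Code only"
mkdir sub && echo 'data 2' > sub/b.nc
git add . && git commit -q -m "Add sub/b.nc"
git rm -q a.nc && git commit -q -m "Remove a.nc"

# Every commit, on any branch, that added, changed or removed a .nc file.
git log --all --oneline -- '*.nc'

# Count them to decide between git rebase -i and git filter-branch.
n=$(git log --all --format=%H -- '*.nc' | wc -l | tr -d ' ')
if [ "$n" -lt 10 ]; then
    echo "$n problematic commits: git rebase -i is manageable"
else
    echo "$n problematic commits: use git filter-branch"
fi
```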

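For a filter-branch solution you would normally use the --index-filter form, with *.nc in place of filename. Quote the pattern inside the filter (for example "*.nc") so that git, rather than the shell, expands it; a git pathspec glob also matches .nc files in subdirectories. Add -r if you are removing whole directories, and you might want --prune-empty as well, so that commits which only added .nc files disappear.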

git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
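A sketch of the index-filter form on a throwaway repository with a made-up layout: one commit contains only .nc files (at the top level and in a subdirectory), and a later commit mixes a code change with another .nc file. With the pattern quoted inside the filter, one git rm call catches both locations, --prune-empty drops the data-only commit, and the code change survives. FILTER_BRANCH_SQUELCH_WARNING only silences the warning (and pause) that newer Git versions show before running filter-branch.

```shell
#!/bin/sh
# Throwaway repo to try the index-filter form; file names are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo 'code' > model.py
git add model.py && git commit -q -m "Add model"
mkdir -p output/run1
echo 'data' > output/run1/result.nc
echo 'data' > top.nc
git add . && git commit -q -m "Only .nc files (the mistake)"
echo 'more code' >> model.py
echo 'data' > output/run1/other.nc
git add . && git commit -q -m "Code change plus another .nc file"

# Skip filter-branch's warning and pause in newer Git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

# "*.nc" is quoted inside the filter so git expands it across directories;
# --prune-empty drops commits that become empty once the .nc files are gone.
git filter-branch --prune-empty --index-filter \
    'git rm -q --cached --ignore-unmatch "*.nc"' HEAD
```

The old history is still kept under refs/original/ at this point, which is why the cleanup steps below are needed.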

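For a small number of commits, git rebase -i HEAD~X (where X is how many commits back the problem starts) would be simpler. Change pick to edit for each affected commit; when the rebase stops there, remove the bad files from the index, add the .gitignore entry, amend the commit with git commit --amend, and continue with git rebase --continue.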
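The rebase route can be scripted for a try-out as well. In this sketch on a throwaway repository (made-up files, X = 2), GIT_SEQUENCE_EDITOR, a standard Git environment variable, runs sed instead of opening an editor and turns the first pick into edit; by hand you would make that change in the editor. The sed -i.bak form is an assumption chosen because it works with both GNU and BSD sed.

```shell
#!/bin/sh
# Throwaway repo; names are made up. The sed call stands in for editing
# the rebase todo list by hand: it turns the first "pick" into "edit".
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo 'code v1' > analysis.py
git add . && git commit -q -m "Add analysis"
echo 'code v2' > analysis.py
echo 'data' > results.nc
git add . && git commit -q -m "Update analysis (and results.nc by mistake)"
echo 'plots' > plots.py
git add plots.py && git commit -q -m "Add plots"

# Rewrite the last 2 commits and stop at the first one for editing.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '1s/^pick/edit/'" git rebase -i HEAD~2

# While stopped: untrack the .nc files, ignore them, and amend the commit.
git rm -q --cached '*.nc'
echo '*.nc' >> .gitignore
git add .gitignore
git commit -q --amend --no-edit

# Replay the remaining commits on top of the cleaned-up one.
git rebase --continue
```

results.nc stays in the working tree as an ignored file, so the data itself is not lost locally.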

Once you do this - you will have fixed the revision history. You can't garbage collect just yet though.

If you used filter-branch, it created backup refs under refs/original/. You need to delete them with:

git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

For either a filter-branch or a rebase solution, you will also need to expire the reflogs:

git reflog expire --expire=now --all

Now you can finally reclaim the disk space the objects are taking up:

git gc --prune=now
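The three cleanup commands above can be tried end to end on a throwaway repository. The sketch assumes a made-up file big.nc in a single commit, drops it with the index-filter form, and then runs the cleanup steps in order; the blob's id is recorded first so you can check that it is really gone afterwards. git count-objects -vH reports how much space the repository takes at the end.

```shell
#!/bin/sh
# End-to-end sketch on a throwaway repo: rewrite, then reclaim the space.
# The file name big.nc and the commit layout are made up for illustration.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo 'code' > model.py
printf 'simulated large dataset %s\n' "$repo" > big.nc
git add model.py big.nc
git commit -q -m "Add model and (by mistake) big.nc"
blob=$(git rev-parse HEAD:big.nc)   # remember the blob we want gone

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch --index-filter \
    'git rm -q --cached --ignore-unmatch big.nc' HEAD

# 1. Delete the backup refs that filter-branch left under refs/original/.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# 2. Expire every reflog entry so the old commits are no longer referenced.
git reflog expire --expire=now --all
# 3. Drop all unreachable objects immediately.
git gc -q --prune=now

git count-objects -vH
```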

That would 'fix' whatever repo you are currently working on. If that isn't the repo on your server, you need to force-push to the server. That only fixes the refs on the server, though; it might not reclaim any disk space there. You would need to expire the reflogs and run gc on the server too.
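A sketch of the server side, using a local bare repository (server.git) as a stand-in for the real server and a simple git commit --amend as a stand-in for the history rewrite; all names are made up. It assumes the bad commit had already been pushed, which is the case where the force push alone leaves the big blob on the server's disk. On a real server you would run the last two commands there, for example over ssh.

```shell
#!/bin/sh
# Server-side sketch; server.git stands in for the real server.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git init -q client
cd client
git config user.name demo
git config user.email demo@example.com
branch=$(git symbolic-ref --short HEAD)

echo 'code' > model.py
printf 'simulated large dataset %s\n' "$work" > big.nc
git add model.py big.nc
git commit -q -m "Add model and big.nc"
blob=$(git rev-parse HEAD:big.nc)

# Suppose the bad commit already reached the server.
git remote add origin ../server.git
git push -q origin "$branch"

# Rewrite locally (an amend stands in for filter-branch or rebase),
# then overwrite the server's branch.
git rm -q --cached big.nc
git commit -q --amend -m "Add model"
git push -q -f origin "$branch"

# The server's branch is fixed, but the old blob is still on its disk...
git -C ../server.git cat-file -e "$blob" && echo "blob still on the server"
# ...until the server itself expires its reflogs and prunes.
git -C ../server.git reflog expire --expire=now --all
git -C ../server.git gc -q --prune=now
```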

Andrew C