1

I have a huge git repository (810 MB) containing large files that should not be there: complete JRE archives for distribution, located in the folder build/java.

I am trying to remove those files, so I ran:

 git filter-branch --tree-filter 'rm -rf build/java' HEAD

I now see the message: Your branch and 'origin/develop' have diverged, and have 414 and 414 different commits each, respectively. (use "git pull" to merge the remote branch into yours)

I don't want to run git pull, but before I push to the remote repository on GitHub I want to confirm that the repository has shrunk.

Unfortunately, I still see it as 810 MB.

What am I doing wrong? How can I shrink that repository?

TIA!

isapir
  • I thoroughly recommend BFG: https://rtyley.github.io/bfg-repo-cleaner/. – Oliver Charlesworth Jun 20 '15 at 23:12
  • possible duplicate of [How to remove/delete a large file from commit history in Git repository?](http://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository) – Andrew C Jun 21 '15 at 06:04
  • @AndrewC - no, this is after I followed the recommendations at the question you cited, so this is maybe a follow-up question, but not a duplicate. please do not down-vote it. – isapir Jun 21 '15 at 07:18
  • Perhaps you should have said that in your question to help others to understand your problem... – Philippe Jun 21 '15 at 08:23
  • @Philippe but my question is a separate issue. the previous steps are irrelevant to the question I have. the only thing my question has in common with the previous questions is a similar subject line. – isapir Jun 21 '15 at 08:45
  • Note: You are rewriting your repository in order to do this, including moving things like "master" forcefully. You may want to create a new upstream repository for the result and keep the old as a backup. – Thorbjørn Ravn Andersen Jun 21 '15 at 10:36
  • @lgal - it's not that your question is a duplicate of that specific question, it's that your question is a duplicate of that question and 100 other similar questions about filter branch. Also, it is covered in the `git filter-branch` documentation. See http://git-scm.com/docs/git-filter-branch : "Checklist for shrinking a repository". – Andrew C Jun 21 '15 at 16:08

2 Answers

1

First, I highly recommend using BFG Repo-Cleaner to remove big files from your repository.

Second, since you use GitHub, you should know that there is a new feature for handling file types that can be huge: Git LFS.

"Unfortunately, I still see it as 810 MB"

Indeed, when you use filter-branch, git saves a backup of every rewritten reference under the refs/original/ namespace. Until you accept your changes by deleting these references AND running a garbage collection, all the old objects are still in the git object database and the size stays the same!
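The two steps described above (delete the backup references, then garbage-collect) could look roughly like the sketch below. The function name purge_rewritten_history is hypothetical; the git commands follow the "Checklist for shrinking a repository" in the git filter-branch documentation. It assumes you have already checked the rewritten history and no longer need the backups, because the deletion cannot be undone locally.

```shell
# Hypothetical helper: drop filter-branch backups and prune unreachable objects.
# Usage: purge_rewritten_history /path/to/repo   (defaults to the current directory)
purge_rewritten_history() {
    repo=${1:-.}
    # Delete every backup ref that filter-branch saved under refs/original/.
    git -C "$repo" for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git -C "$repo" update-ref -d "$ref"
        done
    # Expire all reflog entries immediately so they stop pinning the old commits.
    git -C "$repo" reflog expire --expire=now --all
    # Repack and prune unreachable objects now instead of after the grace period.
    git -C "$repo" gc --prune=now
}
```

Note that other refs can still keep the old objects alive after this, for example a remote-tracking ref such as origin/develop that still points at the pre-rewrite history.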

Philippe
0

Execute

git reflog

This shows the history of every commit that has been at the tip of your branch. By default, git keeps reflog entries for 90 days, or 30 days for entries that are no longer reachable from the current tip. Even though you rewrote your branch, the commits on your old branch are still in git's reflog, and this prevents them and their parent commits from being purged, together with any files they reference.

So, if some of the unwanted files are still anywhere in the history of any of those archived commits, this will effectively prevent git from purging the commits with the unwanted files.

In order to make sure that you've purged those files from the repository you must:

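1) Delete your entire reflog history. Without --expire=now, only entries older than the default retention period are removed:

git reflog expire --expire=now --all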

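2) Figure out whether any tag, branch, or other ref still has the unwanted files in its history, and decide what to do about it: either delete the ref or filter it as well. This includes the backup refs that filter-branch saves under refs/original/ and remote-tracking refs such as origin/develop.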

3) Run git gc --prune=now to do garbage collection. A plain git gc keeps unreachable objects that are newer than two weeks by default.

This should finally remove all the dropped files from your local git repository.
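For step 2, one possible way to find the refs that still carry the unwanted files is sketched below. The function name refs_containing_path is hypothetical; it relies only on git for-each-ref and git rev-list, and it reports any ref whose history contains at least one commit that changed something under the given path.

```shell
# Hypothetical helper: print every ref whose history still touches a path.
# Usage: refs_containing_path build/java [/path/to/repo]
refs_containing_path() {
    target=$1
    repo=${2:-.}
    git -C "$repo" for-each-ref --format='%(refname)' |
        while read -r ref; do
            # rev-list prints a commit only if one reachable from $ref changed $target.
            if [ -n "$(git -C "$repo" rev-list -n 1 "$ref" -- "$target" 2>/dev/null)" ]; then
                printf '%s\n' "$ref"
            fi
        done
}
```

Running refs_containing_path build/java before and after the cleanup shows which refs still need to be deleted or filtered, and git count-objects -vH reports the size of the object store so you can check whether it actually shrank.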

Here's the bad news: when you finally push the clean branch, I'm pretty sure this won't guarantee that the unwanted files also get dropped from your GitHub repo. All you're doing is pushing the commits in your branch out. This won't necessarily cause the remote git repo to get garbage-collected. I am not familiar with GitHub's default configuration when it comes to garbage-collecting their repos. You will have to investigate that.
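Independently of what the remote does, you can estimate locally how large a fresh clone of the rewritten history would be. The sketch below is one way to do that, following the file:// clone suggestion in the git filter-branch documentation. The function name fresh_clone_size and its two arguments are hypothetical, and it assumes the source path is absolute and the destination does not exist yet.

```shell
# Hypothetical helper: clone a repo through the file:// transport and report
# the size of the resulting object store.
# Usage: fresh_clone_size /absolute/path/to/repo /path/to/new-clone.git
fresh_clone_size() {
    src=$1
    dest=$2
    # file:// forces the regular transfer protocol, so only objects reachable
    # from the cloned branches and tags are copied.
    git clone --quiet --bare "file://$src" "$dest"
    # Print object counts and sizes in human-readable units.
    git -C "$dest" count-objects -vH
}
```

The clone copies branches and tags but not refs/original/ or the reflog, so the size-pack value it prints approximates the rewritten repository without the old JRE archives, provided no branch or tag still references them.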

Sam Varshavchik