I've got a git checkout that is 3.2 GB in size, including the .git dir. There is a subfolder in this repo that I'd like to split out into a new git repo. So I followed the instructions here: https://help.github.com/articles/splitting-a-subfolder-out-into-a-new-repository/

So far, so good. However, the new repo (including the .git dir) is still 2 GB, even though the checked-out files total only 22 MB and there are only 164 commits in the log on the main branch. I've tried a few things, such as `git reflog expire --expire=now --all`, `git gc --aggressive`, and `git prune --now`.

I still see a lot of branches that I'd like to purge.

What can I do to ensure none of the removed code is in this repo?

More info:

git st shows:

# Your branch and 'origin/master' have diverged,
# and have 164 and 101729 different commits each, respectively.
#   (use "git pull" to merge the remote branch into yours)

So it looks like the history still has all of those commits.

Eddified
  • The *easiest* way to deal with the fact that `git filter-branch` has to *copy* everything (or everything to be preserved), which doubles (or less than doubles if you copy a lot less) the size of your repository, is to `git clone` the post-filtering repository, or rather, the interesting branch(es) of it (you can just delete the uninteresting ones, plus any unfiltered tags, and then clone). That instantly discards, from the new clone, all but the filtered results, with no messing-about with reflogs and gc and prune. – torek Mar 10 '17 at 00:10
  • I tried cloning. It didn't help. I'm not sure how to delete uninteresting branches from the index. – Eddified Mar 10 '17 at 00:12
  • To delete a branch (which has to be one that you are not standing on), use `git branch -D <branchname>`. (Branches and Git's index are separate concepts, not sure what you meant there.) Check for tags as well (`git tag`) and delete any you need to have gone. – torek Mar 10 '17 at 00:21
  • Not sure if that's the case, but you might also want to try [cleaning large unused files](http://stackoverflow.com/questions/10622179/how-to-find-identify-large-files-commits-in-git-history). – Samir Aguiar Mar 10 '17 at 00:21
  • Large files aren't the issue. Deleting checked-out branches is what you've described, and I know how to do that. But checked-out branches don't appear to be the issue either. I have a zillion remote branches. I have figured out just now that I need to delete *remote* branches this way: `git branch -d -r origin/branchname`. – Eddified Mar 10 '17 at 00:23
  • At least I should say, all large files might be the issue but all large files I don't care about should have been in the removed code.... – Eddified Mar 10 '17 at 00:28
  • Remote-tracking branches such as `origin/master` are not copied during cloning. If you're not cloning after filtering, you *do* need to remove the `refs/original/` name-space names. (The fact that you see `origin/master` in `git status` output means you haven't cloned, or have otherwise polluted your fresh new clone with the old stale pre-copy objects.) – torek Mar 10 '17 at 03:58
  • Yes, I had tried a clone, but it was still 2GB so I went back to the original (non-clone) :). – Eddified Mar 10 '17 at 13:07
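
The ref clean-up discussed in these comments can be sketched roughly as follows. This is only an illustration, not something taken from the thread: `count_refs` and `delete_refs_under` are hypothetical helper names, and the sketch assumes it is run inside the already-filtered repository. It relies on `git for-each-ref` and `git update-ref -d`, which list and delete refs under a given prefix such as `refs/original/` or `refs/remotes/`.

```shell
#!/bin/sh
# Hypothetical helpers; run them from inside the filtered repository.

# Print how many refs exist under a prefix, e.g. refs/remotes/ or refs/original/.
count_refs() {
    git for-each-ref --format='%(refname)' "$1" | wc -l | tr -d ' '
}

# Delete every ref under a prefix. Refs in these namespaces keep the old,
# unfiltered commits reachable, so gc cannot drop them while they exist.
delete_refs_under() {
    git for-each-ref --format='%(refname)' "$1" |
    while read -r ref; do
        git update-ref -d "$ref"
    done
}
```

For example, `count_refs refs/remotes/` shows how many remote-tracking branches are left, and `delete_refs_under refs/original/` removes the backup refs that `git filter-branch` leaves behind.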

1 Answer

I had overlooked the fact that `git filter-branch` only rewrites the branch it is given. So I had to run it on every branch that I wanted to keep, and delete the others. Next, I deleted all of the tags. Then I cloned the repo locally using `git clone /path/to/local/repo`, but it was still 2 GB. Finally, running these two commands helped me clean up all the leftover cruft:

git reflog expire --expire=now --all
git gc --prune=now

... which brought it down to 28MB.
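
For reference, the clean-up part of this sequence could be wrapped up roughly like this. It is a sketch under assumptions, not the exact commands I ran: the function names `delete_all_tags`, `expire_and_prune`, and `fresh_clone` are made up for illustration, and it assumes every branch to keep has already been rewritten with `git filter-branch`. One thing that may explain why the plain-path clone stayed large: cloning from a local path copies or hardlinks the whole object directory, whereas a `file://` URL goes through the normal transport and only transfers objects reachable from the cloned refs.

```shell
#!/bin/sh
# Hypothetical helpers; the first two are run from inside the filtered repository.

# Delete every tag, since unfiltered tags still point at the old history.
delete_all_tags() {
    git tag -l | while read -r tag; do
        git tag -d "$tag"
    done
}

# Expire all reflog entries, then prune unreachable objects immediately.
expire_and_prune() {
    git reflog expire --expire=now --all
    git gc --prune=now
}

# Clone via a file:// URL instead of a bare path.
# $1 = absolute path to the filtered repository, $2 = destination directory.
fresh_clone() {
    git clone --quiet "file://$1" "$2"
}
```

A typical order would be `delete_all_tags`, then `expire_and_prune`, then `fresh_clone /path/to/local/repo /path/to/new/clone`, with both paths being placeholders.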

Warning: this may only apply to old versions of git. These instructions were applied using git v1.8.3.1.

Eddified