The size of your working tree relative to the size of the database in the repo can vary for a lot of reasons.
On one hand, the repository is packed, compressed, and optimized using deltas. On the other hand, the effectiveness of those techniques varies based on the content, and the repo contains all of your history whereas your work tree only contains one version (at any given time).
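If you want to compare the two numbers directly on disk, something like this works (a minimal sketch assuming GNU `du`, which supports `--exclude`):

```sh
# Size of the object database (all history, packed and delta-compressed)
du -sh .git

# Size of the checked-out files, excluding the object database
du -sh --exclude=.git .
```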
Regarding `gc`, the main thing it does is remove unreachable objects from the repo. For example, suppose you work on a branch for a while, add some large binary assets, and later remove them; then you rebase the branch in such a way that the new history doesn't contain the large binary assets at all. Eventually `gc` should wipe out the original commits and the large binary assets, reclaiming some space.
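To make that concrete, here's a rough way to watch the reclaim happen. Note that `--prune=now` skips the grace period `gc` normally applies (by default, recent unreachable objects are kept around for a couple of weeks):

```sh
# How much data the object database currently holds
git count-objects -vH

# List objects that neither refs nor the reflog can reach
# (--no-reflogs ignores reflog entries when deciding reachability)
git fsck --unreachable --no-reflogs

# Repack and drop unreachable objects immediately
git gc --prune=now

# Compare with the first measurement
git count-objects -vH
```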
But "unreachable" means not only that no refs can reach it, but also that the reflog can't reach it. If you want to be sure you reclaim all unused space (and you're confident that you don't need to recover anything from the reflog) then you can wipe out the reflog and then run an "aggressive" gc.
As an aside on that point: wiping out the reflog in theory is as easy as

```sh
git reflog expire --expire=now --all
```

(the `--all` is needed so that every ref's reflog is processed, not just the refs you name explicitly), but I have seen this fail to remove reflog entries, and I don't really know why. It's possible to remove the actual log files (e.g. using `rm` in bash), but be careful: if you have any stashes, they are also recorded under `.git/logs`, and you probably don't want to destroy those.
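Putting that together, a cautious version of the full cleanup might look like this; it destroys recovery data, so only run it when you're sure you won't need the reflog again:

```sh
# Check for stashes first: they are recorded under .git/logs too
git stash list

# Expire every reflog entry immediately, for all refs
git reflog expire --expire=now --all

# Aggressively repack and drop unreachable objects right away
git gc --aggressive --prune=now
```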
The size of the repo (the `.git` folder) relative to the work tree may also be affected by whether you use LFS. I'm not sure how LFS storage would be reflected in GitHub's size reports.
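If you do use LFS, you can at least see how much of the local `.git` folder is LFS storage (LFS keeps its downloaded objects under `.git/lfs` by default):

```sh
# Files tracked by LFS in the current checkout, with object sizes
git lfs ls-files --size

# Disk usage of the locally cached LFS objects
du -sh .git/lfs
```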
If you've reclaimed all the space you can with `gc`, and LFS isn't a factor, then a 10:1 ratio seems surprising to me; but if you have a lot of binary files that change throughout your history, it's not beyond the realm of possibility.
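One way to check whether that's your situation is to list the largest blobs anywhere in history. This pipeline uses only plumbing commands; the 1 MiB cutoff is an arbitrary threshold, so adjust to taste:

```sh
# Every object reachable from any ref, annotated with type, size, and path;
# keep blobs over 1 MiB and show the 20 largest
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" && $3 > 1048576 {print $3, $2, $4}' |
  sort -rn |
  head -20
```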