The size of your working tree relative to the size of the database in the repo can vary for a lot of reasons.
On one hand, the repository is packed, compressed, and optimized using deltas. On the other hand, the effectiveness of those techniques varies based on the content, and the repo contains all of your history whereas your work tree only contains one version (at any given time).
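If you want to compare the two numbers directly on disk, something like this works (a minimal sketch assuming GNU `du`, which supports `--exclude`):

```sh
# Size of the object database (all history, packed and delta-compressed)
du -sh .git

# Size of the checked-out files, excluding the object database
du -sh --exclude=.git .
```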
Regarding `gc`, the main thing it does is remove unreachable objects from the repo. For example, suppose you work on a branch for a while, add some large binary assets, and later remove them; then you rebase the branch in such a way that the new history doesn't contain the large binary assets at all. Eventually `gc` should wipe out the original commits and the large binary assets, reclaiming some space.
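To make that concrete, here's a rough way to watch the reclaim happen. Note that `--prune=now` skips the grace period `gc` normally applies (by default, recent unreachable objects are kept around for a couple of weeks):

```sh
# How much data the object database currently holds
git count-objects -vH

# List objects that neither refs nor the reflog can reach
# (--no-reflogs ignores reflog entries when deciding reachability)
git fsck --unreachable --no-reflogs

# Repack and drop unreachable objects immediately
git gc --prune=now

# Compare with the first measurement
git count-objects -vH
```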
But "unreachable" means not only that no refs can reach it, but also that the reflog can't reach it. If you want to be sure you reclaim all unused space (and you're confident that you don't need to recover anything from the reflog) then you can wipe out the reflog and then run an "aggressive" gc.
As an aside on that point: wiping out the reflog in theory is as easy as

```sh
git reflog expire --expire=now --all
```

(the `--all` is needed so that every ref's reflog is processed, not just the refs you name explicitly), but I have seen this fail to remove reflog entries, and I don't really know why. It's possible to remove the actual log files (e.g. using `rm` in bash), but be careful: if you have any stashes, they are also recorded under `.git/logs`, and you probably don't want to destroy those.
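Putting that together, a cautious version of the full cleanup might look like this; it destroys recovery data, so only run it when you're sure you won't need the reflog again:

```sh
# Check for stashes first: they are recorded under .git/logs too
git stash list

# Expire every reflog entry immediately, for all refs
git reflog expire --expire=now --all

# Aggressively repack and drop unreachable objects right away
git gc --aggressive --prune=now
```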
The size of the repo (the `.git` folder) relative to the work tree may also be affected by whether you use LFS. I'm not sure how LFS storage would be reflected in GitHub's size reports.
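If you do use LFS, you can at least see how much of the local `.git` folder is LFS storage (LFS keeps its downloaded objects under `.git/lfs` by default):

```sh
# Files tracked by LFS in the current checkout, with object sizes
git lfs ls-files --size

# Disk usage of the locally cached LFS objects
du -sh .git/lfs
```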
If you've reclaimed all the space you can with `gc`, and LFS isn't a factor, then a 10:1 ratio seems surprising to me; but if you have a lot of binary files that change throughout your history, it's not beyond the realm of possibility.
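One way to check whether that's your situation is to list the largest blobs anywhere in history. This pipeline uses only plumbing commands; the 1 MiB cutoff is an arbitrary threshold, so adjust to taste:

```sh
# Every object reachable from any ref, annotated with type, size, and path;
# keep blobs over 1 MiB and show the 20 largest
git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  awk '$1 == "blob" && $3 > 1048576 {print $3, $2, $4}' |
  sort -rn |
  head -20
```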