Git transfers objects, not the checked-out files we see in a repository, so the fetch/push progress output does not name the files being transferred. File contents are stored in blobs, and the blob for a big binary is also big. To find the big blobs, we can use
git cat-file --batch-check --batch-all-objects | grep blob | sort -n -k3
The git cat-file and grep commands list each blob with its sha1 value and its size in bytes, and sort orders them by size, ascending. The largest blobs are at the tail of the output:
cb7e08a255a217dffa8f317179a7596b52488e15 blob 223996333
22491bcece9c1d3f7b4b9e07b9b14ab2740d3756 blob 224114333
c6d26a1f8394881824cae82174008b34720beb5d blob 225221333
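If the full listing is too long to scan, the same pipeline can be trimmed to the few largest blobs with readable sizes. The following is only a sketch: top_blobs is a hypothetical helper name, not a git command, and it assumes input lines in the "<sha1> <type> <size>" form shown above.

```shell
# Hypothetical helper: reads "<sha1> <type> <size>" lines on stdin,
# keeps blobs only, and prints the N largest (default 10), biggest first,
# with the size converted from bytes to MiB.
top_blobs() {
  n="${1:-10}"
  awk '$2 == "blob"' | sort -k3,3nr | head -n "$n" |
    awk '{ printf "%s %.1f MiB\n", $1, $3 / 1048576 }'
}

# Usage inside a repository:
#   git cat-file --batch-check --batch-all-objects | top_blobs 5
```

Matching the type column exactly with awk, instead of grepping for the word blob anywhere on the line, is a small design choice that keeps the filter tied to the second field.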
To find which commit introduced a big blob, taking c6d26a1f8394881824cae82174008b34720beb5d as an example, we can try
git log --all --find-object=c6d26a1f8394881824cae82174008b34720beb5d
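It prints the commits that added or removed the blob. Sometimes it prints nothing; one likely reason is that the blob is not reachable from any ref, because --batch-all-objects also lists unreachable objects. Suppose the commit is abc123. To find the file that maps to the blob,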
git ls-tree -r abc123 | grep c6d26a1f8394881824cae82174008b34720beb5d
It prints the mode, the object type, the sha1 value and the file path.
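When only the path is needed, the two steps can be approximated in one pass with git rev-list --objects, which prints each reachable object followed by a path at which it was found. This is a sketch under assumptions: blob_path is a hypothetical helper name, it expects the full sha1, and it reports a single path even if the same blob also appears under other names.

```shell
# Hypothetical helper: print a path at which blob $1 (full sha1) appears
# in the history reachable from any ref of the current repository.
# Prints nothing if the blob is not reachable.
blob_path() {
  git rev-list --objects --all |
    awk -v sha="$1" '$1 == sha { sub(/^[^ ]+ /, ""); print }'
}

# Usage inside a repository:
#   blob_path c6d26a1f8394881824cae82174008b34720beb5d
```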
This method inspects the blobs in the whole history. We could also focus on a single commit, for example the head of a branch. First, check out the commit or the branch,
git checkout foo
Then check whether the bash option globstar is on,
shopt | grep globstar
If it prints off, use shopt -s globstar to enable it. And the final step,
du -s **/*.* | sort -n -k1
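It prints the disk usage (in blocks, whose size depends on the du implementation) and the path of each checked-out file in ascending order. Note that the pattern **/*.* only matches names that contain a dot, so a file without an extension is not included in the output.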
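To include files without an extension as well, find can replace the glob. The following is a sketch, not a drop-in equivalent: files_by_size is a hypothetical helper name, it reports disk usage in KiB through du -k, and it skips anything named .git.

```shell
# Hypothetical helper: list regular files under directory $1 (default ".")
# with their disk usage in KiB, smallest first, skipping .git.
files_by_size() {
  find "${1:-.}" -name .git -prune -o -type f -exec du -k {} + | sort -n -k1,1
}

# Usage from the top of the working tree:
#   files_by_size . | tail -n 20
```

Unlike the globstar approach, this does not depend on a bash option, so it should also work in other POSIX shells.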
With the paths and sizes, you can identify the big binaries.