
I'm stuck in a situation where I believe a big binary file is being rate-limited by GitHub, but I can't see which file it is. Is there a way to run git pull so that it shows each file as it is pulled, instead of just a count like this:

Receiving objects: 36% (67/183), 668.00 KiB | 7.00 KiB/s

UPDATE

I got somewhat more verbose output by following the advice here and setting:

export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1

But it still doesn't show the file currently being downloaded.

rockstardev

1 Answer


Git transfers objects, not the checked-out files we see in a working tree, so the fetch/push progress output does not name any files. File contents are stored in blobs, and a big binary file corresponds to a big blob. To list the blobs by size, we can use

git cat-file --batch-check --batch-all-objects | grep blob | sort -n -k3

The git cat-file and grep commands list the blobs with their SHA-1 values and sizes in bytes. The sort command orders them by size, ascending, so the largest ones are at the end of the output:

cb7e08a255a217dffa8f317179a7596b52488e15 blob 223996333
22491bcece9c1d3f7b4b9e07b9b14ab2740d3756 blob 224114333
c6d26a1f8394881824cae82174008b34720beb5d blob 225221333
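The sizes above are uncompressed sizes, while what travels over the network is compressed and possibly deltified. As a rough sketch, assuming a Git version that supports the `%(objectsize:disk)` format atom, the listing can also show how much space each blob takes in the local object store. The function name `blobs_by_disk_size` is made up for this example, and the on-disk size depends on how the local repository happens to be packed, so treat it as a hint rather than the exact transfer size.

```shell
# Hypothetical helper (the name is not part of Git): list every blob with
# its on-disk size, its uncompressed size, its type and its hash,
# sorted by on-disk size in ascending order.
blobs_by_disk_size() {
    git cat-file --batch-all-objects \
        --batch-check='%(objectsize:disk) %(objectsize) %(objecttype) %(objectname)' |
        awk '$3 == "blob"' |
        sort -n -k1,1
}
```

Running `blobs_by_disk_size | tail` inside the repository shows the blobs that occupy the most space locally.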

To find which commit introduced a big blob, taking c6d26a1f8394881824cae82174008b34720beb5d as an example, we can try

git log --all --find-object=c6d26a1f8394881824cae82174008b34720beb5d 

It prints the commit. Sometimes it prints nothing, for example when the blob is not reachable from any ref. Suppose the commit is abc123. To find the file that maps to the blob,

git ls-tree -r abc123 | grep c6d26a1f8394881824cae82174008b34720beb5d 

It prints the matching tree entry, which includes the blob's SHA-1 value and the file path.
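The three steps above (list blobs, find the commit, find the path) can also be approximated in one pass. The following is a sketch, not part of the original answer: `git rev-list --objects --all` prints each reachable object together with a path it was found under, and `git cat-file --batch-check` adds the type and size. The function name `largest_blobs` is invented for this example, and blobs that are not reachable from any ref will not appear.

```shell
# Hypothetical helper (the name is not part of Git): print the N largest
# reachable blobs as "blob <hash> <size in bytes> <path>", largest last.
largest_blobs() {
    n="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        sort -n -k3 |
        tail -n "$n"
}
```

For example, `largest_blobs 5` lists the five biggest blobs in the history along with a path for each.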

This method inspects the blobs in the whole history. We could also focus on a single commit, for example the head of a branch. First, check out the commit or the branch,

git checkout foo

Then check whether the bash globstar option is on,

shopt | grep globstar

If it prints off, run shopt -s globstar to enable it. The final step is

du -s **/*.* | sort -n -k1

It prints the size (as reported by du) and path of each checked-out file in ascending order. Note that the pattern *.* only matches names that contain a dot, so files without an extension are not included in the output.

With the paths and sizes, you can identify the big binaries.
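If checking out the commit is inconvenient, a similar listing can be obtained from Git itself. This is a sketch under the assumption that `git ls-tree -r -l` is available (the `-l` option adds the blob size in bytes as the fourth column). It does not need globstar and it includes files without an extension. The function name `files_by_size` is invented for this example.

```shell
# Hypothetical helper (the name is not part of Git): list every file in a
# commit (default: HEAD) as "<mode> <type> <hash> <size>\t<path>",
# sorted by size in bytes, ascending.
files_by_size() {
    git ls-tree -r -l "${1:-HEAD}" | sort -n -k4
}
```

For example, `files_by_size foo | tail` shows the largest files on branch foo without touching the working tree.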

ElpieKay
  • As an alternative to the trick with glob options, you could use `find` to list only files, and exclude the `.git` directory specifically: `find . -name '.git' -prune -o -type f -exec du '{}' ';'` – IMSoP Jun 07 '22 at 08:20
  • @IMSoP Great! Thanks for the method. It also lists files without an extension. – ElpieKay Jun 07 '22 at 08:23
  • Note that a very large blob *might*, if it compresses well, be transferred as a very small deltified object in a pack file. Of course this isn't the situation the OP expects here. – torek Jun 07 '22 at 19:26