
I am using this script (its source is linked in the header comment) to list large blobs in my Git repository:

#!/bin/bash
#set -x 

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see https://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

echo "All sizes are in kB's. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
for y in $objects
do
    # extract the size in bytes
    size=$((`echo $y | cut -f 5 -d ' '`/1024))
    # extract the compressed size in bytes
    compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
    # extract the SHA
    sha=`echo $y | cut -f 1 -d ' '`
    # find the objects location in the repository tree
    other=`git rev-list --all --objects | grep $sha`
    #lineBreak=`echo -e "\n"`
    output="${output}\n${size},${compressedSize},${other}"
done

echo -e $output | column -t -s ', '

I am a little puzzled about this line:

objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

Why run grep -v chain (where -v inverts the match)? The result then includes commit, blob and tree objects alike. But aren't binaries always stored in blob objects? If so, shouldn't locating large binaries simply use grep blob instead?

I don't see the purpose of including tree and commit objects in the result set.

so12345

1 Answer


The grep -v chain tosses out lines like these:

chain length = 1: 44 objects
chain length = 2: 30 objects
chain length = 3: 15 objects
chain length = 4: 11 objects

which does indeed seem a bit pointless: sort -k3nr compares field 3 numerically, and on these lines field 3 is the string "=", whose numeric value is zero, so they sort to the bottom anyway.
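A small sketch of that sorting behaviour, using invented sample lines in the shape of git verify-pack -v output rather than a real repository (the hashes, sizes and the variable names sample, sorted and filtered are all made up for illustration):

```shell
#!/bin/bash
# Hypothetical sample: two object lines plus one summary line.
# Hashes, sizes and offsets are invented for illustration only.
sample='1111111111111111111111111111111111111111 blob   2048 700 12
2222222222222222222222222222222222222222 tree   90 60 712
chain length = 1: 44 objects'

# Same sort as the script but without the grep: on the summary line the
# third field is "=", which compares as zero, so the line sorts last.
sorted=$(printf '%s\n' "$sample" | sort -k3nr)
printf 'without grep:\n%s\n' "$sorted"

# With the grep, the summary line is dropped before sorting.
filtered=$(printf '%s\n' "$sample" | grep -v chain | sort -k3nr)
printf 'with grep -v chain:\n%s\n' "$filtered"
```

Either way the two object lines come out in the same order; the grep only decides whether the summary line trails behind them.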

"But are the binaries not always stored in a blob object? Meaning that for locating large binaries you should simply do: grep blob instead?"

Sure. Or leave out all greps and run it on everything, including the final line(s) of the form:

.git/objects/pack/pack-6a0a97d0239b29f4fef82f52b326317cd0cdd94f.pack: ok

It's not likely that a tree or commit or tag object will make the top ten, and if it does, it might be interesting to see.
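If you do want blobs only, a minimal sketch of that variant follows. It swaps grep blob for an awk test on the second field, which is a slightly stricter match and my own substitution, not part of the original script. The sample lines stand in for real verify-pack output and are invented, as are the variable names sample and blobs:

```shell
#!/bin/bash
# Hypothetical sample in the shape of `git verify-pack -v` output.
# Hashes, sizes and offsets are invented for illustration only.
sample='3333333333333333333333333333333333333333 commit 250 170 12
4444444444444444444444444444444444444444 tree   9000 4000 182
5555555555555555555555555555555555555555 blob   1024 500 4182
6666666666666666666666666666666666666666 blob   4096 900 4682
chain length = 1: 44 objects'

# Keep only lines whose second field is exactly "blob", then sort by the
# size column (field 3), largest first, and keep at most ten lines.
blobs=$(printf '%s\n' "$sample" | awk '$2 == "blob"' | sort -k3nr | head)
printf '%s\n' "$blobs"
```

Against a real repository the printf would be replaced by the git verify-pack -v call from the script; the filter and sort stay the same.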

torek