89

Is it possible to show the total file size difference between two commits? Something like:

$ git file-size-diff 7f3219 bad418 # I wish this worked :)
-1234 bytes

I’ve tried:

$ git diff --patch-with-stat

And that shows the file size difference for each binary file in the diff — but not for text files, and not the total file size difference.

Any ideas?

Mathias Bynens

7 Answers

109

git cat-file -s will output the size in bytes of an object in git. git diff-tree can tell you the differences between one tree and another.
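
As a rough illustration of what git diff-tree -r prints, each changed file yields one raw line: old mode, new mode, old blob ID, new blob ID, a status letter, then a tab and the path. The sketch below only splits such lines into fields; the helper name describe_raw_lines is made up for this example and is not part of git.

```shell
#!/bin/bash
# Sketch: split raw `git diff-tree -r` lines into their fields.
# Raw line format: ":<old mode> <new mode> <old sha> <new sha> <status>\t<path>"
# describe_raw_lines is a hypothetical helper name, not a git command.
describe_raw_lines() {
  local old_mode new_mode old_sha new_sha status path
  # The last variable (path) receives the rest of the line, so paths
  # containing spaces stay in one piece.
  while read -r old_mode new_mode old_sha new_sha status path
  do
    printf '%s %s -> %s %s\n' "$status" "$old_sha" "$new_sha" "$path"
  done
}

# Typical use inside a repository (left commented out here):
# git diff-tree -r HEAD~1 HEAD | describe_raw_lines
```

The two blob IDs on each line are what git cat-file -s can then be asked about to get the old and new sizes.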

Putting this together into a script called git-file-size-diff located somewhere on your PATH will give you the ability to call git file-size-diff <tree-ish> <tree-ish>. We can try something like the following:

#!/bin/bash
USAGE='[--cached] [<rev-list-options>...]

Show file size changes between two commits or the index and a commit.'

SUBDIRECTORY_OK=1
. "$(git --exec-path)/git-sh-setup"
args=$(git rev-parse --sq "$@")
[ -n "$args" ] || usage
cmd="diff-tree -r"
[[ $args =~ "--cached" ]] && cmd="diff-index"
eval "git $cmd $args" | {
  total=0
  while read A B C D M P
  do
    case $M in
      M) bytes=$(( $(git cat-file -s $D) - $(git cat-file -s $C) )) ;;
      A) bytes=$(git cat-file -s $D) ;;
      D) bytes=-$(git cat-file -s $C) ;;
      *)
        echo >&2 warning: unhandled mode $M in \"$A $B $C $D $M $P\"
        continue
        ;;
    esac
    total=$(( $total + $bytes ))
    printf '%d\t%s\n' $bytes "$P"
  done
  echo total $total
}

In use this looks like the following:

$ git file-size-diff HEAD~850..HEAD~845
-234   Documentation/RelNotes/1.7.7.txt
112    Documentation/git.txt
-4     GIT-VERSION-GEN
43     builtin/grep.c
42     diff-lib.c
594    git-rebase--interactive.sh
381    t/t3404-rebase-interactive.sh
114    t/test-lib.sh
743    tree-walk.c
28     tree-walk.h
67     unpack-trees.c
28     unpack-trees.h
total 1914

By using git-rev-parse it should accept all the usual ways of specifying commit ranges.

EDIT: updated to record the cumulative total. Note that bash runs the while read loop in a subshell because it is the receiving end of a pipeline. The additional curly braces put the loop and the final echo in the same subshell, so the total is not lost when the loop ends.
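
The subshell behaviour can be seen in isolation with a minimal sketch like the one below, assuming bash with default settings (no lastpipe). The function names sum_naive and sum_grouped are made up for this illustration.

```shell
#!/bin/bash
# Naive version: the while loop is the last stage of a pipeline, so it runs
# in a subshell and its updates to `total` are discarded when the loop ends.
sum_naive() {
  local total=0
  printf '%s\n' "$@" | while read -r n; do total=$(( total + n )); done
  echo "$total"   # still the initial value
}

# Grouped version: the braces keep the loop and the echo in the same
# subshell, so the echo sees the accumulated value.
sum_grouped() {
  printf '%s\n' "$@" | {
    total=0
    while read -r n; do total=$(( total + n )); done
    echo "$total"
  }
}

sum_naive 1 2 3     # prints 0
sum_grouped 1 2 3   # prints 6
```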

EDIT: added support for comparing the index against another tree-ish by using a --cached argument to call git diff-index instead of git diff-tree. For example:

$ git file-size-diff --cached master
-570    Makefile
-134    git-gui.sh
-1  lib/browser.tcl
931 lib/commit.tcl
18  lib/index.tcl
total 244

EDIT: Mark script as capable of running in a subdirectory of a git repository.

patthoyts
  • +1 Thanks! This would be *absolutely* perfect if it would print out the total size difference at the bottom. I want to see how many bytes were added/removed project-wide between two refs (not just per file, but in total, too). – Mathias Bynens Jun 01 '12 at 09:41
  • Another question: why are you sourcing `git-sh-setup` here? You don’t seem to be using [any of the functions it defines](http://schacon.github.com/git/git-sh-setup.html). Just wondering! – Mathias Bynens Jun 01 '12 at 09:44
  • It does basic checks like producing a sensible message if you run this command in a directory that is not a git repository. It also can help abstract out some platform differences. Mostly habit though. When writing a git script - first bring in the git-sh-setup file. – patthoyts Jun 01 '12 at 10:42
  • Thanks for the awesome script! I was looking for some way to monitor the increase in size after each commit and this helps a lot. I made a small gist to show only the total increase between all (or some of) the commits in the repository: https://gist.github.com/iamaziz/1019e5a9261132ac2a9a Thanks again! – Aziz Alto Apr 22 '15 at 22:17
  • The use case I'm looking for is to preview large commits before I make them. Is there a way I can find the size changes of the currently staged changes? I've read through the tree-ish documentation, and I could not find a way to reference "current staged changes". – escapecharacter Oct 13 '16 at 22:51
  • Added support for comparing against the index using `git-diff-index`. – patthoyts Oct 13 '16 at 23:47
  • you can run `echo $PATH` to see your path directories to see where you can put this script file. I put mine in `/usr/local/git/bin` and it worked great. You can also add a path to your `$PATH` if you want to put the script somewhere else. – Josh Jan 09 '17 at 21:25
  • How do I use this? What is `HEAD~850`? Can I just use instead the commit id? – mr5 Aug 24 '17 at 09:44
  • @mr5 HEAD~850 is 850 commits before HEAD. It is just another notation for a commit and yes you can use a specific commit id or a tag or anything that can be resolved to a commit. The script uses `git rev-parse` so see the manual section "Specifying Revisions" in the git-rev-parse documentation for the full details. (https://git-scm.com/docs/git-rev-parse) – patthoyts Aug 24 '17 at 10:44
  • How would I be able to see what size the files had before? I am currently preparing a pull request that optimizes file output structure and would like to calculate a percentage of size decrease. – Philzen Apr 07 '20 at 15:06
  • That looks great! Any way to make it work on Windows? – FK- Oct 07 '22 at 09:45
  • This works on windows. Create the file in a directory that is on your PATH. – patthoyts Oct 11 '22 at 20:34
  • Great script! Very useful. I've added it to my [git-filesize-diff.sh](https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles/blob/master/useful_scripts/git-filesize-diff.sh) file in my [eRCaGuy_dotfiles](https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles) repo. You can see the output from my modified version of your script [in my commit message here](https://github.com/ElectricRCAircraftGuy/eRCaGuy_dotfiles/commit/05ce8b1476a1ebbe7f694b808da8cfe886abf64a). – Gabriel Staples Nov 21 '22 at 18:50
  • Nice script. However the `$(git cat-file -s $D) - $(git cat-file -s $C)` construct is problematic in the sense that it shows only the file size delta, not the file data delta. For example, you could have a file of 1024 bytes that has its content replaced by a different 1024 bytes; the `$(git cat-file -s $D) - $(git cat-file -s $C)` construct would then calculate a delta of 0 bytes, while the actual data delta is 1024 bytes. – Flow May 19 '23 at 13:15
28

You can pipe the output of git show into wc -c for each version of the file:

git show some-ref:some-path-to-file | wc -c
git show some-other-ref:some-path-to-file | wc -c

and compare the 2 numbers.
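
The comparison can also be wrapped in a small function. The following is only a sketch: it uses git cat-file -s, which prints an object's size directly, instead of git show piped into wc -c, and the name file_size_delta is made up for this example.

```shell
#!/bin/bash
# Sketch: byte-size difference of one file between two refs.
# file_size_delta is a hypothetical helper name.
# Usage: file_size_delta <old-ref> <new-ref> <path>
file_size_delta() {
  local old_ref=$1 new_ref=$2 path=$3 old_size new_size
  # "<rev>:<path>" names the blob stored at that path in that revision.
  old_size=$(git cat-file -s "$old_ref:$path") || return 1
  new_size=$(git cat-file -s "$new_ref:$path") || return 1
  echo $(( new_size - old_size ))
}

# Example, run inside a repository:
# file_size_delta HEAD~1 HEAD README.md
```

A positive result means the file grew between the two refs. The function returns a non-zero status if the path does not exist in either ref.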

Adam Dymitruk
  • +1 This is great for quickly checking the size difference of a file between versions. But how can this be used to get the total file difference between two commits? I want to see how many bytes were added/removed project-wide between two refs. – Mathias Bynens Jun 01 '12 at 09:39
  • You can skip the `| wc -c` if you use `cat-file -s` instead of `show` – neu242 Aug 24 '17 at 09:21
  • Using the improvement suggested by @neu242, I wrote this bash function: `gdbytes () { echo "$(git cat-file -s "$1:$3") -> $(git cat-file -s "$2:$3")"; }` which makes it easy to see how a file's size changed since the last commit with, e.g., `gdbytes @~ @ index.html` – webninja Dec 29 '17 at 04:05
  • if the `some-ref:` part is skipped, do you obtain the file size in the working directory? – 40detectives Jul 06 '18 at 08:14
4

Expanding on matthiaskrgr's answer, https://github.com/matthiaskrgr/gitdiffbinstat can be used like the other scripts:

gitdiffbinstat.sh HEAD..HEAD~4

In my opinion it works really well and is much faster than anything else posted here. Sample output:

$ gitdiffbinstat.sh HEAD~6..HEAD~7
 HEAD~6..HEAD~7
 704a8b56161d8c69bfaf0c3e6be27a68f27453a6..40a8563d082143d81e622c675de1ea46db706f22
 Recursively getting stat for path "./c/data/gitrepo" from repo root......
 105 files changed in total
  3 text files changed, 16 insertions(+), 16 deletions(-) => [±0 lines]
  102 binary files changed 40374331 b (38 Mb) -> 39000258 b (37 Mb) => [-1374073 b (-1 Mb)]
   0 binary files added, 3 binary files removed, 99 binary files modified => [-3 files]
    0 b  added in new files, 777588 b (759 kb) removed => [-777588 b (-759 kb)]
    file modifications: 39596743 b (37 Mb) -> 39000258 b (37 Mb) => [-596485 b (-582 kb)]
    / ==>  [-1374073 b (-1 Mb)]

The output directory looks odd (./c/data...) because /c is actually the filesystem root.

Community
guest
  • You didn't need to comment on Matthias' post - you could have suggested an edit to it instead, with these details that he didn't provide. By current standards, his answer would be considered a "link-only answer", and be deleted, so these sorts of details are important. – Mogsdad Apr 15 '16 at 02:36
  • who can take my answer and include it into matthias? – guest Apr 25 '16 at 11:26
  • If you want, you can make a suggested edit yourself. (In my experience, it would tend to get rejected by reviewers, but a clear explanation in the Edit Summary could help.) But maybe I wasn't clear in my comment to you... your answer is a stand-alone answer, a good update of Matthias' older answer. You didn't need to include the text that explained that you meant to comment, is all. I edited the answer in a way that gives appropriate credit to Matthias. You don't need to do more. – Mogsdad Apr 25 '16 at 14:42
3

I made a bash script to compare branches/commits etc by actual file/content size. It can be found at https://github.com/matthiaskrgr/gitdiffbinstat and also detects file renames.

bobs
2

A comment on the git-file-size-diff script suggested by patthoyts: the script is very useful, but I have found two issues:

  1. When the type of a file changes (for example, a regular file is replaced by a symlink), git reports a different status letter, T, which needs its own branch in the case statement:

    T) echo >&2 "Skipping change of type"
       continue ;;

  2. If a SHA-1 value doesn't exist in the repository anymore (for some reason), the script crashes. You need to check that the object exists before getting the file size:

    git cat-file -e "$D" 2>/dev/null || continue

The complete case statement will then look like this:

case $M in
      M) # Skip the entry if either the old or the new object is missing.
         git cat-file -e "$D" 2>/dev/null || continue
         git cat-file -e "$C" 2>/dev/null || continue
         bytes=$(( $(git cat-file -s "$D") - $(git cat-file -s "$C") )) ;;
      A) git cat-file -e "$D" 2>/dev/null || continue
         bytes=$(git cat-file -s "$D") ;;
      D) git cat-file -e "$C" 2>/dev/null || continue
         bytes=-$(git cat-file -s "$C") ;;
      T) echo >&2 "Skipping change of type"
         continue ;;
      *)
        echo >&2 "warning: unhandled mode $M in \"$A $B $C $D $M $P\""
        continue
        ;;
    esac

Abdul
1

The Git core commands can make this much more efficient: instead of the postprocessing running three commands per blob, it takes three commands in total:

filesizediffs() {
    git diff-tree "$@" \
    | awk '$1":"$2 ~ /:[10]0....:[10]0/ {
            print $3?$3:empty,substr($5,3)
            print $4?$4:empty,substr($5,3)
      }'  FS='[  ]' empty=`git hash-object -w --stdin <&-` \
    | git cat-file --batch-check=$'%(objectsize)\t%(rest)' \
    |  awk '!seen[$2]++ { first[$2]=$1 }
            $1!=first[$2] { print $1-first[$2],$2; total+=$1-first[$2] }
            END { print "total size difference "total }' FS=$'\t' OFS=$'\t'
}
filesizediffs @

This works on GNU/anything.

jthill
0

If you’re happy with an approximate answer, you can get a back-of-napkin size of the data in a commit with:

git archive <COMMIT> | wc -c

The reported size will be the number of bytes of all the data in the commit plus some tar metadata. Since tar by itself (the default for git archive) doesn’t do compression, the reported numbers are somewhat comparable.

If your intent is to find the one commit that added the 1GB log file, this approach is perfectly sufficient.
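
To compare two commits this way, the two archive sizes can be subtracted. This is a minimal sketch along those lines; the function names archive_size and archive_size_delta are made up for this example.

```shell
#!/bin/bash
# Sketch: approximate size difference between two commits, measured as the
# difference between their uncompressed tar archives.
# archive_size and archive_size_delta are hypothetical helper names.
archive_size() {
  # git archive writes an uncompressed tar stream by default; wc -c counts it.
  git archive "$1" | wc -c
}

# Usage: archive_size_delta <old-commit> <new-commit>
archive_size_delta() {
  local old new
  old=$(archive_size "$1")
  new=$(archive_size "$2")
  echo $(( new - old ))
}

# Example, run inside a repository:
# archive_size_delta HEAD~10 HEAD
```

Because tar pads its entries to fixed-size blocks, small changes can disappear in the rounding, so this is only useful for spotting large jumps.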

Boldewyn