git find fat commit

Question

Is it possible to find out how much space the changes in each commit take up, so that I can find commits that added big files or a lot of files? The goal is to reduce the size of the git repo (by rebasing and maybe filtering commits).

Consider simply running `git gc` occasionally, possibly as `git gc --aggressive` – Hasturkun, Aug 17 '09 at 07:33
`git gc` (and `git gc --prune`); `--aggressive` can even give worse results (though it usually shouldn't) and is usually not worth it. – Jakub Narębski, Aug 17 '09 at 19:55
This answer is much better: http://stackoverflow.com/a/10847242/520567 – akostadinov, Jun 09 '14 at 13:32

score 27 · Answer 1 · edited May 12 '14 at 10:34

You could do this:

git ls-tree -r -t -l --full-name HEAD | sort -n -k 4

This will show the largest files at the bottom (the fourth column is the file (blob) size).

If you need to look at different branches you'll want to change HEAD to those branch names. Or, put this in a loop over the branches, tags, or revs you are interested in.
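
The loop over branches suggested above could be sketched roughly as follows. The function name `largest_per_branch` and its default of five entries are illustrative choices, not part of the answer, and `-t` is dropped here so that tree entries (which have no size) are not listed.

```shell
#!/bin/sh
# Hypothetical helper: show the N largest blobs in each local branch.
# Run it inside a git work tree.
largest_per_branch() {
  n=${1:-5}
  git for-each-ref --format='%(refname:short)' refs/heads |
  while read -r branch; do
    echo "== $branch"
    # Column 4 of `git ls-tree -l` is the blob size; keep the N largest.
    git ls-tree -r -l --full-name "$branch" | sort -n -k 4 | tail -n "$n"
  done
}
```

Calling `largest_per_branch 10` prints the ten largest blobs per branch; replacing `refs/heads` with `refs/tags` would walk tags instead.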

edited May 12 '14 at 10:34

om-nom-nom

answered Aug 17 '09 at 19:51

Pat Notz

tig · Accepted Answer · 2010-08-26T09:05:14.697

18

I forgot to reply. My answer is:

git rev-list --all --pretty=format:'%H%n%an%n%s'    # list all commits (hash, author, subject)
git diff-tree -r -c -M -C --no-commit-id <sha>      # for each commit <sha>: list the blobs it changed
git cat-file --batch-check                          # fed the blob ids on stdin: print the size of each blob
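
As a rough illustration, the three steps above can be chained into one loop. This is only a sketch under some assumptions: the function name `commit_sizes` is made up, `--diff-filter=AM` restricts the count to added and modified files, `--root` is added so the initial commit is counted, merge commits contribute nothing because `-c` is left out, and the sizes are uncompressed blob sizes rather than on-disk pack sizes.

```shell
#!/bin/sh
# Hypothetical helper: print "<bytes> <commit>" for every commit reachable
# from any ref, where <bytes> is the total uncompressed size of the blobs
# the commit added or modified.
commit_sizes() {
  git rev-list --all |
  while read -r sha; do
    # Field 4 of the raw diff-tree output is the blob id after the change.
    total=$(git diff-tree -r --root --no-commit-id --diff-filter=AM "$sha" |
            awk '{print $4}' |
            git cat-file --batch-check |
            awk '$2 == "blob" {s += $3} END {printf "%.0f\n", s}')
    echo "$total $sha"
  done
}
```

Running `commit_sizes | sort -n | tail` lists the heaviest commits last.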

edited Aug 26 '10 at 09:05

answered Oct 12 '09 at 22:59

tig

@sschuberth: If I read your script correctly it only takes into account files that were _added_ in a particular commit. It won't detect when a file grew substantially in a commit. – kynan Apr 19 '12 at 00:07
@kynan: You're right, as that's what the OP requested (and what I needed). But it's easy to change the script to detect modified files: Basically you just need to replace "A" by "M" in the grep call. That will report the total file size after the modification (not the number of bytes added / removed). I'd happily accept a pull request on GitHub to make the script more generic. – sschuberth Apr 24 '12 at 14:04
Broken link, the script is now located [here](https://github.com/sschuberth/dev-scripts/blob/master/git/git-commit-size.sh) – Luke Dec 15 '12 at 01:55
`--diff-filter` might be used instead of the unreliable `grep` but anyways, this answer is much better IMO: http://stackoverflow.com/a/10847242/520567 – akostadinov Jun 09 '14 at 13:33

score 9 · Answer 3 · edited Sep 06 '17 at 23:27

All of the solutions provided here focus on file sizes, but the original question asked about commit sizes. In my case that was the more important thing to find: I wanted to get rid of many small binaries introduced in a single commit, which together accounted for a lot of space but looked small when measured file by file.

A solution that focuses on commit sizes is the one provided here, which is this Perl script:

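#!/usr/bin/perl
# Print "<bytes> <commit> <number of files>" for every commit in the repository.
foreach my $rev (`git rev-list --all --pretty=oneline`) {
  my $tot = 0;
  (my $sha = $rev) =~ s/\s.*$//;   # keep only the commit hash
  foreach my $blob (`git diff-tree -r -c -M -C --no-commit-id $sha`) {
    $blob = (split /\s/, $blob)[3];   # fourth field: id of the blob after the change
    # String comparison (eq); a numeric == would also skip ids that start with a letter.
    next if $blob eq "0000000000000000000000000000000000000000"; # Deleted
    my $size = `echo $blob | git cat-file --batch-check`;
    $size = (split /\s/, $size)[2];   # output is "<id> <type> <size>"
    $tot += int($size);
  }
  my $revn = substr($rev, 0, 40);
#  if ($tot > 1000000) {
    print "$tot $revn " . `git show --pretty="format:" --name-only $revn | wc -l`  ;
#  }
}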

I call it like this:

./git-commit-sizes.pl | sort -n -k 1

score 2 · Answer 4 · answered Jun 01 '14 at 22:55

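#!/bin/bash
# Usage: pass a commit; prints how many bytes its tree grew relative to its first parent.
COMMITSHA=$1

# Sum the size column (4th field) of every blob in the given tree.
tree_size() { git ls-tree -lr "$1" | awk '$2 == "blob" {s += $4} END {printf "%.0f\n", s}'; }

CURRENTSIZE=$(tree_size "$COMMITSHA")
PREVSIZE=$(tree_size "$COMMITSHA^")   # fails for a root commit, which has no parent
echo "$CURRENTSIZE - $PREVSIZE" | bc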

answered Jun 01 '14 at 22:55

Stas Dashkovsky

I also suggest using `git format-patch` to estimate a commit's size. The mail headers add a little overhead, but if you only need a quick check that a commit is not too large, the exact size hardly matters; ±1K is accurate enough. – Stas Dashkovsky Jun 19 '14 at 16:09
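
One way to try the `git format-patch` suggestion is to write the patch to standard output and count its bytes. This is a minimal sketch; the function name `patch_size` is made up for illustration, and the number includes the mail headers mentioned in the comment.

```shell
#!/bin/sh
# Hypothetical helper: approximate size in bytes of a single commit,
# measured as the length of its format-patch output.
patch_size() {
  git format-patch -1 --stdout "$1" | wc -c
}
```

For example, `patch_size HEAD` gives a quick estimate for the most recent commit.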

score 2 · Answer 5 · edited Apr 12 '17 at 15:14

`git fat find N`, where N is a size in bytes, returns all the files in the whole history that are larger than N bytes.

You can find out more about git-fat here: https://github.com/cyaninc/git-fat

edited Apr 12 '17 at 15:14

Gianfranco P.

answered Sep 11 '14 at 19:54

Caustic

Bummer. I tried it on Git Shell for Windows that comes with GitHub Desktop and the command didn't work, giving me an error. – DucRP Jan 09 '17 at 15:06
@DucRP I think you have to install git fat on your computer – mvoelcker Jan 27 '22 at 19:39

score 2 · Answer 6 · edited May 23 '17 at 11:47

Personally, I found this answer to be most helpful when trying to find large files in the history of a git repo: Find files in git repo over x megabytes, that don't exist in HEAD

edited May 23 '17 at 11:47

Community

answered Nov 30 '11 at 23:59

Michael Baltaks

score 1 · Answer 7 · edited Nov 14 '12 at 08:42

`git cat-file -s <object>` prints the size of an object, where <object> can refer to a commit, blob, tree, or tag.
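
A small usage sketch, assuming a hypothetical wrapper named `object_size`: the `<rev>:<path>` syntax names the blob stored at a path in a given revision. Note that the reported number is the uncompressed size of that single object, so for a commit it is the size of the commit object itself (its metadata), not of the files the commit changed.

```shell
#!/bin/sh
# Hypothetical wrapper: print the uncompressed size in bytes of one object.
# Examples of arguments: HEAD (the commit object), HEAD^{tree} (its root tree),
# HEAD:some/file (the blob at that path; the path is a placeholder).
object_size() {
  git cat-file -s "$1"
}
```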

edited Nov 14 '12 at 08:42

Dominik Honnef

answered Aug 17 '09 at 12:12

artagnon
