
How can I get commit size shown in the output of git log?

You may understand commit size as the diff between its parents and itself, or anything reasonable that tells you how big the commit is.

`git log` has a `--log-size` option, but it's the size of the log message, not of the commit itself.

Cyker
  • Could you give an example of a commit and what its "size" would be? Also, why do you need this? – Schwern Nov 19 '16 at 22:15
  • @Schwern I think you have shown in your answer below what a commit is. I need commit size to quickly identify how much work is done in each commit from a long list of commits. This is helpful in identifying major changes. I don't put a restriction on its definition as long as that definition is reasonable. – Cyker Nov 19 '16 at 22:48
  • What do you mean by "how much work is done in each commit"? What are you using that information for? I smell a misuse of code metrics. – Schwern Nov 19 '16 at 22:52
  • Also be wary that the number of lines changed in a commit can be deceptive. For example, if I reindent the code, that will show up as changing a lot of lines, but it's very little "work". Things like `-b` and `-w` can be used to ignore whitespace changes, but some automatic code stylers go beyond simple whitespace changes. – Schwern Nov 19 '16 at 22:54
  • @Schwern OK. Let's simplify this to the utmost level. I want a number which is the sum of lines added and lines deleted in each commit and append this number after the title of each commit in the log message printed in oneline format. Is this clear? – Cyker Nov 19 '16 at 22:56
  • Why do you want this? Again, I smell a misuse of metrics. – Schwern Nov 19 '16 at 22:58
  • @Schwern I understand lines of editing may not accurately reflect the amount of work done in each commit. But it's better than having nothing at hand. At least those tiny commits with very little change can be easily filtered out. – Cyker Nov 19 '16 at 22:59
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/128550/discussion-between-schwern-and-cyker). – Schwern Nov 19 '16 at 22:59
  • @Schwern (commit size) = (bytes added) - (bytes removed) instead of lines added/removed seems easy enough… – Geremia Nov 21 '18 at 04:44
  • One case that I occasionally run into is I do a push and it takes several minutes for what I think is very small changes. If the change was actually small, I know I should look into a slow network or a slow server. If it took that long because I was actually pushing 100MB of data, then I want to figure out why I am pushing so much data. – Troy Daniels Dec 28 '21 at 17:51

2 Answers


The "size" of a commit can mean different things. If you mean how much disk storage it takes up, that's very tricky to tell in Git and probably unproductive. Whereas something like SVN stores commits as deltas, when you change a file Git stores a new copy of the file as an object (a blob) in its content-addressed object database. One object can be shared by many commits. While this might sound inefficient, Git compresses objects and packs them as deltas, so it uses disk space very efficiently.

If you mean how many lines it changed, that's easy. Several flags report how many files and lines changed; most of them have the word "stat" in their name. For example, `git log --shortstat` will tell you how many files changed, and how many lines were inserted and deleted. Here's an example.

commit e3d1909c875ea0c1a64246d735affa039ad11aa0 (origin/master, origin/HEAD)
Author: Michael G. Schwern <schwern@pobox.com>
Date:   Thu Aug 11 13:04:24 2016 -0700

    Add default Travis and AppVeyor configs.

    The AppVeyor one is set up for Dist::Zilla, the hardest of the bunch.

 2 files changed, 60 insertions(+)
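For a one-line-per-commit view with a single number (lines added plus lines deleted), the stats can be summed with a little `awk`. This is only a rough sketch: `log_with_churn` is a made-up helper name, not a Git command, and it assumes `--numstat` output in its usual `added<TAB>deleted<TAB>path` form. Binary files (shown as `-`) are skipped, and merge commits report 0 because `git log` shows no diff for them by default.

```shell
# Print "<short hash> <lines added + deleted> <subject>" for each commit.
# log_with_churn is a hypothetical helper; extra arguments go to git log.
log_with_churn() {
  git log --format='@%h%x09%s' --numstat "$@" |
  awk -F'\t' '
    # Emit the commit collected so far, if there is one.
    function emit() { if (hash != "") printf "%s %6d %s\n", hash, churn, subject }
    # Header lines carry the "@" prefix set in --format above.
    /^@/ {
      emit()
      hash = substr($1, 2)
      subject = substr($0, index($0, "\t") + 1)
      churn = 0
      next
    }
    # numstat lines: added, deleted, path ("-" marks a binary file).
    NF == 3 && $1 != "-" { churn += $1 + $2 }
    END { emit() }
  '
}
```

Run inside a repository, for example `log_with_churn -20` or `log_with_churn v1.0..HEAD`. Adding `-w` ignores whitespace-only changes in the counts.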

If you want an idea of the disk storage that a commit represents, you need to get the IDs of the new files (blob objects) the commit created, then check their sizes. You can see them in the output of `git log -p`.

commit 0f28d9a96bc92d802b57900ce4a06db71cbaef6d
Author: Michael G. Schwern <schwern@pobox.com>
Date:   Wed Aug 10 09:13:40 2016 -0700

    Remove my name from the gitconfig.

    Now it can be used by anyone. Git will prompt for the user info.

diff --git a/.gitconfig b/.gitconfig
index 1d539bd..538440f 100644
--- a/.gitconfig
+++ b/.gitconfig
@@ -1,18 +1,10 @@
-# If you use this file, remember to change the [user] and [sendemail] sections.
-
...and so on...

`index 1d539bd..538440f 100644` indicates that this commit replaced blob object (file) 1d539bd with 538440f, and that the file's mode is 100644 (a regular file with permissions 0644). Running `git cat-file -s 538440f` tells me the object is 4356 bytes. That's its uncompressed size. On disk it's just 1849 bytes.

$ ls -l .git/objects/53/8440f84014584432fa5bf09d761926b3d70dbe 
-r--r--r-- 1 schwern staff 1849 Aug 10 09:14 .git/objects/53/8440f84014584432fa5bf09d761926b3d70dbe
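Rather than locating the object file by hand, both numbers can probably be read in one step with `git cat-file --batch-check`, which reads object names on standard input. A minimal sketch, where `object_sizes` is a hypothetical helper name:

```shell
# Print "<uncompressed bytes> <on-disk bytes>" for one object.
# object_sizes is a hypothetical helper name, not a Git command.
object_sizes() {
  printf '%s\n' "$1" |
    git cat-file --batch-check='%(objectsize) %(objectsize:disk)'
}
```

For example, `object_sizes 538440f` or `object_sizes HEAD:.gitconfig`. Treat the on-disk figure with care: once an object is packed and stored as a delta, it is the size of the delta, not of the whole file.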

After I run `git gc`, even the loose object file is gone. Now everything is in a pack file that uses less than 10K.

$ tree -h .git/objects/
.git/objects/
├── [ 102]  info
│   └── [  54]  packs
└── [ 136]  pack
    ├── [1.9K]  pack-d5b7110001ed35cce1aa0a380db762f39505b1c0.idx
    └── [7.8K]  pack-d5b7110001ed35cce1aa0a380db762f39505b1c0.pack

This answer shows how to get the blobs from a commit in a more automated fashion.
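As a rough sketch of that kind of automation (not necessarily the approach the linked answer takes), `git diff-tree` can list the blobs a commit introduced and `git cat-file -s` can report each one's uncompressed size. `commit_blob_sizes` is a made-up helper name. It skips deleted files and submodule entries, and prints nothing but a zero total for merge commits, since `git diff-tree` shows no diff for them by default.

```shell
# List the uncompressed size of every blob a commit added or changed.
# commit_blob_sizes is a hypothetical helper; it defaults to HEAD.
commit_blob_sizes() {
  commit=${1:-HEAD}
  # Raw format: ":<old mode> <new mode> <old id> <new id> <status>\t<path>"
  git diff-tree -r --root --no-commit-id "$commit" |
  {
    total=0
    while read -r old_mode new_mode old_id new_id status path; do
      # An all-zero new ID means the file was deleted: nothing new stored.
      case $new_id in *[!0]*) ;; *) continue ;; esac
      # Mode 160000 is a submodule link, not a blob in this repository.
      [ "$new_mode" = 160000 ] && continue
      size=$(git cat-file -s "$new_id")
      printf '%10d  %s\n' "$size" "$path"
      total=$((total + size))
    done
    printf '%10d  total\n' "$total"
  }
}
```

The total is an upper bound on what the commit adds to the repository, because compression and delta packing usually shrink it considerably.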

Schwern
  • I think stats are a good way to identify commits with major changes. But they just look tedious in the output. Any chance we can do some computation with the stats and pretty-print the result in a oneline format? For example, print each commit with its checksum, title and a number (#lines_added + #lines_deleted)? I didn't find such a placeholder among the git log format placeholders. Am I missing something? – Cyker Nov 19 '16 at 22:53
  • @Cyker Something like `git log --pretty=format:"%h %s" --shortstat` – Schwern Apr 14 '18 at 19:18

Here's a really simplistic/brute force way to do this:

git format-patch --stdout ref1..ref2 | wc -c

  • The format-patch gives you the difference between ref1 and ref2
  • For ref1/ref2, you can use any valid git reference name, for example a0b1c2d3, HEAD, HEAD~1, etc.
  • The wc -c gives you the size of the patch in bytes

Note: the commit metadata (author, date, subject and so on) also counts toward the byte size. This may be a feature or a bug of this method, depending on exactly what information you want. Additional `git format-patch` options or a Unix pipeline with `grep` might give you more control here.
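If the metadata is unwanted, one possible variation is to measure only the diff text, one commit at a time. The sketch below swaps `git format-patch` for `git show --format= --patch`, which prints the patch without the commit header. `log_with_patch_bytes` is a hypothetical helper name, and merge commits are skipped.

```shell
# Print "<short hash> <patch bytes> <subject>" for each non-merge commit.
# log_with_patch_bytes is a hypothetical helper; arguments go to
# git rev-list (for example ref1..ref2), defaulting to HEAD.
log_with_patch_bytes() {
  [ "$#" -gt 0 ] || set -- HEAD
  git rev-list --no-merges "$@" |
  while read -r commit; do
    # An empty --format drops the header, so only the diff is counted.
    bytes=$(git show --format= --patch "$commit" | wc -c | tr -d ' ')
    printf '%s %8d %s\n' \
      "$(git log -1 --format=%h "$commit")" \
      "$bytes" \
      "$(git log -1 --format=%s "$commit")"
  done
}
```

This starts a few Git processes per commit, so it is slow on long histories. The number is the size of the textual diff, including context lines, not the amount of data Git stores.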

EdwardTeach
  • This is a really nice, simple, and well explained technique to get an idea of how big your commit is in bytes. Several years after the accepted answer, and we still share better ways of doing things. Kudos @EdwardTeach! – Gio Jan 05 '22 at 00:40
  • Note that what is stored in Git is not the delta but the new object. This method shows the approximate size of the delta. So it's interesting but hard to draw conclusions from the number this produces. – Eric Walker Jul 16 '22 at 04:42