How can I search my git logs to see which files have had the most activity?
-
Related: http://stackoverflow.com/questions/1265040/how-to-count-total-lines-changed-by-a-specific-author-in-a-git-repository – Josh Lee Apr 14 '11 at 21:03
-
You can use `git diff --stat revA revB` to get the sum of all additions and removals (but it won't tell you the absolute number of commits that actually touched the file). – Jason LeBrun Apr 14 '11 at 21:05
-
That link is for a particular author. However, `git log --numstat` seems to be heading in the right direction, but it just spits out the stats for every file in no particular order, and we have thousands of files. – JP Silvashy Apr 14 '11 at 21:06
-
@jason, thanks, the problem is that we need to look over *all* the commits ever made and see which files either have had the most commits or the most additions/removals total. – JP Silvashy Apr 14 '11 at 21:07
-
Possible duplicate of [Finding most changed files in Git](http://stackoverflow.com/questions/7686582/finding-most-changed-files-in-git) – fracz Jan 27 '16 at 15:11
5 Answers
That's one of those things that is very easy, accidentally (?):
git rev-list --objects --all | awk '$2' | sort -k2 | uniq -cf1 | sort -rn | head
- give me all objects from all revisions in all branches
- ignore any results without a path
- sort them by path
- make them unique (ignoring the blob hash), prefix lines with duplication count
- sort descending on duplication count
- show topmost lines
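The same counting steps can be sketched in Python, assuming the output of `git rev-list --objects --all` has already been captured as text (for example with `subprocess.run`). The function name `count_paths` is hypothetical. Like `awk '$2' | sort -k2 | uniq -cf1 | sort -rn`, it skips lines without a path and counts the object lines per path, most frequent first.

```python
from collections import Counter


def count_paths(rev_list_output):
    """Count `git rev-list --objects` lines per path, most frequent first."""
    counts = Counter()
    for line in rev_list_output.splitlines():
        # Each line is "<object-id>" or "<object-id> <path>"; split only once
        # so that paths containing spaces stay intact.
        _, _, path = line.partition(" ")
        if not path:
            continue  # like awk '$2': ignore objects without a path
        counts[path] += 1
    # Like sort -rn: highest count first (ties ordered by path here).
    return sorted(((n, p) for p, n in counts.items()), key=lambda t: (-t[0], t[1]))
```

Slicing the result, e.g. `count_paths(text)[:10]`, plays the role of `head`.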
Output is similar to:
1058 fffcba193374a85fd6a3490f800c6901218a950b src
715 ffffe0f08798e95b66cc4ad4ff22cf10734d045e src/lib
450 ffcfe596031a5985664e35937fff4ac9ff38dcca src/zfs-fuse
367 ffc5d5340f95360fc9f7b739c5593dd3f92fced0 src/lib/libzpool
202 ff92db000792044d45eec21c57a3cd21618631e7 src/lib/libsolkerncompat
183 ff1a44edae3fd121ddd86864b589e5ab2f9ff99b src/lib/libzfscommon
178 fec6b3a789e578983c2242b3aa5adf217cb8b887 src/lib/libzfs
168 ffeefc9e81222d7c471bdb0911d8b98f23cff050 src/cmd
167 fbd60bd3430765863648c52db7ceb3ffa15d5e50 src/lib/libzfscommon/include
155 ff225f6b41f9557d683079c5f9276f497bcb06bd src/lib/libzfscommon/include/sys
You can take it from here.
E.g. if you wanted to see only file blobs:
git rev-list --objects --all | awk '$2' | sort -k2 | uniq -cf1 | sort -rn |
while read -r frequency sample file
do
    [ "blob" = "$(git cat-file -t "$sample")" ] && printf '%s\t%s\n' "$frequency" "$file"
done
output:
135 src/zfs-fuse/zfs_operations.c
84 src/zfs-fuse/zfs_ioctl.c
79 src/zfs-fuse/zfs_vnops.c
73 src/lib/libzfs/libzfs_dataset.c
67 src/lib/libzpool/spa.c
66 src/zfs-fuse/zfs_vfsops.c
62 src/cmd/zdb/zdb.c
62 CHANGES
60 src/cmd/ztest/ztest.c
60 src/lib/libzpool/arc.c
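The loop above starts one `git cat-file -t` process per path, which can be slow on large repositories. A sketch of an alternative, assuming Python 3.7+ and `git` on the PATH: send all the sample object IDs to a single `git cat-file --batch-check` process (which prints `<object> <type> <size>` per input line) and keep only paths whose sample object is a blob. Both function names are hypothetical; only `blob_counts` is pure and needs no repository.

```python
import subprocess
from collections import Counter


def object_types(object_ids):
    """Map object id -> type with one `git cat-file --batch-check` call."""
    out = subprocess.run(
        ["git", "cat-file", "--batch-check"],
        input="\n".join(object_ids) + "\n",
        capture_output=True, text=True, check=True,
    ).stdout
    types = {}
    for line in out.splitlines():
        fields = line.split()  # "<sha> <type> <size>" or "<sha> missing"
        if len(fields) == 3:
            types[fields[0]] = fields[1]
    return types


def blob_counts(rev_list_output, types):
    """Count object lines per path, keeping only paths whose sample object is a blob."""
    counts = Counter()
    sample = {}
    for line in rev_list_output.splitlines():
        sha, _, path = line.partition(" ")
        if not path:
            continue
        counts[path] += 1
        sample.setdefault(path, sha)  # one representative object per path
    return sorted(
        ((n, p) for p, n in counts.items() if types.get(sample[p]) == "blob"),
        key=lambda t: (-t[0], t[1]),
    )
```

In a repository, one would pass `object_types` the set of first fields from the rev-list output, then hand both to `blob_counts`.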
If you want to see only a specific range of revisions, you can have a ball with the rev-list part:
git rev-list --after=2011-01-01 --until='two weeks ago' \
tag1...remote/hotfix ^master
This will use only revisions in the specified date range that are in the symmetric set difference of tag1 and remote/hotfix and are not in master.
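If the pipeline is driven from Python instead of the shell, the same revision selection can be passed straight through as `git rev-list` arguments. The helper `rev_list_args` below is hypothetical; it only assembles an argument list from the options used above (`--after`, `--until`, a `...` range, and `^`-excluded refs). Passing a list to `subprocess.run` avoids shell quoting issues with values such as `two weeks ago`.

```python
def rev_list_args(after=None, until=None, ranges=(), exclude=()):
    """Build an argv list for `git rev-list --objects` with optional limits."""
    args = ["git", "rev-list", "--objects"]
    if after:
        args.append(f"--after={after}")
    if until:
        args.append(f"--until={until}")
    args.extend(ranges)                        # e.g. "tag1...remote/hotfix" or "--all"
    args.extend("^" + ref for ref in exclude)  # exclude commits reachable from ref
    return args
```

For example, `subprocess.run(rev_list_args("2011-01-01", "two weeks ago", ["tag1...remote/hotfix"], ["master"]), capture_output=True, text=True, check=True).stdout` would produce text suitable for the counting pipeline.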

-
Cheers. I had fun writing that down :) Laaaaarge kudos to the gentlemen who designed git in the UNIX philosophy – sehe Apr 14 '11 at 22:15
-
A great answer, thanks! I'll leave an edit to make it compatible with ZSH, in which using `path` as a variable can cause trouble – Piotr Zierhoffer Feb 14 '19 at 11:20
Use `git effort [--above <value>]` (from the git-extras package) to list all files and the number of commits that touched each one.
You can also restrict it to a path.

I needed something similar recently in a project whose source code consisted entirely of Java files. I used sehe's answer as the base and expanded on it, since I wanted to do it in one line without loops. My question was: which are the top 5 files that have changed the most?
git rev-list --objects --all | awk '$2 ~ /\.java/' | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 5
To break it down:
- git rev-list --objects --all: give me all objects from all branches
- awk '$2 ~ /\.java/': keep only lines whose second field ($2) matches the regex \.java
- awk '{print $2}': print only the second field (the path)
- sort: sort by path
- uniq -c: collapse duplicates and count how many times each file appears
- sort -rn: sort numerically in descending order
- head -n 5: limit the result to the top 5
Output:
130 richtextfx/src/main/java/org/fxmisc/richtext/GenericStyledArea.java
126 richtextfx/src/main/java/org/fxmisc/richtext/StyledTextArea.java
58 richtextfx/src/main/java/org/fxmisc/richtext/ParagraphText.java
47 richtextfx/src/main/java/org/fxmisc/richtext/EditableStyledDocument.java
43 richtextfx/src/main/java/org/fxmisc/richtext/skin/StyledTextAreaVisual.java
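A Python sketch of the same filter and top-N step, assuming the `git rev-list --objects --all` output is already available as text. The name `top_files` is hypothetical. Note one deliberate difference: `str.endswith` only matches paths that end in `.java`, whereas the awk regex `\.java` also matches names such as `foo.javascript`.

```python
from collections import Counter


def top_files(rev_list_output, suffix=".java", n=5):
    """Return the n paths ending in `suffix` that appear most often, as (path, count)."""
    counts = Counter()
    for line in rev_list_output.splitlines():
        _, _, path = line.partition(" ")  # drop the object id, keep the full path
        if path.endswith(suffix):
            counts[path] += 1
    # Counter.most_common replaces `sort | uniq -c | sort -rn | head -n 5`.
    return counts.most_common(n)
```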

Here's a Python script that you can pipe the `git log --numstat` output through to get the results:
import re
import sys
res = {}
for line in sys.stdin:
    # numstat lines look like "<added>\t<removed>\t<path>"; binary files ("-") don't match
    m = re.match(r"([0-9]+)[ \t]+([0-9]+)[ \t]+(.*)", line)
    if m is not None:
        f = m.group(3)
        if f not in res:
            res[f] = {'add': 0, 'rem': 0, 'commits': 0}
        res[f]['commits'] += 1
        res[f]['add'] += int(m.group(1))
        res[f]['rem'] += int(m.group(2))
# One line per file: commits, lines added, lines removed, path
for f, r in res.items():
    print("%s %s %s %s" % (r['commits'], r['add'], r['rem'], f))
You can modify it as needed to sort/filter how you want.
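As one possible modification, here is a sketch that collects the same per-file totals from `git log --numstat` text and sorts them, most commits first, breaking ties by total lines changed. The name `numstat_totals` and the tie-breaking order are assumptions for illustration; like the script, it skips binary-file lines, which numstat reports with `-` instead of numbers.

```python
import re
from collections import defaultdict

# numstat line: "<added>\t<removed>\t<path>"
NUMSTAT = re.compile(r"([0-9]+)[ \t]+([0-9]+)[ \t]+(.*)")


def numstat_totals(log_text):
    """Return (commits, added, removed, path) rows, most commits first."""
    totals = defaultdict(lambda: [0, 0, 0])  # path -> [commits, added, removed]
    for line in log_text.splitlines():
        m = NUMSTAT.match(line)
        if m:
            t = totals[m.group(3)]
            t[0] += 1
            t[1] += int(m.group(1))
            t[2] += int(m.group(2))
    return sorted(
        ((c, a, r, p) for p, (c, a, r) in totals.items()),
        key=lambda row: (-row[0], -(row[1] + row[2]), row[3]),
    )
```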

Assuming the range of revisions you want to select is <range>, the command:
git log --format=%n --name-only <range> | sort | uniq -c | tail -n +2
will output, for each file changed in that range, the number of occurrences in commit diffs, i.e. the number of changes (file creation counts as a change). Leave <range> empty to get statistics from the initial commit to your branch HEAD.
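The `--format=%n` trick prints one blank line per commit; after `sort | uniq -c` those blanks collapse into a single first line, which `tail -n +2` drops. A Python sketch of the same count, assuming the `git log --format=%n --name-only` output has been captured as text; `change_counts` is a hypothetical name. Unlike the shell command (which leaves the result sorted by path), it sorts by count, descending, which the shell version would get by appending `| sort -rn`.

```python
from collections import Counter


def change_counts(name_only_output):
    """Count how many commits list each path, most-changed first."""
    # Blank lines are the per-commit separators; skipping them mirrors `tail -n +2`.
    counts = Counter(line for line in name_only_output.splitlines() if line.strip())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```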
