Fundamentally, each commit is (or "has") a stored tree that is independent of every other commit, so to get "files added by a commit" you must compare (i.e., diff) that commit against some other commit.
For most commits the choice of the other commit is easy: use the commit's (single) parent. For merge commits (those with two or more parents) the answer is less obvious, and I don't know what you will want to do for these.
For a root commit (a commit with no parent), you can still get the number of files added with respect to an empty tree, by diffing against git's "well known, if poorly advertised, empty tree". Or, you might choose to ignore root commits entirely (which simplifies your task).
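If you do want root commits counted, the empty-tree comparison might look like the sketch below. The function name count_added_in_root is made up for illustration, and the empty tree's ID is computed with git hash-object rather than hard-coded.

```shell
# Hypothetical helper: count the files a root commit adds, by diffing it
# against git's empty tree.
count_added_in_root() {
    root=$1
    # Hashing empty input as a tree yields the empty tree's ID.
    empty_tree=$(git hash-object -t tree /dev/null)
    # tr strips the padding that some wc implementations emit.
    git diff --name-only --diff-filter=A "$empty_tree" "$root" |
        wc -l | tr -d ' '
}
```

Root commits themselves can be listed with git rev-list --max-parents=0 HEAD and fed to this function one at a time.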
There's no single git command that will do everything for you here, but it's easy to put together a script or pipeline that will do the trick. The main thing to know is that you will use git rev-list to generate all the candidate commit IDs:
git rev-list --min-parents=1 --max-parents=1 HEAD
will, for instance, get you a list of every commit reachable from HEAD that has exactly 1 parent (i.e., is neither a merge commit nor a root commit). It's up to you to decide whether this is the set of commits you'd like to inspect.
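To help with that decision, it can be useful to see how many commits fall into each class. This is only a sketch; commit_classes is a made-up name, and the three filters are just the parent-count options of git rev-list combined with --count.

```shell
# Hypothetical helper: report how many commits reachable from a revision
# (default HEAD) are root, single-parent, or merge commits.
commit_classes() {
    rev=${1:-HEAD}
    echo "root $(git rev-list --count --max-parents=0 "$rev")"
    echo "single-parent $(git rev-list --count --min-parents=1 --max-parents=1 "$rev")"
    echo "merge $(git rev-list --count --min-parents=2 "$rev")"
}
```

If the merge and root counts are small compared with the single-parent count, skipping them is unlikely to change the overall picture much.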
If it is, we're now in pretty good shape since we can simply git diff each such commit against its (single) parent:
git rev-list --min-parents=1 --max-parents=1 HEAD | \
while read sha1; do \
...
done
Now the trick is to get git diff to give us the number of files added, perhaps with a bit of help from another command. This is pretty easy because git diff has --name-status and --name-only options, and also a --diff-filter option. Using --name-status will get you output like this:
$ git diff --name-status 0df0541bf13723658d31b8d1376b505b710e63c6^ \
0df0541bf13723658d31b8d1376b505b710e63c6
A Documentation/RelNotes/2.4.5.txt
M Documentation/git.txt
M GIT-VERSION-GEN
M RelNotes
Adding --diff-filter=A eliminates all but the A (added) files, after which we don't really need --name-status (not that it hurts either) since just the name alone, --name-only, will tell us which files were added when comparing these two commits:
$ git diff --name-only --diff-filter=A \
0df0541bf13723658d31b8d1376b505b710e63c6^ \
0df0541bf13723658d31b8d1376b505b710e63c6
Documentation/RelNotes/2.4.5.txt
Running this output through wc -l gets a count of lines, which is also a count of files, since each file name is on its own line.[1]
So, now we have a script that looks like this (I'll leave the backslashes out now):
git rev-list --min-parents=1 --max-parents=1 HEAD |
while read sha1; do
echo $(git diff --name-only --diff-filter=A ${sha1}^ ${sha1} | wc -l) $sha1
done
The output of this script can then be passed to sort -rn, for instance, to list the commits that added the most files first.
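Put together, the counting loop and the sort might be wrapped up as below. This is a sketch rather than a finished tool: rank_by_added_files is a made-up name, and it covers only single-parent commits, as in the script above.

```shell
# Hypothetical helper: print "<count> <commit>" for every single-parent
# commit reachable from a revision (default HEAD), largest count first.
rank_by_added_files() {
    rev=${1:-HEAD}
    git rev-list --min-parents=1 --max-parents=1 "$rev" |
    while read -r sha1; do
        # tr strips the padding that some wc implementations emit.
        count=$(git diff --name-only --diff-filter=A "${sha1}^" "$sha1" |
            wc -l | tr -d ' ')
        echo "$count $sha1"
    done |
    sort -rn
}
```

Piping the result through head -n 10 would then show only the ten commits that added the most files.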
You may wish to tweak this somewhat, depending on what you need to do with merges. You might also want to defeat rename detection on the git diff commands, e.g. with --no-renames (or maybe not; it really does depend on how you're using this).
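As one possible tweak, and only under the assumption that a merge should be compared against its first parent (the branch that was merged into), the diff could be written as follows. The function name is hypothetical; --no-renames makes a renamed file count as one deletion plus one addition.

```shell
# Hypothetical helper: count files added by a commit relative to its first
# parent, with rename detection turned off. Works for merge commits and
# ordinary commits alike, but not for root commits (they have no parent).
count_added_vs_first_parent() {
    commit=$1
    git diff --no-renames --name-only --diff-filter=A "${commit}^1" "$commit" |
        wc -l | tr -d ' '
}
```

To include merges in the loop, the rev-list filter would become --min-parents=1 with no --max-parents limit. Whether the first parent is the right baseline depends on how your merges are made.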
[1] Ignoring the possibility of having a newline embedded in a file name, anyway. By default git prints such names C-quoted on a single line, so the count should still hold, but a really general-purpose tool would use -z to get NUL-terminated names. You can probably ignore this for your case.
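For a count that does not depend on how file names are printed at all, one option is to ask git diff for NUL-terminated names with -z and count the NUL bytes. A minimal sketch, with a made-up function name:

```shell
# Hypothetical helper: count files added between two commits using
# NUL-terminated names, so odd characters in file names cannot skew it.
count_added_nul() {
    git diff -z --name-only --diff-filter=A "$1" "$2" |
        tr -cd '\0' |   # keep only the NUL terminators, one per file name
        wc -c | tr -d ' '
}
```

It would be called as count_added_nul "${sha1}^" "$sha1" in place of the wc -l pipeline.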