TL;DR
Using git blame, you can conclude that Bellard is the person who last touched 8851 of the 1942819 lines in the code base, or 0.46% of them.
Details
With some 8000 files in the repo containing a total of nearly 2 million lines, running git blame on each file takes a long time, but it lets you see how many of the lines still in the repo were contributed by Bellard/Lantau. As @Gyan says, though, this only reports lines that are exactly as he wrote them; any change in whitespace or style is attributed to the person who made that trivial change.
That being said, here's the loop:
git clone https://github.com/FFmpeg/FFmpeg
cd FFmpeg
git ls-tree -r --name-only HEAD | while IFS= read -r f ; do git blame -- "$f" ; done > blame
That loop will take a long time to run (it took about 5 hours on my computer), but eventually you'll be able to extract the author from each line with something like this:
cat blame | sed -e 's/ *20[012][0-9].*//' -e 's/^[^(]*(//' > blame-author
That's based on parsing lines from the blame output that look like this:
f1ab71b0463 (Timo Rothenpieler 2017-05-11 22:53:41 +0200 26) *.ptx.c
6bcd3e05998 (Federico Tomassetti 2015-08-13 20:13:48 +0200 11) compiler:
5d3049559af COPYING.GPL (Diego Biurrun 2007-07-12 20:27:07 +0000 187) the Program or works based on it.
My crude parser is not perfect, but it's good enough to get statistics out of a crude tool like blame.
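A less fragile alternative to the sed heuristics, not used for the numbers in this answer, is git blame's --line-porcelain output. It prints a separate "author NAME" header for every blamed line and prefixes the file content with a tab, so content cannot be mistaken for a header. A minimal sketch, where authors_from_porcelain is a made-up helper name:

```shell
# Hypothetical helper (not part of the original analysis): reads the output of
# `git blame --line-porcelain` on stdin and prints one author name per blamed line.
authors_from_porcelain() {
    # Header lines start at column 0, while file content is prefixed with a
    # tab, so "^author " can only match the author header. LC_ALL=C keeps sed
    # from choking on bytes that are invalid in the current locale.
    LC_ALL=C sed -n 's/^author //p'
}

# Intended use, from inside the clone:
#   git ls-tree -r --name-only HEAD |
#     while IFS= read -r f; do git blame --line-porcelain -- "$f"; done |
#     authors_from_porcelain > blame-author
```

The trade-off is that the porcelain format is much more verbose, so the intermediate output is larger if you keep it around.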
Now let's count lines by author:
cat blame-author | sort | uniq -c | sort -nr | less -N
This shows the list of contributors to the code base, ranked from high to low by the number of lines each one last touched according to the commit logs. Here's the top 50 (rank, line count, name):
1 209136 Paul B Mahol
2 121248 Michael Niedermayer
3 114289 Anton Khirnov
4 109653 Andreas Rheinhardt
5 75457 Diego Biurrun
6 54739 Ronald S. Bultje
7 48739 James Almer
8 48571 Kostya Shishkov
9 48096 Shivraj Patil
10 44086 Martin Storsjö
11 41019 Mark Thompson
12 40305 Clément Bœsch
13 37204 Stefano Sabatini
14 34637 Vittorio Giovara
15 26003 Luca Barbato
16 21898 Justin Ruggles
17 20845 Mans Rullgard
18 20403 Lynne
19 20172 Nicolas George
20 19849 Vitor Sessak
21 18044 Kaustubh Raste
22 17297 Aurelien Jacobs
23 16258 Måns Rullgård
24 15242 Hao Chen
25 14281 Peter Ross
26 13971 Mike Melanson
27 13943 Marton Balint
28 11798 Guillaume Martres
29 11284 Rostislav Pehlivanov
30 11013 Shiyou Yin
31 10836 foo86
32 9895 Baptiste Coudurier
33 9375 Derek Buitenhuis
34 9367 Janne Grunau
35 9214 Matthieu Bouron
36 9160 Carl Eugen Hoyos
37 9065 wm4
38 8851 Fabrice Bellard
39 8813 Zhou Xiaoyong
40 8625 Timo Rothenpieler
41 8410 Reimar Döffinger
42 8361 Steven Liu
43 7409 Timothy Gu
44 7147 Thilo Borgmann
45 6886 Lukasz Marek
46 6667 Martin Vignali
47 6445 Ben Avison
48 6274 Limin Wang
49 6213 rcombs
50 6138 Daniel Kang
In this list, Bellard is in position 38, with 8851 lines, or 0.46% of the 1942819 lines that wc -l blame-author says were analyzed.
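The share can also be computed straight from blame-author. A small sketch, assuming the file holds exactly one author name per line; author_share is a hypothetical helper name:

```shell
# Hypothetical helper: author_share NAME FILE
# Prints how many lines of FILE are exactly NAME, the total, and the percentage.
author_share() {
    awk -v name="$1" '
        $0 == name { n++ }
        END {
            if (NR > 0)
                printf "%d of %d lines (%.2f%%)\n", n + 0, NR, 100 * n / NR
        }
    ' "$2"
}

# Intended use:
#   author_share 'Fabrice Bellard' blame-author
```

The match is on the whole line, so any stray whitespace or a differently spelled name in the file counts as a different author.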
Methodological limitations
I should have removed tests/ref and tests/reference.pnm from my processing, since they hold a lot of binary data, but without them there are still 1.8M lines, so the answer remains around 0.4 to 0.5%.
Even better, I should have identified and filtered out all binary files. My blame-author file has some binary lines because of them. Again, I believe it's a minor error, but it's there nonetheless.
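One way to skip binary files, sketched here and not applied to the numbers in this answer, is to treat any file containing a NUL byte as binary. That is only a heuristic, and is_text is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if the file contains no NUL byte.
is_text() {
    # Strip NUL bytes and compare with the original; the two are identical
    # only when there was nothing to strip. cmp -s reports via exit status.
    tr -d '\000' < "$1" | cmp -s - "$1"
}

# Intended use in the blame loop:
#   git ls-tree -r --name-only HEAD |
#     while IFS= read -r f; do is_text "$f" && git blame -- "$f"; done > blame
```

This reads every file twice, which is negligible next to the cost of blame itself.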
The four COPYING.*GPL* files are included, but were obviously not written by whoever committed them. That's only 1680 lines, but it shows that credit goes to whoever committed something, not to whoever actually wrote it. git blame is a crude tool.
492 of those lines are attributed to Bellard himself, so leaving the license files out would reduce the estimate of his surviving contribution to about 0.43% of the code base (8359 of the remaining 1941139 lines).
git blame accepts an --ignore-revs-file FILENAME option, where the file lists commits to skip, typically ones that only apply style changes. E.g., I use it in my repos to exclude the commits where I am just reformatting Python code with black, or you could use it to ignore commits that only change CRLF to LF line endings in some files. I did not try to find style-only commits in FFmpeg, but one could improve the significance of these statistics by doing so.
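A sketch of how the blame loop could take such a file into account. blame_file and the IGNORE_REVS variable are hypothetical names, and the list itself would have to be compiled by hand, with one full commit hash per line. The option needs a reasonably recent git (2.23 or later, as far as I know).

```shell
# Hypothetical wrapper: blame one file, skipping the commits listed in
# $IGNORE_REVS when that variable is set and non-empty.
blame_file() {
    if [ -n "${IGNORE_REVS:-}" ]; then
        git blame --ignore-revs-file "$IGNORE_REVS" -- "$1"
    else
        git blame -- "$1"
    fi
}

# Intended use, with style-only.txt standing in for a hand-made list of
# full commit hashes:
#   IGNORE_REVS=style-only.txt
#   git ls-tree -r --name-only HEAD |
#     while IFS= read -r f; do blame_file "$f"; done > blame
```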
I didn't see the name Lantau anywhere, so I assume all of Bellard's contributions are under his own name.
For future reference, should anyone actually care, my analysis is based on this commit, which is the HEAD of the master branch at the moment of writing:
commit 8ad988ac37d4d92dbb60796e26c3ad558a3eebeb (HEAD -> master, origin/master, origin/HEAD)
Author: Saliev, Rafik F <rafik.f.saliev-at-intel.com@ffmpeg.org>
Date: Fri Dec 16 09:37:27 2022 +0000