
I'm working with a team on a git project, and I would like to see the contribution of each author in terms of lines written, lines edited, etc. How can I show statistics for the authors?

mikemaccana
Anonymous
  • These statistics make no sense anyway, since you will get "contribution" to designer files that tend to be auto-generated and longer (in lines of code) than most real changes. Also, you cannot treat the number of lines as a valid measure of "contribution", since more lines of code usually do not mean more work. I hope you are not going to measure your employees' workload based on that random number! – Windgate Jul 24 '23 at 10:39

5 Answers


As simple as: git shortlog -s -n
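Note that `git shortlog -s -n` counts commits per author, not lines. Below is a minimal sketch of a slightly more robust invocation, wrapped in a shell function whose name (`author_commit_counts`) is only illustrative. `-e` adds each author's email so that people with the same name stay separate, and `--no-merges` skips merge commits. `HEAD` is passed explicitly because, when no revision is given and stdin is not a terminal (as in scripts or CI), shortlog summarizes a log read from standard input instead.

```shell
#!/bin/bash
# Commit count per author (not lines changed), highest first.
#   -s           summary: one count line per author
#   -n           sort by number of commits
#   -e           show each author's email address
#   --no-merges  ignore merge commits
# HEAD is explicit: without a revision, shortlog reads from stdin
# whenever stdin is not a terminal.
author_commit_counts() {
    git shortlog -s -n -e --no-merges HEAD
}
```

Run `author_commit_counts` from inside the repository.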

Kovalex

You could try git-stats, or use git commands to explore the logs.

Refer to the following posts:

  1. Graphical Stats - Generating statistics from Git repository
  2. https://gist.github.com/eyecatchup/3fb7ef0c0cbdb72412fc
  3. Which Git commit stats are easy to pull
  4. PR-Count GitHub App - GitHub ONLY. Thanks @ben
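Here is a minimal sketch of the "use git commands to explore the logs" route, summing `git log --numstat` rows for a single author. The function name `author_line_stats` and its output format are illustrative assumptions and do not come from the linked posts.

```shell
#!/bin/bash
# Lines added/removed by one author over the current branch's history.
# $1 is passed to --author, which git treats as a regex matched against
# "Name <email>". --pretty=tformat: prints no commit header, so only the
# --numstat rows ("added<TAB>removed<TAB>path") reach awk.
author_line_stats() {
    git log --author="$1" --pretty=tformat: --numstat \
        | awk '
            # binary files show "-" instead of counts; skip them
            $1 != "-" { added += $1; removed += $2 }
            END {
                printf "added: %d, removed: %d, total: %d\n",
                    added, removed, added + removed
            }'
}
```

For example, `author_line_stats 'john.smith@'` would cover every commit whose author string contains that text.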
Libin Varghese
  • You can also try the GitHub app PR Count, https://github.com/marketplace/pr-count, if you don't like command line tools. – Ben Mar 28 '20 at 19:43
  • pr-count no longer seems to exist. Even from GitHub, the link leads to a bogus page. – marco Jul 09 '22 at 18:11

For those who prefer a more built-in solution, this script simply makes use of git, tr, grep, & awk:

Example

$ git user-stats
Email                           Commits         Files           Insertions      Deletions       Total Lines
-----                           -------         -----           ----------      ---------       -----------
john.smith@gmail.com            289             35              5361            3293            8654
joe.dirt@yahoo.com              142             17              2631            1756            4387
jack.bauer@fbi.gov              115             9               1407            1107            2514
$ git -C path/to/repo user-stats --since="1 week ago"
Email                           Commits         Files           Insertions      Deletions       Total Lines
-----                           -------         -----           ----------      ---------       -----------
joe.dirt@yahoo.com              20              3               83              634             717
john.smith@gmail.com            21              2               242             110             352

Usage

$ git [git options] user-stats [git-log options]

Show me the code

#!/bin/bash
#
# Show user stats (commits, files modified, insertions, deletions, and total
# lines modified) for a repo
#
# Requires GNU awk (gawk): asorti() with "@val_num_desc" is a gawk
# extension that mawk (Ubuntu's default awk) and BSD awk do not provide.

git_log_opts=( "$@" )

# User options go last, so a trailing "-- <pathspec>" is not followed by
# --format/--numstat (which git would then treat as paths)
git log --format='author: %ae' --numstat "${git_log_opts[@]}" \
    | tr '[A-Z]' '[a-z]' \
    | grep -v '^$' \
    | grep -v '^-' \
    | gawk '
        {
            if ($1 == "author:") {
                author = $2;
                commits[author]++;
            } else {
                insertions[author] += $1;
                deletions[author] += $2;
                total[author] += $1 + $2;
                # drop the two count columns so paths containing spaces
                # are kept whole (plain $3 would stop at the first space)
                path = $0;
                sub(/^[0-9]+[ \t]+[0-9]+[ \t]+/, "", path);
                # if this is the first time seeing this file for this
                # author, increment their file count
                author_file = author ":" path;
                if (!(author_file in seen)) {
                    seen[author_file] = 1;
                    files[author]++;
                }
            }
        }
        END {
            # Print a header
            printf("%-30s\t%-10s\t%-10s\t%-10s\t%-10s\t%-10s\n",
                   "Email", "Commits", "Files",
                   "Insertions", "Deletions", "Total Lines");
            printf("%-30s\t%-10s\t%-10s\t%-10s\t%-10s\t%-10s\n",
                   "-----", "-------", "-----",
                   "----------", "---------", "-----------");

            # Print the stats for each user, sorted by total lines
            n = asorti(total, sorted_emails, "@val_num_desc");
            for (i = 1; i <= n; i++) {
                email = sorted_emails[i];
                printf("%-30s\t%-10s\t%-10s\t%-10s\t%-10s\t%-10s\n",
                       email, commits[email], files[email],
                       insertions[email], deletions[email], total[email]);
            }
        }
'


Installation

Download the script, give it executable permissions, and put it somewhere in your PATH, e.g.:

mkdir -p ~/bin
wget -O ~/bin/git-user-stats https://gist.githubusercontent.com/shitchell/783cc8a892ed1591eca2afeb65e8720a/raw/git-user-stats
chmod +x ~/bin/git-user-stats
cd ~/path/to/repo
git user-stats --since="1 week ago"

Explanation

Basically it uses git log --format="author: %ae" --numstat (minus any empty lines or binary files) to generate output that looks like:

author: bob.smith@gmail.com
1       147     foo/bar.py
0       370     hello/world.py
author: john.smith@aol.com
7       6       foo/bar.py
author: jack.bauer@fbi.gov
1       0       super/sekrit.txt
author: john.smith@aol.com
2       1       hello/world.py

Each section that starts with author: ... is a single commit. The first column of --numstat is the number of insertions, and the second column is the number of deletions for that file.

It then walks over each line with awk. Whenever it hits a line that starts with author:, it stores the 2nd column of that line (the email address of the author for that particular commit) in the variable author and increments that user's total number of commits. For each subsequent line, it updates the number of insertions, deletions, and files for that user until it hits the next line that starts with author:. Rinse and repeat until it's done.

At the end, it sorts by the total line changes (insertions + deletions) and prints out all of the collected stats. If you wanted to sort by something else, you would simply replace the total array with the relevant array in the asorti(...) function. e.g., to sort by number of files, you would change that line to:

n = asorti(files, sorted_emails, "@val_num_desc");
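As a comment below notes, `asorti` is a gawk extension, so the script fails under mawk or BSD awk. Here is a sketch of a variant that avoids it, assuming plain POSIX awk plus `sort` is acceptable. awk prints one unsorted, tab-separated row per author, and `sort` orders the rows by the sixth column (total lines). The function name `git_user_stats_portable` is hypothetical, and it prints bare rows without the header.

```shell
#!/bin/bash
# Same stats as git-user-stats without gawk: awk emits one row per author
# (email, commits, files, insertions, deletions, total) and sort(1) does
# the ordering. Any arguments are passed through to git log.
git_user_stats_portable() {
    git log --format='author: %ae' --numstat "$@" \
        | tr '[A-Z]' '[a-z]' \
        | grep -v '^$' \
        | grep -v '^-' \
        | awk '
            $1 == "author:" { author = $2; commits[author]++; next }
            {
                insertions[author] += $1
                deletions[author] += $2
                # strip the count columns so paths with spaces stay whole
                path = $0
                sub(/^[0-9]+[ \t]+[0-9]+[ \t]+/, "", path)
                if (!((author, path) in seen)) {
                    seen[author, path] = 1
                    files[author]++
                }
            }
            END {
                for (a in commits)
                    printf "%s\t%d\t%d\t%d\t%d\t%d\n", a, commits[a],
                        files[a], insertions[a], deletions[a],
                        insertions[a] + deletions[a]
            }' \
        | sort -t "$(printf '\t')" -k6,6nr
}
```

To sort by number of files instead, change the sort key to `-k3,3nr`.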

Note: any arguments/options are passed through to the git log command and can be used to filter results, e.g.:
git user-stats --since="2 weeks ago"

A little more detail

The git log output is run through:

  • tr '[A-Z]' '[a-z]' to normalize email addresses. My company capitalizes email addresses a la John.Smith@TheCompany.com, and depending on where / how a user is making their commit, that email might show up capitalized or all lowercase. This ensures that all instances of a particular email address are always grouped together regardless of capitalization.
  • grep -v '^$' to remove empty lines that show up by default in the log output
  • grep -v '^-' to remove the --numstat info for binary files, which looks like:
    - - foo/bar.png
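Lowercasing only merges addresses that differ by case. Git's own mechanism for merging identities is a `.mailmap` file at the repository root. The `%aE` format placeholder prints the author email after that mapping is applied, and `git shortlog` honors `.mailmap` as well. Here is a minimal sketch; the function name and example addresses are made up.

```shell
#!/bin/bash
# Commits per author email after .mailmap is applied, most first.
# A .mailmap line of the form
#   <proper@example.com> <old-address@example.com>
# makes commits recorded with the second address count as the first
# (both addresses above are placeholders).
author_emails() {
    git log --format='%aE' "$@" | sort | uniq -c | sort -rn
}
```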

Also, a cool feature of git that took me forever to find out: if you place an executable named git-some-command in a folder in your PATH, git will detect it, and you can use it via git some-command! This has the added benefit of letting you specify custom configuration settings on a per-command basis, e.g. git -c color.ui=always some-command | sed .... So if you drop this script in, say, ~/bin/git-user-stats, you can run it as git user-stats, as in the examples.
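The following throwaway sketch shows the git-some-command mechanism and per-command `-c` settings in isolation. The `git-hello` name is hypothetical, and a temporary directory stands in for ~/bin so nothing permanent is installed.

```shell
#!/bin/bash
# Demonstration only: create git-hello in a temporary directory.
# In everyday use you would put it in a directory already on PATH (e.g. ~/bin).
bindir=$(mktemp -d)
cat > "$bindir/git-hello" <<'EOF'
#!/bin/sh
# Arguments after "git hello" arrive as ordinary positional parameters
echo "args: $*"
# Settings given via "git -c key=value hello" are visible to git calls made here
echo "color.ui=$(git config --get color.ui)"
EOF
chmod +x "$bindir/git-hello"

# git finds git-hello on PATH and runs it as the "hello" subcommand
PATH="$bindir:$PATH" git -c color.ui=always hello x y
```

With that PATH, the last command should print `args: x y` followed by `color.ui=always`.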

Shaun Mitchell
  • If you see the error `awk: line 38: function asorti never defined`, try installing gawk. On Ubuntu: `sudo apt-get install gawk`. By default, Ubuntu uses mawk for awk, not gawk, and `asorti` is a gawk extension. – Andrew H Jan 04 '23 at 14:52
  • Is there a way to ignore line ending changes and white space? – kolobcreek Jan 09 '23 at 22:20
  • @AndrewH Oof, didn't realize I had `gawk` installed as a dependency. Thanks for the info, Andrew! I'll have to look into whether I can sort without it, else update the answer to reflect the dependency. – Shaun Mitchell Feb 01 '23 at 18:05
  • @kolobcreek How do you mean? As in, don't count files where the only modification was a whitespace change? – Shaun Mitchell Feb 01 '23 at 18:06
  • @kolobcreek Assuming you mean that you want to ignore situations where the only change a person made was whitespace, you can run it with the git log/diff option `--ignore-all-space`. Altogether that would be `git user-stats --ignore-all-space`. Other options to ignore various types of space changes are `--ignore-cr-at-eol`, `--ignore-space-at-eol`, `--ignore-space-change`, `--ignore-blank-lines` You can mix and match those as you please. `git user-stats` will pass any arguments/options to the internal `git log` command, so anything `git log` accepts will work with this! – Shaun Mitchell Feb 06 '23 at 17:06
  • One small typo. In the last line of the installation code you have `git user-stats` instead of `git-user-stats` – jocassid Aug 08 '23 at 15:52

You should have a look at repoXplorer, an open source project I develop. It can compute stats for a project (a group of git repositories), for a single contributor, and for a group of contributors. It provides a REST interface and a web UI. For a given contributor, the web UI shows information such as:

  • commits, lines changed and projects count
  • date histogram of commits
  • top projects by commits
  • top projects by lines changed

But the best way to see it is to have a look at the demo instance.

Here is a screenshot of the stats page of a contributor (stats are computed across all repositories indexed by repoXplorer, but can be filtered for a specific project):


Fbo
  • Can we use this for non-github projects? – Tim Sep 26 '19 at 14:34
  • Sure, it can be used with any git repositories. It is not tied to GitHub or other hosting platforms. – Fbo Sep 27 '19 at 21:41
  • Are there any instructions for that? I only saw GitHub-specific ones, at least under "quickstart". – Tim Sep 28 '19 at 11:10
  • If you use the docker way, follow the first step of the section https://github.com/morucci/repoxplorer#quickstart---use-the-repoxplorer-docker-container-to-index-a-github-organization to start repoXplorer. Then create a "projects.yaml" inside docker-data/conf/ by following https://github.com/morucci/repoxplorer#define-projects-to-index. The GitHub helper is just a tool that reads a GitHub organisation from the GitHub API to create a "projects.yaml"; in your case, you have to create the file manually. – Fbo Sep 29 '19 at 20:51
  • Couldn't see how to configure it for anything but github – Jonathan Ruiz Apr 02 '20 at 15:35

I'd suggest Gitential. It measures:

  • coding volume
  • coding hours
  • productivity
  • efficiency

and provides an analytical interface to visualize them on multiple levels:

  • projects
  • teams
  • repos
  • developers

It also deduplicates author identities and filters suspicious commits to give a better picture.

kszucs