I would like to know how many additions and deletions were made to a set of files over the whole lifetime of a repository, regardless of author or commit.

Is there a way to get that information from git? Or is there a way to get this information by using git commands in association with some shell magic?

Henkk
  • I don't know if there is a special command, but you could use `git log --all --oneline` and parse the output. – legionth Aug 10 '16 at 09:59
  • This problem might be a bit underspecified. For instance, suppose I make a new repository and add all the files. Then I check out an orphan branch and add all the files again. Then I merge the two branches (which changes no lines in any files). If there is only one file `F` and it has five lines, have I added 5 lines, or 10 lines? – torek Aug 10 '16 at 09:59
  • @torek As you said, the merge changes no lines in any files. So there should be 5 added lines on each branch. Am I getting this wrong? – Henkk Aug 10 '16 at 10:21
  • @legionth `git log --all --oneline` returns the commit hashes and messages only. – Henkk Aug 10 '16 at 10:27
  • So, is that ten lines changed? We're doing sum(delta(k,p) \forall p \elem parents(k) \forall k \elem commits(repository))? (where delta(k, _) when k is a root commit = diff against null tree) – torek Aug 10 '16 at 10:33
  • @torek Maybe my approach is a bit too naive here. But I just want to know: \sum(\delta(c, s) \forall c \elem commits(b) \forall b \elem branches(repository) \where s = successor(c)). (And the intuitive base case.) This is not really a correct metric for the work (real changes) done to a repository. For that we would have to consider more cases, such as: the same 5 lines were changed in 3 commits and were changed back to the original state afterwards. Is that a change? If you have a suggestion for how this should be handled, you're welcome to propose it. – Henkk Aug 10 '16 at 10:56
  • @torek If there is still some misunderstanding on my side, please correct me. Or suggest a solution to your interpretation of my question. – Henkk Aug 10 '16 at 11:11
  • There's a problem (in Git anyway) with the formulation of \forall c \elem commits(b) \forall b \elem branches(repo), because each commit is usually on many branches. So this would count commits multiple times. Remember that branch names are just pointers into the overall DAG, and merge commits have two or more parents, where one or both parents often have branch names pointing to them. Other than that I think you can script this now, using `git rev-list --branches` to find all commits reachable from branches, and looking at their parent IDs. – torek Aug 10 '16 at 11:17

2 Answers

Here's a stab at a starting point, though there's probably more to do:

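git rev-list --branches --parents |
while read -r hash parents; do
    # $hash is a commit; it has $parents as its parents
    set -- $parents
    for p do    # loop over all of $hash's parents
        # $pathlimiters is left unquoted so that it splits into separate paths
        git diff --stat "$p" "$hash" -- $pathlimiters
    done
# "files?" also matches the singular summary line "1 file changed, ..."
done | awk '/files? changed, / { print }'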

The output will have many lines of the form:

 2 files changed, 10 insertions(+), 1 deletion(-)
 3 files changed, 924 insertions(+), 550 deletions(-)

Modify the awk code (or write something in whatever language you prefer) to find the insertions and deletions counts and sum them up.
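One way to do that summing is sketched below. The function name `sum_stat` is a hypothetical helper, not part of git; it assumes the English summary-line format shown above and looks at the word after each number, because git omits the insertions or deletions part when its count is zero.

```shell
# sum_stat (hypothetical helper): read `git diff --stat` output on stdin
# and print "<total insertions> <total deletions>".
sum_stat() {
    awk '
        # Only look at summary lines such as
        #  " 2 files changed, 10 insertions(+), 1 deletion(-)"
        /^ *[0-9]+ files? changed/ {
            for (i = 2; i <= NF; i++) {
                # The count is the field just before the keyword.
                if ($i ~ /^insertion/) add += $(i - 1)
                if ($i ~ /^deletion/)  del += $(i - 1)
            }
        }
        # "+ 0" forces numeric output even if nothing matched.
        END { print add + 0, del + 0 }
    '
}
```

Piping the output of the loop into `sum_stat` instead of the final awk filter should give the two totals on one line.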

You also probably need to add a special case for root commits (when $parents is empty) where you diff against the empty tree.
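A minimal sketch of that special case follows. `diff_pairs` and `EMPTY_TREE` are hypothetical names introduced here, and the hard-coded ID assumes a repository that uses SHA-1 object names; `git hash-object -t tree /dev/null` prints the empty tree's ID if you would rather compute it.

```shell
# ID of the empty tree in a SHA-1 repository (assumption: SHA-1 object names).
EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904

# diff_pairs (hypothetical helper): read "hash parent..." lines, the format
# printed by `git rev-list --parents`, and print one "base hash" pair per
# parent. Root commits have no parents, so they are paired with the empty tree.
diff_pairs() {
    while read -r hash parents; do
        if [ -z "$parents" ]; then
            echo "$EMPTY_TREE $hash"
        else
            for p in $parents; do
                echo "$p $hash"
            done
        fi
    done
}

# Possible usage inside a repository (not run here):
#   git rev-list --branches --parents | diff_pairs |
#   while read -r base hash; do
#       git diff --stat "$base" "$hash" -- $pathlimiters
#   done
```

With this, a root commit's whole content counts as insertions, which matches the "diff against null tree" reading discussed in the comments.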

torek

I use this handy little bash function for counting git stats (commits, additions, deletions):

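gitstats() {
    # Fall back to a default author when no argument is given
    author="${1:-DEFAULT AUTHOR}"
    echo "Author: $author"
    echo "Commits: $(git rev-list --count --author="$author" HEAD)"
    # --numstat prints "added<TAB>deleted<TAB>path"; binary files show "-" and are skipped
    echo "Additions and Deletions: $(git log --author="$author" --pretty=tformat: --numstat | grep -v '^-' | awk '{ add += $1; remove += $2 } END { print add, remove }')"
}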

which can be called like: gitstats "author name"
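The same `--numstat` idea can be adapted to the question as asked, that is, all authors and a chosen set of files. The sketch below is an assumption about how one might do it, with `sum_numstat` as a hypothetical helper name and placeholder paths in the usage comment. Note that `git log` does not show diffs for merge commits by default, so the totals can differ from a per-parent diff of every commit.

```shell
# sum_numstat (hypothetical helper): read `git log --numstat` lines
# ("added<TAB>deleted<TAB>path") on stdin and print total additions
# and total deletions.
sum_numstat() {
    awk '
        # Skip blank separator lines, and binary files, which are
        # reported as "-" in both count columns.
        NF >= 3 && $1 != "-" { add += $1; del += $2 }
        # "+ 0" forces numeric output even if nothing matched.
        END { print add + 0, del + 0 }
    '
}

# Possible usage inside a repository (paths are placeholders, not run here):
#   git log --numstat --pretty=tformat: -- path/to/file path/to/dir | sum_numstat
```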

gustavz