
While git blame and counting the number of lines changed by an author within a git repository are helpful, is there a command that can list all of the pathnames modified in a repo across all commits by an author or set of authors, scoring each file by the number of commits by that author or set of authors? E.g. the output from running such a command in a cloned git repo would be similar to:

1    /path/to/some/file/in/repo/file1
34   /path/to/some/file/in/repo/file2
3    /path/to/some/other/file/in/repo/anotherfile
...

Thanks!

Gary S. Weaver
  • Are you going to make salary decisions? – Basilevs Sep 15 '14 at 16:18
  • :) No. I just wanted to identify parts of the code to focus on for knowledge transfer when an employee is leaving, and the number of commits per file by that author is one way of finding them. – Gary S. Weaver Sep 15 '14 at 21:10
  • Are you willing to write a batch file? If so, you could use `git rev-list HEAD --count --author=someDude -- somefile.txt` to get the count for each file. – Shaun Luttin Dec 09 '14 at 17:37

3 Answers


Just realized that on *nix/OS X you can do this by using --name-only to print the file names, setting the pretty format to an empty string, and piping the result through sort, uniq -c, and sort -nr to rank files by number of commits:

git log --name-only --author=John --pretty=format: | sort | uniq -c | sort -nr
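The question also asks about a set of authors. git log accepts --author more than once and selects commits whose author matches any of the patterns, so the same pipeline extends naturally. A minimal POSIX sh sketch; the function name files_by_authors and the demo authors Alice, Bob, and Carol are hypothetical, and the sed step is only a safeguard in case a git version still prints blank lines:

```shell
# files_by_authors PATTERN... : "count path" for each file, most commits first.
# (Hypothetical helper.) Repeated --author options are ORed, so a commit is
# counted if its author matches any of the patterns.
files_by_authors() {
  # Turn each PATTERN into --author=PATTERN. POSIX sh has no arrays, so rotate
  # through "$@": append the rewritten form, then drop the original.
  for pattern in "$@"; do
    set -- "$@" "--author=$pattern"
    shift
  done
  # --format= prints no commit header, only file names; sed drops any blank lines.
  git log --name-only --format= "$@" | sed '/^$/d' | sort | uniq -c | sort -nr
}

# --- Demo on a throwaway repository with hypothetical authors ---
commit_as() {  # commit_as NAME FILE : append a line to FILE and commit it as NAME
  echo "$1" >> "$2"
  git add -- "$2"
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$1@example.com" \
    git -c commit.gpgsign=false commit -q -m "$1 edits $2"
}

cd "$(mktemp -d)" && git init -q
commit_as Alice a.txt
commit_as Alice a.txt
commit_as Bob   b.txt
commit_as Carol a.txt
commit_as Carol c.txt

# a.txt (2 commits by Alice) first, then b.txt (1 by Bob); Carol is excluded
files_by_authors Alice Bob
```

Each pattern is a regular expression matched against "Name <email>", so one loose pattern can also cover several spellings of the same person.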

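Be sure that you are using the right author: --author is a regular expression matched against the author's name and email, so check which identities the person has actually committed under.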

E.g. if we were trying to find the author identities DHH has used in Rails, we might do:

git log --format='%aN <%aE>' | LC_ALL='C' sort -u | grep avid

and notice that all of DHH's identities in the Rails git repo use the name "David Heinemeier Hansson". So, then we could do:

git log --name-only --author="David Heinemeier Hansson" --pretty=format: | sort | uniq -c | sort -nr

Which might output:

3624 
 611 actionpack/CHANGELOG
 432 activerecord/CHANGELOG
 329 railties/CHANGELOG
 206 activerecord/lib/active_record/base.rb
 195 activesupport/CHANGELOG
 157 actionpack/lib/action_controller/base.rb
 153 railties/Rakefile
 108 activerecord/lib/active_record/associations.rb
  79 actionpack/lib/action_view/helpers/javascript_helper.rb
  75 activerecord/lib/active_record/validations.rb
  74 activerecord/test/base_test.rb
  69 actionmailer/CHANGELOG
  66 railties/lib/rails_generator/generators/applications/app/app_generator.rb
  66 activerecord/Rakefile
  66 actionpack/lib/action_controller/caching.rb
  60 actionpack/lib/action_controller/routing.rb
  59 railties/lib/initializer.rb
  59 actionpack/Rakefile
  57 actionpack/lib/action_controller/request.rb
  ...

So, as of 2015-02-21, the top number of commits for a file was the ActionPack CHANGELOG at 611 commits, followed by the ActiveRecord CHANGELOG, and ActiveRecord::Base (activerecord/lib/active_record/base.rb) was the Ruby file he made the most commits to. The first line (3624, with no path) is not a file: --pretty=format: makes git print a blank line between commits, and uniq -c counts those blank lines together, so that number roughly tracks how many of his commits were listed.

If you want to drop that blank-line entry from the counts, use --format= instead of --pretty=format:, e.g.:

git log --name-only --author="David Heinemeier Hansson" --format= | sort | uniq -c | sort -nr
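A related way to check that you are using the right author is git shortlog, which lists each identity together with its commit count. A sketch; the function name author_identities and the demo identities for "Dave" and "Erin" are hypothetical:

```shell
# author_identities : every distinct "Name <email>" with its commit count, most first.
# (Hypothetical helper.) HEAD is passed explicitly because, without a revision,
# git shortlog reads a log from standard input whenever stdin is not a terminal.
author_identities() {
  git shortlog -sne HEAD
}

# --- Demo on a throwaway repository: one hypothetical person, two email addresses ---
commit_as() {  # commit_as NAME EMAIL FILE : append a line to FILE and commit it
  echo "$1" >> "$3"
  git add -- "$3"
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$2" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$2" \
    git -c commit.gpgsign=false commit -q -m "edit $3"
}

cd "$(mktemp -d)" && git init -q
commit_as Dave dave@work.example a.txt
commit_as Dave dave@work.example b.txt
commit_as Dave dave@home.example a.txt
commit_as Erin erin@example.com  c.txt

# Dave appears twice, once per email address, so --author=Dave covers both.
author_identities
```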
Gary S. Weaver

Example with PowerShell

Display the commit count of the specified author for each file in the current working tree.

Short Form

$author = 'shaun';
dir -r | % { New-Object PSObject -Property `
@{ `
   Count = [int](git rev-list HEAD --count --author=$author -- $_.FullName); `
   FileName = $_.Name; `
}} `
| sort Count | % { "$($_.Count) -- $($_.FileName) -- $author" }

Long Form

$author = 'shaun'; `
Get-ChildItem -recurse | ForEach-Object `
{ `
   New-Object PSObject -Property `
   @{ `
       Count = [int](git rev-list HEAD --count --author=$author -- $_.FullName); `
       FileName = $_.Name; `
    } `
} | `
Sort-Object Count | ForEach-Object `
{ `
   "$($_.Count) -- $($_.FileName) -- $author"; `
}

Notes

  • A trailing ` continues the command on the next line; it must be the last character on the line, with no spaces after it.
  • | pipes the resulting objects to the next command.
  • $_.SomeProperty accesses a property of the piped-in object.
  • You can copy and paste this directly into PowerShell, because the ` marks join the lines into one command.
  • To also count previously deleted files and other branches, take the file list from git log --all --name-only instead of the working tree, and pass --all instead of HEAD to git rev-list.
  • Use git log --format='%aN' | sort -u to list all project authors if you want to iterate over them.

Output

0 -- blame.txt~ -- shaun
0 -- .blame.txt.un~ -- shaun
1 -- GitBook-GitTools-06-RewritingHistory.asc -- shaun
1 -- GitBook-GitTools-05-Searching.asc -- shaun
1 -- GitBook-GitTools-03-StashingAndCleaning.asc -- shaun
1 -- GitBook-GitTools-07-ResetDemystified.asc -- shaun
1 -- README.md -- shaun
1 -- LICENSE -- shaun
1 -- GitBook-GitTools-09-Rerere.asc -- shaun
1 -- GitBook-GitBranching-Rebasing.asc -- shaun
1 -- blame2.txt -- shaun
1 -- GitBook-GettingStarted-FirstTimeSetup.asc -- shaun
1 -- GitBook-GitTools-02-InteractiveStaging.asc -- shaun
1 -- GitBook-GitTools-01-RevisionSelection.asc -- shaun
1 -- GitBook-GitInternals-Maintenance.asc -- shaun
2 -- goals.asc -- shaun
2 -- GitBook-GitTools-10-Debugging.asc -- shaun
3 -- blame.txt -- shaun
6 -- GitBook-GitTools-08-AdvancedMerging.asc -- shaun
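The same per-file approach (one git rev-list --count per file, as in the comment on the question) can be sketched in POSIX sh. This version lists tracked files with git ls-files instead of walking the directory, so untracked files such as blame.txt~ are skipped. The function name commits_per_tracked_file and the demo authors are hypothetical:

```shell
# commits_per_tracked_file AUTHOR : "count path" for every tracked file, highest first.
# (Hypothetical helper.) It runs one git rev-list per file, so on large
# repositories it is much slower than a single git log | uniq -c pipeline.
commits_per_tracked_file() {
  # core.quotePath=false prints non-ASCII names as-is; names containing newlines,
  # tabs, double quotes or backslashes are still quoted by git and not handled here.
  git -c core.quotePath=false ls-files |
    while IFS= read -r f; do
      # --literal-pathspecs stops git from treating *, ? and [ in names as globs.
      n=$(git --literal-pathspecs rev-list HEAD --count --author="$1" -- "$f")
      printf '%s %s\n' "$n" "$f"
    done | sort -nr
}

# --- Demo on a throwaway repository with hypothetical authors ---
commit_as() {  # commit_as NAME FILE : append a line to FILE and commit it as NAME
  echo "$1" >> "$2"
  git add -- "$2"
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" \
  GIT_COMMITTER_NAME="$1" GIT_COMMITTER_EMAIL="$1@example.com" \
    git -c commit.gpgsign=false commit -q -m "$1 edits $2"
}

cd "$(mktemp -d)" && git init -q
commit_as Alice a.txt
commit_as Alice a.txt
commit_as Alice b.txt
commit_as Bob   b.txt

commits_per_tracked_file Alice
commits_per_tracked_file Bob
```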
Shaun Luttin
  • How does this work? I'd like to tweak it to get commits per top-level directory in repo, or at least get the full path of each file. – Macke Feb 18 '16 at 13:02
  • I added a long form of the PowerShell for you. Let me know if you have further questions. – Shaun Luttin Feb 18 '16 at 17:05
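For the per-top-level-directory variant asked about in the comment above, one option in shell (not PowerShell) is to mark the start of each commit in the git log output and count each directory at most once per commit; piping raw file names through uniq -c would instead count a commit once for every file it touched. A sketch, assuming paths as printed by git log (relative to the repository root); the function name and demo authors are hypothetical, and top-level files are counted under their own names:

```shell
# commits_per_top_dir AUTHOR : commits by AUTHOR touching each top-level directory.
# (Hypothetical helper.) A commit that changes several files in one directory
# counts once for that directory.
commits_per_top_dir() {
  # %x09%H prints a TAB plus the commit hash as a per-commit marker line; git
  # always quotes paths containing a TAB, so no file-name line starts with one.
  git log --name-only --format='%x09%H' --author="$1" |
    awk -F/ '
      /^\t/ { split("", seen); next }                            # new commit: reset its directories
      NF    { if (!($1 in seen)) { seen[$1] = 1; count[$1]++ } }  # $1 = first path component
      END   { for (d in count) print count[d], d }
    ' | sort -nr
}

# --- Demo on a throwaway repository with hypothetical authors ---
commit_as() {  # commit_as NAME FILE... : append to each FILE and commit them together
  name=$1; shift
  for f in "$@"; do
    mkdir -p "$(dirname "$f")"
    echo "$name" >> "$f"
  done
  git add -- "$@"
  GIT_AUTHOR_NAME=$name GIT_AUTHOR_EMAIL=$name@example.com \
  GIT_COMMITTER_NAME=$name GIT_COMMITTER_EMAIL=$name@example.com \
    git -c commit.gpgsign=false commit -q -m "edit by $name"
}

cd "$(mktemp -d)" && git init -q
commit_as Alice src/a.c src/b.c   # one commit, two files in src/
commit_as Alice docs/x.md
commit_as Alice src/a.c README
commit_as Bob   src/b.c

commits_per_top_dir Alice   # src: 2, docs: 1, README: 1
```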

I found it helpful to add these git aliases to .gitconfig:

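[alias]
    # "$@" passes the alias arguments to git log; the trailing # comments out the copy git appends to the end
    # list commit counts by file
    cc = "!cd \"${GIT_PREFIX:-./}\"; git log --name-only --format= \"$@\" | sort | uniq -c | sort -nr | head -30 #"
    # list changed-file counts by folder (a commit touching n files in one folder adds n)
    ccf = "!cd \"${GIT_PREFIX:-./}\"; git log --name-only --format= \"$@\" | rev | cut -d'/' -f2- | rev | sort | uniq -c | sort -nr | head -30 #"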

Then you can pass the same arguments you would give git log, e.g.:

git cc --author=hank --since="1 year ago" -- path/to/some/folder
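The folder alias gets each file's directory by reversing the line, dropping the first "/"-separated field (the reversed file name), and reversing back. Because cut passes lines without the delimiter through unchanged, top-level files are counted under their own names. A small sketch of just that step; the function names are hypothetical, and the sed variant is a suggested equivalent for systems without rev:

```shell
# dir_of : the ccf alias's rev | cut | rev step, one path per input line.
dir_of() {
  rev | cut -d/ -f2- | rev
}

# dir_of_sed : the same transformation with sed (strip the last "/" and what follows).
dir_of_sed() {
  sed 's|/[^/]*$||'
}

# Prints src/lib, docs, README (README has no "/" and passes through unchanged).
printf '%s\n' src/lib/util.c docs/guide.md README | dir_of
```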
hankchiutw