
I'm looking for an efficient way to get the list of unique commit authors for an SVN repository as a whole, or for a given resource path. I haven't been able to find an SVN command specifically for this (and don't expect one), but I'm hoping there is a better way than what I've tried so far in Terminal (on OS X):

svn log --quiet | grep "^r" | awk '{print $3}'

svn log --quiet --xml | grep author | sed -E "s:</?author>::g"

Either of these will give me one author name per line, but they both require filtering out a fair amount of extra information. They also don't handle duplicates of the same author name, so for lots of commits by few authors, there's tons of redundancy flowing over the wire. More often than not I just want to see the unique author usernames. (It actually might be handy to infer the commit count for each author on occasion, but even in these cases it would be better if the aggregated data were sent across instead.)

I'm generally working with client-only access, so svnadmin commands are less useful, but I might be able to ask the repository admin for a special favor if it is strictly necessary or much more efficient. The repositories I'm working with have tens of thousands of commits and many active users, and I don't want to inconvenience anyone.

Machavity
Quinn Taylor
  • Subversion does not *index* author names (they're just a revision property), so there is no way to do it without scanning the entire log; solutions will vary only by the cost per commit. – Kevin Reid Mar 22 '10 at 20:08



To filter out duplicates, take your output and pipe through: sort | uniq. Thus:

svn log --quiet | grep "^r" | awk '{print $3}' | sort | uniq

I would not be surprised if this is the way to do what you ask. Unix tools often expect the user to do fancy processing and analysis with other tools.

P.S. Come to think of it, you can merge the grep and awk...

svn log --quiet | awk '/^r/ {print $3}' | sort | uniq

P.P.S. Per Kevin Reid...

svn log --quiet | awk '/^r/ {print $3}' | sort -u

P3.S. Per kan, using the vertical bars instead of spaces as field separators, to properly handle names with spaces (also updated the Python examples)...

svn log --quiet | awk -F ' [|] ' '/^r/ {print $2}' | sort -u
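As a quick check of why the separator matters, here is a Python illustration (not part of the original answer) that splits kan's sample header line, quoted in the comments, two ways: on runs of whitespace, like awk's default field splitting, and on the literal " | " separator. The helper names are made up for the example.

```python
def author_by_whitespace(line):
    """Mimic awk's default field splitting and return the third field."""
    fields = line.split()
    return fields[2] if len(fields) > 2 else ""

def author_by_separator(line):
    """Split on the literal ' | ' separator and return the second field."""
    fields = line.split(" | ")
    return fields[1] if len(fields) > 1 else ""

sample = "r114502 | Full Name | 2015-08-24 18:05:58 +0100 (Mon, 24 Aug 2015) | 1 line"
print(author_by_whitespace(sample))  # -> Full  (the surname is lost)
print(author_by_separator(sample))   # -> Full Name
```

Whitespace splitting turns "Full Name" into two fields, so only the first word survives; splitting on " | " keeps the whole name as one field.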

For more efficiency, you could write a Perl one-liner. I don't know Perl that well, so I'd wind up doing it in Python:

#!/usr/bin/env python
import sys
authors = set()
for line in sys.stdin:
    if line[0] == 'r':
        authors.add(line.split('|')[1].strip())
for author in sorted(authors):
    print(author)

Or, if you wanted counts:

#!/usr/bin/env python
from __future__ import print_function # Python 2.6/2.7
import sys
authors = {}
for line in sys.stdin:
    if line[0] != 'r':
        continue
    author = line.split('|')[1].strip()
    authors.setdefault(author, 0)
    authors[author] += 1
for author in sorted(authors):
    print(author, authors[author])

Save either script as authorfilter.py, make it executable (chmod +x authorfilter.py), and then run:

svn log --quiet | ./authorfilter.py
Mike DeSimone
  • +1 for the useful suggestion. I was aware of `sort` but not `uniq`, and it seems the latter takes a `-c` parameter that prepends the number of occurrences to each line. I'm still hoping for a more efficient (and scalable) way, but this does the trick in a pinch. – Quinn Taylor Mar 22 '10 at 19:18
  • By the way, if you have XPath handy, then the query `//author/text()` will get just the author names out of `svn log --xml` robustly. (Mac OS X has an `xpath` command which *almost* does this job, but produces extraneous text and can't be configured not to. Maybe there's something else.) – Kevin Reid Mar 22 '10 at 20:07
  • @Kevin, you should add your own answer so people can vote for you. I like all your comments, particularly the sort/uniq tip. – Quinn Taylor Mar 22 '10 at 20:42
  • @ojblass: I don't ask many questions, but I still learn a lot on SO. I'm surprised some Perl ace hasn't posted a one-liner for this by now, though. – Mike DeSimone Jul 08 '13 at 16:35
  • As SVN usernames can contain spaces, it would be better to use more accurate filtering: `awk -F " \\\\| " '{print $2}'` – kan Sep 16 '15 at 16:35
  • @kan Could you give an example of how usernames with spaces appear in the output? I'd need to update the other examples to handle that, too. – Mike DeSimone Sep 17 '15 at 14:01
  • Just appears as is, with spaces, nothing fancy: `r114502 | Full Name | 2015-08-24 18:05:58 +0100 (Mon, 24 Aug 2015) | 1 line` – kan Sep 17 '15 at 14:27
  • Great answer, though I had to change the last of the awk commands to `svn log --quiet | awk -F ' \\\\| ' '/^r/ {print $3}' | sort -u`, otherwise I was just getting empty lines. – MJar Sep 15 '16 at 12:38
  • It's been years and they might have changed the log output format. I don't have any SVN repos handy to test with any more... – Mike DeSimone Sep 23 '16 at 00:17
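Kevin Reid's XPath suggestion has a close equivalent in Python's standard library: xml.etree.ElementTree can walk every <author> element in `svn log --xml` output. The sketch below is illustrative only; the function name and the small sample document are invented, and log entries without an <author> element are simply skipped.

```python
import xml.etree.ElementTree as ET

def authors_from_xml(xml_text):
    """Return the sorted, unique author names in `svn log --xml` output.

    Roughly equivalent to collecting the XPath query //author/text().
    """
    root = ET.fromstring(xml_text)
    # iter("author") visits every <author> element at any depth;
    # the set comprehension drops duplicates and empty elements.
    return sorted({el.text for el in root.iter("author") if el.text})

sample = """<log>
<logentry revision="3"><author>po</author><date>2013-12-04T14:34:32.000000Z</date></logentry>
<logentry revision="2"><author>Full Name</author><date>2013-12-04T14:07:54.000000Z</date></logentry>
<logentry revision="1"><author>po</author><date>2013-12-04T14:00:00.000000Z</date></logentry>
</log>"""
print(authors_from_xml(sample))  # -> ['Full Name', 'po']
```

Parsing the XML avoids any dependence on the column layout of the plain-text log, at the cost of transferring the larger XML output.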

In PowerShell, set your location to the working copy and use this command.

svn.exe log --quiet |
? { $_ -notlike '-*' } |
% { ($_ -split ' \| ')[1] } |
Sort -Unique

The output format of svn.exe log --quiet looks like this:

r20209 | tinkywinky | 2013-12-05 08:56:29 +0000 (Thu, 05 Dec 2013)
------------------------------------------------------------------------
r20208 | dispy | 2013-12-04 16:33:53 +0000 (Wed, 04 Dec 2013)
------------------------------------------------------------------------
r20207 | lala | 2013-12-04 16:28:15 +0000 (Wed, 04 Dec 2013)
------------------------------------------------------------------------
r20206 | po | 2013-12-04 14:34:32 +0000 (Wed, 04 Dec 2013)
------------------------------------------------------------------------
r20205 | tinkywinky | 2013-12-04 14:07:54 +0000 (Wed, 04 Dec 2013)

Filter out the horizontal rules with ? { $_ -notlike '-*' }.

r20209 | tinkywinky | 2013-12-05 08:56:29 +0000 (Thu, 05 Dec 2013)
r20208 | dispy | 2013-12-04 16:33:53 +0000 (Wed, 04 Dec 2013)
r20207 | lala | 2013-12-04 16:28:15 +0000 (Wed, 04 Dec 2013)
r20206 | po | 2013-12-04 14:34:32 +0000 (Wed, 04 Dec 2013)
r20205 | tinkywinky | 2013-12-04 14:07:54 +0000 (Wed, 04 Dec 2013)

Split by ' \| ' to turn a record into an array.

$ 'r20209 | tinkywinky | 2013-12-05 08:56:29 +0000 (Thu, 05 Dec 2013)' -split ' \| '
r20209
tinkywinky
2013-12-05 08:56:29 +0000 (Thu, 05 Dec 2013)

The second element is the name.

Make an array of each line and select the second element with % { ($_ -split ' \| ')[1] }.

tinkywinky
dispy
lala
po
tinkywinky

Return unique occurrences with Sort -Unique. This sorts the output as a side effect.

dispy
lala
po
tinkywinky
Iain Samuel McLean Elder
  • `Sort -Unique` is case-insensitive; you should use `Sort-Object | Get-Unique -AsString` or `Select-Object -Unique` instead to get a case-sensitive check. – Tom Kuijsten Sep 15 '15 at 12:26
  • Alternatively: `([xml](svn log --xml)).SelectNodes('//author') | % {$_.InnerText} | Select -Unique` – Nathan Moinvaziri Jul 01 '17 at 23:19
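To see the difference Tom Kuijsten describes, here is a Python analogy (not PowerShell): a plain set keeps names that differ only in case, while keying on str.casefold() merges them, which is roughly what a case-insensitive unique sort does. The names are sample data, and which spelling survives a merge (here, the first one seen) is a choice made by this sketch, not a claim about PowerShell's behavior.

```python
def unique_case_sensitive(names):
    """Keep names that differ only in case as separate entries."""
    return sorted(set(names))

def unique_case_insensitive(names):
    """Merge names that differ only in case, keeping the first spelling seen."""
    seen = {}
    for name in names:
        seen.setdefault(name.casefold(), name)
    return sorted(seen.values(), key=str.casefold)

names = ["tinkywinky", "Po", "po", "TinkyWinky", "dispy"]
print(unique_case_sensitive(names))    # -> ['Po', 'TinkyWinky', 'dispy', 'po', 'tinkywinky']
print(unique_case_insensitive(names))  # -> ['dispy', 'Po', 'tinkywinky']
```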

I had to do this on Windows, so I used the Windows port of Super Sed (http://www.pement.org/sed/) and replaced the AWK and GREP commands:

svn log --quiet --xml | sed -n -e "s/<\/\?author>//g" -e "/[<>]/!p" | sort | sed "$!N; /^\(.*\)\n\1$/!P; D" > USERS.txt

This uses the Windows "sort" command, which might not be present on all machines.
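The final sed expression is a common idiom for emulating uniq: it appends the next line to the pattern space ($!N), prints the first of the two lines only if it differs from the second (/^\(.*\)\n\1$/!P), then deletes that first line and loops (D). The effect, sketched below in Python with itertools.groupby rather than sed, is that each run of identical adjacent lines collapses to a single line, which is why the input has to be sorted first. The function name and sample data are made up.

```python
from itertools import groupby

def drop_adjacent_duplicates(lines):
    # groupby() groups consecutive equal items, so keeping one key per group
    # collapses each run of identical adjacent lines, as uniq does.
    return [key for key, _group in groupby(lines)]

authors = ["po", "lala", "po", "dispy", "po"]
print(drop_adjacent_duplicates(authors))          # -> ['po', 'lala', 'po', 'dispy', 'po']
print(drop_adjacent_duplicates(sorted(authors)))  # -> ['dispy', 'lala', 'po']
```

Without the sort, duplicates that are not adjacent survive, exactly as with uniq.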

Adam Rofer
  • I've also made a batch file that iterates through a folder and compiles a unique list of all repositories: http://pastebin.com/CXiqLddp – Adam Rofer Nov 17 '10 at 23:50

On a remote repository you can use:

 svn log --quiet https://url/svn/project/ | grep "^r" | awk '{print $3}' | sort | uniq
lvthillo
  • I didn't find this command until I figured it out by myself... If you just want to get the users of a remote repository, e.g. to convert it to Git (see `git svn --help`), this is really useful, since a checkout just to run this command can take far too long. – seyfahni Apr 15 '20 at 14:53
svn log  path-to-repo | grep '^r' | grep '|' | awk '{print $3}' | sort | uniq > committers.txt

This command has an additional grep '|' that eliminates false matches. Otherwise, commit-message lines that start with 'r' are included, and words from those messages are returned as author names.
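A stricter filter than checking for a leading 'r' plus a '|' is to match the whole header shape: 'r', digits, then ' | '. A Python sketch of that idea follows (the helper name and sample log are made up). It is still a heuristic: a message line that happens to look exactly like a header would also match, so suppressing messages with --quiet remains the more reliable option.

```python
import re

# Matches an `svn log` header line such as
#   r20209 | tinkywinky | 2013-12-05 08:56:29 +0000 (Thu, 05 Dec 2013) | 1 line
# The author is captured lazily up to the next " | ", so names with spaces survive.
HEADER = re.compile(r"^r\d+ \| (.*?) \| ")

def authors_from_log(lines):
    authors = set()
    for line in lines:
        match = HEADER.match(line)
        if match:
            authors.add(match.group(1))
    return sorted(authors)

log = """------------------------------------------------------------------------
r20209 | tinkywinky | 2013-12-05 08:56:29 +0000 (Thu, 05 Dec 2013) | 1 line

refactor the parser
------------------------------------------------------------------------
r20208 | Full Name | 2013-12-04 16:33:53 +0000 (Wed, 04 Dec 2013) | 1 line

revert r20207 | see ticket
------------------------------------------------------------------------
""".splitlines()
print(authors_from_log(log))  # -> ['Full Name', 'tinkywinky']
```

Both message lines start with 'r', and the second even contains a '|', but neither starts with 'r' followed by digits, so neither is mistaken for a header.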

Deepak Ingole
crankparty
  • That's why the `--quiet` or `-q` argument is used in the other suggestions. It prints only the log headers (revision, author, date, and time). – v01pe Apr 08 '13 at 14:20

PowerShell has built-in XML support, which eliminates the need to parse string output.

Here's a quick script I used on a Mac to get a unique list of users across multiple repositories.

#!/usr/bin/env pwsh

$repos = @(
    'Common/'
    'Database/'
    'Integration/'
    'Reporting/'
    'Tools/'
    'Web/'
    'Webservices/'
)

$users = @()
foreach ($repo in $repos) {
    $url = "https://svn.example.com:8443/svn/$repo"
    $users += ([Xml](svn log $url --xml)).log.logentry.author | Sort-Object -Unique
}

$users | Sort-Object -Unique
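For comparison, the same multi-repository union can be sketched in Python with subprocess and xml.etree.ElementTree. This assumes svn is on the PATH and Python 3.7 or later (for capture_output); the base URL and repository names are placeholders copied from the script above, and fetch_log is a hypothetical hook that lets the function be tried without a server.

```python
import subprocess
import xml.etree.ElementTree as ET

# Placeholder server and repository names, mirroring the PowerShell script.
BASE_URL = "https://svn.example.com:8443/svn/"
REPOS = ["Common/", "Database/", "Web/"]

def svn_log_xml(url):
    """Run `svn log --xml URL` and return its raw output as bytes."""
    return subprocess.run(
        ["svn", "log", "--xml", url], check=True, capture_output=True
    ).stdout

def authors_across_repos(base_url, repos, fetch_log=svn_log_xml):
    """Union the <author> values of several repositories' logs.

    fetch_log is injectable so the function can be exercised without svn.
    """
    users = set()  # start from an explicit empty set so every union is well defined
    for repo in repos:
        root = ET.fromstring(fetch_log(base_url + repo))
        users.update(el.text for el in root.iter("author") if el.text)
    return sorted(users)
```

Calling authors_across_repos(BASE_URL, REPOS) returns a sorted list of user names. Because the result is built in a set from the start, it stays a collection even when only one repository or one author is found.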
Jason C

A solution for Windows 10:

  1. Create a batch file printAllAuthor.bat:
@echo off
for /f "tokens=3" %%a in ('svn log --quiet ^|findstr /r "^r"') do echo %%a
@echo on
  2. Run the batch file through the sort command:
printAllAuthor.bat | sort /unique >author.txt

PS:

  • Step 2 needs to run the batch file with the correct path: either add its folder to %PATH% or give the full path in the form your OS expects.
  • Step 2 can also be put in a batch file if that suits your needs.
caoglish

A simpler alternative:

find . -name "*cpp" -exec svn log -q {} \;|grep -v "\-\-"|cut -d "|" -f 2|sort|uniq -c|sort -n
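One caveat with running svn log once per file: a revision that touched several matching files appears in several logs, so uniq -c counts it several times. A Python sketch of an alternative (not this answer's method) keys on the revision number before counting; it assumes the `svn log -q` header format shown earlier on this page, and the function name is made up.

```python
import re
from collections import Counter

# Captures the revision and the author from an `svn log -q` header line.
HEADER = re.compile(r"^(r\d+) \| (.*?) \| ")

def commits_per_author(log_texts):
    """Count distinct revisions per author across several `svn log -q` outputs."""
    revision_author = {}
    for text in log_texts:
        for line in text.splitlines():
            match = HEADER.match(line)
            if match:
                revision, author = match.groups()
                # A revision seen in several per-file logs is stored only once.
                revision_author[revision] = author
    counts = Counter(revision_author.values())
    # Least active first, like `sort -n` on `uniq -c` output.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

For example, if r3 by po shows up in the logs of two different files, it still counts as one commit for po.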
Venki