
I'm faced with the task of generating statistics about the history of a Git project, and I need to produce specific numbers and representations for various metrics - things like commits per author, commits-over-time/date histograms, that sort of thing.

The trouble is that I need all this data generated in a format that can be dealt with via a script or similar - the output has to be text, and if I can get the numbers into a Python (or similar) script, so much the better.

My question is this: are there any existing frameworks or projects that provide such an interface? I've seen GitStats, and it does a lot of what I want, but it dumps the results into an HTML structure instead of giving textual or programmatic representations back to me. Are there (for example) Python bindings for a Git log parser, or even a Git statistics generator that returns a big text dump of data?

I realize it's a very specific need, and I'm willing to do some serious coding to get the precise format I want, but I'd like to think there's a starting point out there somewhere. Ideas?

Tim
  • It seems like the right approach might be to try and make GitStats produce the output format you want. It happens to already be written in Python, too. There's an HTMLReportCreator in there, ~550 lines of code, but you could just drop in a replacement for that, or possibly even just grab the data structure that it's passed. `def create(self, data, path)`. Is there any reason this wouldn't be good for you? – Cascabel Jan 12 '11 at 18:24
  • Jefromi: it's certainly possible. I looked at it, and it appears that `data` is a GitDataCollector instance (a custom class internal to the project), not a dictionary or other Python data structure. Still, it's a great start. Thanks for the pointer! – Tim Jan 13 '11 at 17:05
  • Jefromi: after more consideration, I've started developing my own library, but if you'll post your comment as an answer I'll accept it - it's the thing that got me thinking the most about what's the best solution to this issue. – Tim Jan 21 '11 at 08:45

1 Answer


How about using XML logs instead? You can then parse the XML in Python relatively easily and build your stats.

See this answer for how to get an XML log from Git.
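As a rough sketch of the parsing step, here is one way it might look without XML at all: `git log` accepts a custom `--pretty=format:` string, so the fields can be emitted with a delimiter and tallied in Python. The function names, the choice of the `\x1f` separator, and the three fields picked here are all illustrative assumptions, not part of any existing library.

```python
import subprocess
from collections import Counter

# Assumption: the ASCII unit separator is unlikely to appear in author names.
FIELD_SEP = "\x1f"
# %H = commit hash, %an = author name, %ad = author date (shaped by --date).
LOG_FORMAT = "%H%x1f%an%x1f%ad"


def read_log(repo_path="."):
    """Run `git log` in repo_path and return its raw text output."""
    result = subprocess.run(
        ["git", "log", "--pretty=format:" + LOG_FORMAT, "--date=short"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_log(text):
    """Turn raw log text into a list of (hash, author, date) tuples."""
    commits = []
    for line in text.split("\n"):
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        commit_hash, author, date = line.split(FIELD_SEP)
        commits.append((commit_hash, author, date))
    return commits


def commits_per_author(commits):
    """Count commits for each author name."""
    return Counter(author for _, author, _ in commits)


def commits_per_month(commits):
    """Count commits per YYYY-MM bucket (dates are YYYY-MM-DD strings)."""
    return Counter(date[:7] for _, _, date in commits)
```

Calling `parse_log(read_log("/path/to/repo"))` should give a plain list of tuples, and the two `Counter` results can be printed as text or fed into whatever histogram code you prefer.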

hhafez
  • XML logs would be great, but that answer has no info about how to get an XML log out of Git - I believe the answerer there mistook that format string (from another answer) for XML when in fact it's just a personal preferred format from another user: http://stackoverflow.com/questions/1441156/git-how-to-save-a-preset-git-log-format – Tim Jan 12 '11 at 15:55