
I want to visualize the statistics of commits in our project. I would like to classify them into several groups based on some metrics, like commit messages or the number of changed lines of code.

Is there a systematic approach I can use? I have trouble selecting the right keywords for classifying commit messages and choosing proper thresholds for classifying commits by size.

B--rian
MacakM
  • Are you asking for technical assistance (e.g., how do I get this data out of Git?) or statistical approaches (e.g., how do I distinguish a “small” commit from a “large” one?)? – bk2204 Feb 08 '20 at 00:30
    I am asking about statistical approaches. – MacakM Feb 08 '20 at 07:38

1 Answer

  1. If you want to analyze only the commit messages, you probably want to use Natural Language Processing (NLP) tools. A good starting point might be the book Tidy Text Mining. It is written for R, but it offers a concise introduction if you are not yet familiar with terms such as term frequency (TF) or inverse document frequency (IDF). You would start with a simple histogram of words, but to be able to draw conclusions from it, you have to customize the stopword list and will probably need a lot of other pre-processing, such as word stemming.

  2. If you are interested in general metrics of your Git project (not limited to commit messages), I recommend having a look at Silvio Montanari's code-forensics project:

    code-forensics is a toolset for analysing codebases stored in a version control system. It leverages the repository logs, or version history data, to perform deep analyses with regards to complexity, logical coupling, authors coupling and to inspect the evolution in time of different parts of a software system with respect to metrics like code churn and number of revisions.

  3. Valuable information about a commit may already be contained in the Git tags, if your project uses them. As a start, you could try git log --graph --oneline --simplify-by-decoration, which gives you a tree of the tagged commits.
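To make the first point more concrete, here is a minimal Python sketch of a word histogram and TF-IDF scores for commit messages. It is only an illustration under assumptions: the stopword list, the sample messages, and the function names tokenize and tf_idf are made up for this example, and stemming is left out.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; extend it with project-specific noise words.
STOPWORDS = {"a", "an", "the", "to", "of", "in", "for", "and", "on", "with"}


def tokenize(message):
    """Lowercase a commit message, split it into words, drop stopwords."""
    words = re.findall(r"[a-z]+", message.lower())
    return [w for w in words if w not in STOPWORDS]


def tf_idf(messages):
    """Return one {term: score} dict per message (plain TF times log IDF)."""
    docs = [tokenize(m) for m in messages]
    n_docs = len(docs)
    # Document frequency: in how many messages does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against messages with no tokens
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Made-up sample messages, for illustration only.
messages = [
    "Fix crash in the parser",
    "Fix typo in README",
    "Add parser tests",
]

# Simple histogram of words across all messages.
word_histogram = Counter(w for m in messages for w in tokenize(m))

# Per-message TF-IDF: terms unique to one message score higher.
scores = tf_idf(messages)
```

In a real project, the messages could come from the output of git log --format=%s, one subject line per commit. Terms that occur in almost every message (such as "fix" here) get low scores, so the highest-scoring terms per commit are candidates for classification keywords.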

Disclaimer: I am not affiliated with either of the two websites/projects mentioned, but I did ask the linked SO question.

B--rian
  • @MacakM I edited my answer to clarify it; I hope it is helpful to you. If so, I would appreciate some feedback; if not, a comment would be nice. – B--rian Feb 13 '20 at 08:08