I am trying to extract (source code line, author label) pairs from git repositories. The easiest way to do that is to use git blame. The problem is that git blame attributes each line to the author of the last commit that touched it, regardless of whether that commit only re-indented the code or actually changed it. Do you know of a better method?
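For reference, here is a minimal sketch of the extraction I have in mind. It assumes `git` is on the PATH, and the helper names `parse_line_porcelain` and `blame_pairs` are my own, not part of git. It passes `-w` (ignore whitespace changes) and `-M`/`-C` (follow moved or copied lines) to git blame, which should help with pure re-indentation, but as far as I can tell it does not separate cosmetic edits from real ones in general.

```python
import subprocess


def parse_line_porcelain(text):
    """Turn `git blame --line-porcelain` output into (line, author) pairs."""
    pairs = []
    author = None
    # Split on "\n" only, so unusual separators inside source lines survive.
    for raw in text.split("\n"):
        if raw.startswith("\t"):
            # Content lines are prefixed with a single TAB.
            pairs.append((raw[1:], author))
        elif raw.startswith("author "):
            # Header line naming the author of the commit blamed for this line.
            author = raw[len("author "):]
    return pairs


def blame_pairs(repo, path, rev="HEAD"):
    """Run git blame on one file (hypothetical helper) and parse the result."""
    out = subprocess.run(
        ["git", "-C", repo, "blame", "-w", "-M", "-C",
         "--line-porcelain", rev, "--", path],
        check=True, capture_output=True,
        encoding="utf-8", errors="replace",
    ).stdout
    return parse_line_porcelain(out)
```

Calling `blame_pairs("/path/to/repo", "src/main.py")` (both paths are placeholders) would then give one pair per line of the file at `HEAD`.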
Or maybe, before trying to solve the problem, I should first check how many source lines are associated with multiple authors. If the percentage is small, there is no need to worry about it. But I find that even counting them is difficult. For a commit with a single parent, how can we know that the commit changed a line rather than deleted a line and added a new one? For a commit with two parents (such as a merge), how should I combine the diff results from the two branches?
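For the single-parent case, the only idea I have so far is a similarity heuristic, sketched below. It is not something git does for me: given the removed and added lines of one diff hunk, it treats a removed/added pair as a "changed" line when the two are similar enough, and everything else as a plain deletion or insertion. The function name `pair_changed_lines` and the 0.6 threshold are arbitrary assumptions on my part, and it does not address the merge case at all.

```python
from difflib import SequenceMatcher


def pair_changed_lines(removed, added, threshold=0.6):
    """Greedily match each removed line to its most similar unused added line.

    Returns (modified, deleted, inserted), where `modified` is a list of
    (old_line, new_line) pairs whose similarity is at least `threshold`.
    """
    unused = list(range(len(added)))
    modified, deleted = [], []
    for old in removed:
        best, best_score = None, 0.0
        for j in unused:
            # Compare with surrounding whitespace stripped, so that a pure
            # re-indentation scores as identical.
            score = SequenceMatcher(None, old.strip(), added[j].strip()).ratio()
            if score > best_score:
                best, best_score = j, score
        if best is not None and best_score >= threshold:
            modified.append((old, added[best]))
            unused.remove(best)
        else:
            deleted.append(old)
    inserted = [added[j] for j in unused]
    return modified, deleted, inserted
```

A line would then count as having multiple authors whenever it shows up in `modified` and the commit's author differs from the author previously recorded for the old line.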
Thanks