I have a bunch of MATLAB script/function files that I and the rest of my team need to work on. We have little to no idea what most of the files do, and little to no idea which ones belong together and which ones are separate. We do know we have a total of 36,000 lines. I'd like to know how many of those lines are comments.
Easy, right? Just count how many of them start with the comment start character %.
Well, no. I don't want to count blocks of commented-out code as "comments", since they don't actually tell me anything. I'd also prefer not to count the "empty" decoration lines used to turn a single comment line into a "headline", like so:
% %%%%%%%%
% headline
% %%%%%%%%
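For concreteness, the part I can already do looks roughly like the sketch below. It is written in Python rather than MATLAB purely for convenience, since it only treats the .m files as text. The function name `count_comment_lines` and my definition of a "decoration" line (a comment containing nothing but `%`, whitespace, and ruler characters such as `-`, `=`, `*`, `#`, `_`) are my own assumptions, not anything standard. It also ignores trailing comments after code and does not treat `%{ ... %}` block comments specially.

```python
import re

# A comment line: optional leading whitespace, then '%'.
COMMENT_RE = re.compile(r"^\s*%")

# Assumed definition of a "decoration" line: a comment consisting only of
# '%', whitespace, and common ruler characters.
DECORATION_RE = re.compile(r"^\s*%[\s%\-=*#_]*$")


def count_comment_lines(lines):
    """Return (all_comment_lines, comment_lines_that_are_not_decoration)."""
    total = 0
    non_decoration = 0
    for line in lines:
        if not COMMENT_RE.match(line):
            continue  # code line or blank line
        total += 1
        if not DECORATION_RE.match(line):
            non_decoration += 1
    return total, non_decoration


sample = [
    "% %%%%%%%%",
    "% headline",
    "% %%%%%%%%",
    "x = 1; % trailing comment, not counted here",
    "% y = x + 1;",
]
print(count_comment_lines(sample))  # prints (4, 2)
```

Note that the commented-out statement `% y = x + 1;` is still counted in the second number. Filtering those lines out is exactly the part I don't know how to do.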
So how can I get a sensible estimate of how many lines of actual informative comments I have? Is there an easy way to distinguish natural language (possibly containing code snippets) from pure code?
Yes, I know code should be self-explanatory as far as is practical, but the code we have inherited clearly is not. Yes, I know we should probably refactor this mess. The purpose of figuring out how many comments we have is to highlight the technical debt we have here, so that we can allocate resources to this refactoring.