
We have a large legacy code base with a lot of commented-out code that now pollutes our source files. On the other hand, we have real comments that I would like to preserve.

Is there a way to find C/C++ comments that contain source code and remove them, in order to clean up the code base?

Consider the following code:

```cpp
// the function foo is doing foo
void foo(){
     // bar();
     bar2();
}
```

The call to the old function `bar()` has been commented out and is no longer used. I would like an automated way to remove such outdated source code without touching plain-text comments. After the cleanup, the code would look like this:

```cpp
// the function foo is doing foo
void foo(){
     bar2();
}
```

I found this and that, which remove all comments, but that is not what I want to do.

Can clang-tidy do this job?

schorsch312
  • Which language? The grammar of C++ is considerably more complex than that of C, which makes it even harder in the former. – Bathsheba Aug 23 '17 at 09:31
  • You need a parser for the C grammar to do this, at least. I doubt there's an entirely reliable way. –  Aug 23 '17 at 09:31
  • The question is, why do you need this? Commented-out code is not compiled anyway. Unless it's a huge chunk of code, there's not much point in removing it. Again, if it's a small block of a few lines, do it manually. – Sourav Ghosh Aug 23 '17 at 09:32
  • Have you tried any existing code "beautifiers" or reformatters, for example [clang-format](https://clang.llvm.org/docs/ClangFormat.html)? They might be able to remove comments. ***However***, that would remove *all* comments. – Some programmer dude Aug 23 '17 at 09:32
  • "is there a way" - manually. They're commented out already, so just remove them as you find them. – UKMonkey Aug 23 '17 at 09:34
  • We have a large legacy code base with lots of code like the example above. I know that this is not a performance issue, but a readability one. – schorsch312 Aug 23 '17 at 09:34
  • You could use some *heuristic* regular expressions (line ending with `;`, line starting with `{`, etc.), but this **will** have *false positives*. For the future, better use a VCS (like `git`) and never check in commented-out code lines... they're unnecessary with a proper history. –  Aug 23 '17 at 09:35
  • @FelixPalmen Unfortunately a lot of people do not get this even if a VCS is in use. – muXXmit2X Aug 23 '17 at 09:37
  • As some comments suggest, I think a completely automated and 100% reliable tool does not exist and would be very hard to create. My best suggestion is to write a shell or Python script that uses `awk`, `sed` or similar to match every line of every file against regular expressions, then print each detected comment line with a few lines of context and ask for the user's permission before removing it. This will of course take some time, but to me it sounds like the most feasible solution if going through all files manually is not an option. – Hans Petter Taugbøl Kragset Aug 23 '17 at 09:38
  • @HansPetterTaugbølKragset all we need is a small shell –  Aug 23 '17 at 10:40
  • MISRA-C has a requirement stating that production code may not contain any code that was "commented out". Because of this, most MISRA-C checkers implement a way to find "commented out code", so running such a tool might be a good way to spot it. As for automatically removing it... well, I'm not so sure I would want that; it sounds like a scary feature. – Lundin Aug 23 '17 at 13:32
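The heuristic regular expressions suggested in these comments can be combined with the "review before removing" step. Below is a minimal sketch in JavaScript (Node.js), the language used in the answer; the function names and the exact patterns are assumptions of mine, not an existing tool. It only reports whole-line `//` comments that look like code, with their line numbers, so that a person can confirm each hit before anything is deleted.

```javascript
// Heuristic sketch: flag whole-line "//" comments whose text looks like C/C++ code.
// The patterns are assumptions modelled on the heuristics suggested above;
// expect false positives and false negatives.
const CODE_PATTERNS = [
  /;\s*$/,     // ends with a semicolon, e.g. "bar();"
  /^[{}]/,     // starts with a brace
  /[{}]\s*$/,  // ends with a brace, e.g. "if (x) {"
  /^#\s*(include|define|if|ifdef|ifndef|endif)\b/, // preprocessor directive
];

function looksLikeCode(commentText) {
  const text = commentText.trim();
  return text.length > 0 && CODE_PATTERNS.some((re) => re.test(text));
}

// Returns the 1-based line number and text of each suspicious comment line,
// so that a human can review them instead of deleting them blindly.
function findCommentedOutCode(source) {
  const hits = [];
  source.split("\n").forEach((line, index) => {
    // Only lines that consist entirely of a "//" comment are considered;
    // "/* */" blocks and trailing comments after code are ignored.
    const match = /^\s*\/\/(.*)$/.exec(line);
    if (match && looksLikeCode(match[1])) {
      hits.push({ line: index + 1, text: line.trim() });
    }
  });
  return hits;
}

// Usage with the example from the question.
const sample = [
  "// the function foo is doing foo",
  "void foo(){",
  "     // bar();",
  "     bar2();",
  "}",
].join("\n");

console.log(findCommentedOutCode(sample));
```

On the question's example this reports only line 3 (`// bar();`) and leaves the real comment on line 1 alone.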

1 Answer


This question really got me thinking, so I quickly wrote a small JavaScript program, just as a demonstration, that removes the lines containing both `//` and `;` from a piece of text.

You can write your own algorithm and remove lines the same way. For example, you can put an `@` sign on the lines you want deleted and then run your program over the files. You will need to come up with your own rules, or use a simple one like mine.
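A minimal sketch of the marker idea, assuming a hypothetical convention where a reviewer changes `//` into `//@` on every commented-out line that should go. Matching only a line that *starts* with the marker is a deliberate choice: a bare `@` anywhere in a line would also hit Doxygen tags such as `@param` or e-mail addresses. The function `removeMarkedLines` is illustrative, not part of any tool.

```javascript
// Drop every line whose first non-blank characters are the marker;
// all other lines (including ordinary comments) are kept unchanged.
function removeMarkedLines(source, marker = "//@") {
  return source
    .split("\n")
    .filter((line) => !line.trimStart().startsWith(marker))
    .join("\n");
}

// The question's example after a reviewer has marked the dead call.
const marked = [
  "// the function foo is doing foo",
  "void foo(){",
  "     //@ bar();",
  "     bar2();",
  "}",
].join("\n");

console.log(removeMarkedLines(marked));
```

The output is the cleaned-up version shown in the question; the plain-text comment survives because it was never marked.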

Here is my JS code. It deletes the matching lines and, as an example, logs the result to the console as an array. You can write a program like this, or however you like.

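```html
<html>
<body>
<textarea id="txtArea" rows="40" cols="300">
// the function foo is doing foo
void foo(){
    // bar();
     bar2();
}
</textarea>

<script>
// Read the code from the text area and split it into lines.
var x = document.getElementById("txtArea").value;
var lines = x.split("\n");

// Keep every line except those containing both "//" and ";".
// indexOf() returns -1 when the substring is missing (and 0 when it is at
// the start of the line), so compare against -1 instead of testing truthiness.
lines = lines.filter(function (line) {
  var hasComment = line.indexOf("//") !== -1;
  var hasSemicolon = line.indexOf(";") !== -1;
  return !(hasComment && hasSemicolon);
});

// Write the cleaned code back, one line per row
// (assigning the array directly would join the lines with commas).
document.getElementById("txtArea").value = lines.join("\n");

console.log(lines);
</script>
</body>
</html>
```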
  • As mentioned in the comments on the question, this *will* result in *false positives* - comments with `;` in them will be removed even though they may be desirable, real comments. – Hans Petter Taugbøl Kragset Aug 23 '17 at 11:00
  • This also produces false negatives. Lines such as `if (condition) {` and the corresponding `}` will not be removed. I guess it's less of a problem than deleting actual comments but I wanted to point it out in case you missed that. – patatahooligan Aug 23 '17 at 11:36
  • @patatahooligan I know that; I wrote the code pretty quickly. In that case we would want to delete the lines that contain `//` and `if`. It's all about how you want it to be done. –  Aug 23 '17 at 13:06
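One way to reduce both problems raised in these comments is to classify each run of consecutive `//` lines as a whole instead of line by line. The sketch below uses a hypothetical function `removeCommentedOutRuns` and an assumed "more than half" threshold: a run is dropped only when most of its lines look like code, so `if (condition) {` and `}` are removed together with the body, while a prose comment that happens to contain a `;` is kept. It is still a heuristic, so review the resulting diff before committing.

```javascript
// A line "looks like code" if it starts with a brace or ends with ';', '{' or '}'.
const CODE_LINE = /^[{}]|[;{}]\s*$/;
// Only whole-line "//" comments are considered; "/* */" blocks are left alone.
const LINE_COMMENT = /^\s*\/\//;

function removeCommentedOutRuns(source, threshold = 0.5) {
  const lines = source.split("\n");
  const keep = lines.map(() => true);
  let i = 0;
  while (i < lines.length) {
    if (!LINE_COMMENT.test(lines[i])) {
      i++;
      continue;
    }
    // Find the end (exclusive) of the run of comment lines starting at i.
    let j = i;
    while (j < lines.length && LINE_COMMENT.test(lines[j])) j++;
    // Strip the leading slashes and whitespace; ignore empty "//" lines.
    const bodies = lines
      .slice(i, j)
      .map((line) => line.replace(/^\s*\/\/+/, "").trim())
      .filter((body) => body.length > 0);
    const codeLike = bodies.filter((body) => CODE_LINE.test(body)).length;
    // Drop the whole run only if MORE than `threshold` of its lines look like code.
    if (bodies.length > 0 && codeLike / bodies.length > threshold) {
      for (let k = i; k < j; k++) keep[k] = false;
    }
    i = j;
  }
  return lines.filter((_, index) => keep[index]).join("\n");
}

// The case from the comment above: the "if" line and the closing brace go with the run.
const input = [
  "// the function foo is doing foo",
  "void foo(){",
  "     // if (condition) {",
  "     //     bar();",
  "     // }",
  "     bar2();",
  "}",
].join("\n");

console.log(removeCommentedOutRuns(input));
```

Conversely, a run such as `// Parse the header first;` followed by two plain-text comment lines has only one code-like line out of three, so it is kept.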