Regular Expression to remove annotation code

Question

I'm new to regex.

I'm trying to remove unused, commented-out code from a project, like this:

/*
    // random unmanaged annotation
    foo = var;
    doSomething();
    multilineFunction(a,
                      b);
*/

and leave "non-code" annotations like this:

/*
     real annotation
*/

I'm trying to find and replace, with a regular expression, every comment where "a line ending with ; appears between /* and */", but my regex doesn't work. How can I write such a regex?

I tried matching the inside of /* */ with (/\*)(.*\n)*?(.*\*/), and a comment containing a line that ends with ; with (/\*)(.*\n)*?(.*;\n)(.*\n)(.*\*/), but this regex runs on to a later */ instead of stopping at the nearest one, and it is probably messy.

Edit: I wanted to do this using only the find-and-replace function of my IDE. I have now solved it by writing Python code, but I'm still curious.

@Sweeper I assumed it was not a code comment if there was no semicolon inside the comment. — kdw9502, Dec 17 '19 at 05:28
Regex is fundamentally the wrong tool for this. The following is about HTML but the fundamental reasoning is the same for any context-free language: https://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not — tripleee, Dec 17 '19 at 05:28
@tripleee This is good, but it can't filter for comments that contain a semicolon. — kdw9502, Dec 17 '19 at 05:38
I would go with something like https://github.com/eliben/pycparser instead. — tripleee, Dec 17 '19 at 05:39

score 0 · Accepted Answer · answered Dec 17 '19 at 06:20

As already explained in the comments, it is better to use a good parser.

For a one-time hack, you can use the following regex:

\/\*[^;]*;.*?\*\/

Test here.

Assumptions for it to work without issues:

- the code is proper C and it contains semicolons (;)
- the annotations do NOT contain semicolons (;)

If the assumptions are not fulfilled, you need to do some additional hacks, or to use a proper parser, as stated initially.
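
As a rough illustration of the regex above and of what happens when the assumptions fail, here is a minimal Python sketch. The names `ANSWER_PATTERN`, `TEMPERED_PATTERN`, `SAMPLE`, `MIXED` and `strip_commented_code` are made up for this example. The `re.DOTALL` flag is an assumption: it lets `.` cross line breaks, which the pattern needs for multi-line comments. `TEMPERED_PATTERN` is not part of the answer; it is one possible "additional hack" that refuses to step over a closing `*/`.

```python
import re

# The answer's pattern, compiled with DOTALL so "." can cross line breaks
# inside a multi-line comment.
ANSWER_PATTERN = re.compile(r"/\*[^;]*;.*?\*/", re.DOTALL)

# Hypothetical stricter variant (not from the answer): neither half of the
# match is allowed to step over a closing "*/".
TEMPERED_PATTERN = re.compile(r"/\*(?:(?!\*/)[^;])*;(?:(?!\*/).)*\*/", re.DOTALL)

# The two comment blocks from the question.
SAMPLE = (
    "/*\n"
    "    // random unmanaged annotation\n"
    "    foo = var;\n"
    "    doSomething();\n"
    "*/\n"
    "/*\n"
    "     real annotation\n"
    "*/\n"
)

# A semicolon-free comment followed by live code: a case outside the
# answer's assumptions.
MIXED = (
    "/* real annotation */\n"
    "int x = 1;\n"
    "/*\n"
    "    foo = var;\n"
    "*/\n"
)


def strip_commented_code(source, pattern=ANSWER_PATTERN):
    """Delete every block comment that the given pattern matches."""
    return pattern.sub("", source)


if __name__ == "__main__":
    print(strip_commented_code(SAMPLE))
    print(strip_commented_code(MIXED))
    print(strip_commented_code(MIXED, TEMPERED_PATTERN))
```

On `SAMPLE` both patterns remove the commented-out code and keep the real annotation. On `MIXED` the answer's pattern also deletes `int x = 1;`, because `[^;]*` runs from the first `/*` across its `*/` to the first semicolon in live code; the tempered variant leaves that line alone. In a find-and-replace dialog where `.` cannot match newlines, writing `[\s\S]` in place of `.` is a common workaround.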

virolino

This does not work in my IDE (Rider), but it works fine in the test. – kdw9502 Dec 17 '19 at 07:03

score 0 · Answer 2 · answered Dec 17 '19 at 06:27

You could use \/\*\s+(\/\/.+)[\s\S]+\*\/

Explanation:

\/\* - match /* literally

\s+ - match one or more whitespaces (including newline character)

(\/\/.+) - match a line beginning with //, i.e. a line comment (one line only), and store it in the first capturing group

[\s\S]+ - match one or more of any characters (\s is whitespace and \S is non-whitespace)

\*\/ - match */ literally

Demo

Then replace it with the first capturing group.

Note that it will only work with one-line comments placed at the beginning of a commented block.
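
To make the replace step concrete, here is a minimal Python sketch of the same idea. The names `GREEDY_PATTERN`, `LAZY_PATTERN`, `BLOCKS` and `keep_leading_line_comment` are invented for this example. `GREEDY_PATTERN` is the regex from this answer; `LAZY_PATTERN` is a hypothetical tweak, not part of the answer, that swaps `[\s\S]+` for the lazy `[\s\S]+?` so the match ends at the first `*/`.

```python
import re

# The answer's pattern as written; [\s\S]+ is greedy.
GREEDY_PATTERN = re.compile(r"/\*\s+(//.+)[\s\S]+\*/")

# Hypothetical tweak (not from the answer): the lazy +? stops at the
# first closing "*/" instead of the last one in the input.
LAZY_PATTERN = re.compile(r"/\*\s+(//.+)[\s\S]+?\*/")

# A commented-out block followed by a real annotation.
BLOCKS = (
    "/*\n"
    "    // random unmanaged annotation\n"
    "    foo = var;\n"
    "*/\n"
    "/*\n"
    "     real annotation\n"
    "*/\n"
)


def keep_leading_line_comment(source, pattern=LAZY_PATTERN):
    """Replace each matching block with its first // line (capture group 1)."""
    return pattern.sub(r"\1", source)


if __name__ == "__main__":
    print(keep_leading_line_comment(BLOCKS))
    print(keep_leading_line_comment(BLOCKS, GREEDY_PATTERN))
```

With the greedy pattern, the single match runs from the first `/*` to the last `*/` in the input, so the real annotation is swallowed as well. With the lazy variant, only the first block is replaced by its `//` line and the second block is left as it is.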
