I need a regex to match only code lines that are not commented.

For example:

// Messagebox(a,b,c, mb_ok);
Messagebox(a,b,c, mb_ok|mb_taskmodal);
Messagebox(a,b,c, mb_ok);

I have a regex that matches lines 1 and 3. Now I want to filter out the commented lines too, so the final regex should match only line 3. Is that possible with regex alone?

Regex used to match lines 1 and 3: https://regexr.com/51mjh

(MessageBox\(\s*.*,.*,.*,\s*)(?!.*MB_TASKMODAL)

Conditions:

a. MessageBox API that takes 4 parameters.

b. Last parameter should not contain MB_TASKMODAL.

c. The line should not be commented out.

Ron
  • What regex flavor? Do you have the option to exclude lines via regex (e.g. `grep -v`) or only include lines? – 0x5453 Apr 02 '20 at 13:46

2 Answers

You avoid //-commented lines with:

^(?!\/\/).*

Explanation (also at regex101):

  • ^ Start of a line
  • (?!\/\/) Not a literal leading // (this is a negative lookahead)
  • .* Any number of any character (to skip blank lines, change to .+)

If you're worried about leading white space, use ^(?!\s*\/\/).* instead.
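
As a rough illustration, here is how the whitespace-tolerant variant behaves in Python's re module. This is a sketch, not part of the original answer: the helper name uncommented_lines and the indented comment line are assumptions added for the example, and each line is tested on its own so that \s cannot run across a newline.

```python
import re

# Whitespace-tolerant version of the comment filter from the answer.
NOT_COMMENTED = re.compile(r"^(?!\s*\/\/).*")

# Lines from the question, plus one indented comment (added for illustration).
SAMPLE_LINES = [
    "// Messagebox(a,b,c, mb_ok);",
    "Messagebox(a,b,c, mb_ok|mb_taskmodal);",
    "    // indented comment",
    "Messagebox(a,b,c, mb_ok);",
]


def uncommented_lines(lines):
    """Return the lines that do not start with optional whitespace and //."""
    return [line for line in lines if NOT_COMMENTED.match(line)]


for line in uncommented_lines(SAMPLE_LINES):
    print(line)
```

Both comment lines are dropped; the two Messagebox calls remain, including the one with mb_taskmodal, because this pattern only filters comments.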


Now to get to the part about matching only line three.

If you want to match more than the absence of a comment, change the .* to what you want to match. I'm not exactly sure what you want to match and not match, so this is a guess based on what intentions I can glean from your attempt.

^(?!\/\/).*\bMessagebox\((?!.*\bmb_taskmodal\b).*

This has the aforementioned exclusion for commented lines. It then matches Messagebox( preceded by a non-word character (or nothing), but only if mb_taskmodal does not appear as a full word later on the line, and finally consumes the rest of the line.

I'm using \b a few times here. It matches a position where exactly one side (before or after the \b) is a word character (a letter, digit, or underscore) and the other side is a non-word character or the start or end of the text. The "b" stands for "[word] boundary". Escaped non-word characters are always literals, so \( and \/ are a literal ( and / respectively.
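
To check the pattern against the three lines from the question, a minimal Python sketch follows. It assumes Python's re module (which supports the same lookahead and \b syntax); the helper name matching_lines is made up for the example.

```python
import re

# The combined pattern from the answer, unchanged.
PATTERN = re.compile(r"^(?!\/\/).*\bMessagebox\((?!.*\bmb_taskmodal\b).*")

# The three example lines from the question.
SAMPLE_LINES = [
    "// Messagebox(a,b,c, mb_ok);",
    "Messagebox(a,b,c, mb_ok|mb_taskmodal);",
    "Messagebox(a,b,c, mb_ok);",
]


def matching_lines(lines):
    """Return the lines the pattern matches from their first character."""
    return [line for line in lines if PATTERN.match(line)]


print(matching_lines(SAMPLE_LINES))  # ['Messagebox(a,b,c, mb_ok);']
```

Only the third line survives: the first is rejected by the comment lookahead and the second by the mb_taskmodal lookahead.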

Note that this regex will reject Messagebox(a,b,c, mb_ok); // |mb_taskmodal); even though mb_taskmodal appears only in the trailing comment, because the lookahead scans the rest of the line. Resolving that is nontrivial since the inline comment indicator is two characters. I can answer that too, but hopefully you don't need it.


Solutions with grep:

$ grep -v '^//' FILENAME                            # discard comments
$ grep -v '^//' FILENAME |grep -vFw 'mb_taskmodal'  # also discard mb_taskmodal

Grep's -v flag inverts the match. -F disables regexes and uses a plain text match (faster), and -w requires word boundaries around the query (the same as \bmb_taskmodal\b assuming GNU grep without -F).
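
For readers without grep at hand, the two-stage pipeline can be approximated in Python. This is only a sketch: the function name filter_lines and the extra other_call(); line are illustrative assumptions, and the lookarounds approximate -w rather than reproduce grep's exact matching rules.

```python
import re


def filter_lines(lines, word="mb_taskmodal"):
    """Approximate: grep -v '^//' | grep -vFw WORD."""
    # -F: treat the word as plain text; -w: no word character on either side.
    whole_word = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    kept = []
    for line in lines:
        if line.startswith("//"):    # first grep -v: drop comment lines
            continue
        if whole_word.search(line):  # second grep -v: drop lines with the word
            continue
        kept.append(line)
    return kept


SAMPLE_LINES = [
    "// Messagebox(a,b,c, mb_ok);",
    "Messagebox(a,b,c, mb_ok|mb_taskmodal);",
    "Messagebox(a,b,c, mb_ok);",
    "other_call();",
]

print(filter_lines(SAMPLE_LINES))
```

Like the grep commands, this only discards lines; it does not require Messagebox to be present, so other_call(); is kept.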


Extended Regular Expression (ERE) comment-filtering solution (no lookaround):

(If you're using grep, consider grep -v '^//' FILENAME instead)

^(.?$|[^\/]{2}|[^\/]\/|\/[^\/]).*

Explanation (also at regex101):

  • ^ Start of a line
  • (…) Capture group (PCRE can inhibit capturing with (?:…) instead) containing either
    • Alternation one
      • .? Any character, zero or one time (change to ..? to skip blank lines)
      • $ End of line
    • Alternation two
      • [^\/]{2} Any character except a /, twice
    • Alternation three
      • [^\/] Any character except a /
      • \/ A literal /
    • Alternation four (order is swapped)
      • \/ A literal /
      • [^\/] Any character except a /
  • .* Any number of any character (including zero, required by alternation one)

This will match a blank line or a line like / or j or a longer non-comment line.
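
The alternation logic can be sanity-checked by comparing it with the lookahead version on a handful of short lines. The sketch below uses Python's re module, which is not a POSIX ERE engine, so it only exercises the logic of the pattern; the test lines and the helper name agree are assumptions chosen for the example.

```python
import re

# Lookaround-free pattern from the answer and the lookahead version it replaces.
ERE_STYLE = re.compile(r"^(.?$|[^\/]{2}|[^\/]\/|\/[^\/]).*")
LOOKAHEAD = re.compile(r"^(?!\/\/).*")

# Edge cases: empty, one character, two characters, and longer lines.
CASES = ["", "/", "j", "/x", "x/", "xy", "//", "// comment",
         "/* block */", "a // trailing"]


def agree(line):
    """True when both patterns make the same accept/reject decision."""
    return bool(ERE_STYLE.match(line)) == bool(LOOKAHEAD.match(line))


for case in CASES:
    verdict = "kept" if ERE_STYLE.match(case) else "dropped"
    print(repr(case), verdict, "agrees" if agree(case) else "DIFFERS")
```

Only "//" and "// comment" are dropped; every other case is kept by both patterns.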

Adam Katz

You could shorten the pattern a bit by using a negated character class [^,], which matches any character except a comma, followed by a literal comma. If you group that part, it can be repeated 3 times with the quantifier {3}.

If the capturing group would contain the entire match anyway, you can also omit it.

As already answered, you could use a negative lookahead to assert that the line does not start with //. Add \s* if whitespace may precede the //, or use [^\S\r\n]* to match whitespace characters other than newlines.

Note that there are also other ways to add comments, which do not have to occur at the start of the line.

^(?![^\S\r\n]*//)[^\S\r\n]*MessageBox\(\s*(?:[^,]*,){3}\s*(?!.*MB_TASKMODAL)
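
A quick way to try this pattern on the question's lines is sketched below in Python. Two assumptions are made here: re.IGNORECASE is added because the sample lines are written in lower case while the pattern uses MessageBox and MB_TASKMODAL, and an indented call and an indented comment are added to show the leading-whitespace handling. The helper name matching_lines is made up for the example.

```python
import re

# Pattern from the answer; IGNORECASE is an assumption so that it matches
# the lower-case sample lines from the question.
PATTERN = re.compile(
    r"^(?![^\S\r\n]*//)[^\S\r\n]*MessageBox\(\s*(?:[^,]*,){3}\s*(?!.*MB_TASKMODAL)",
    re.IGNORECASE,
)

SAMPLE_LINES = [
    "// Messagebox(a,b,c, mb_ok);",
    "Messagebox(a,b,c, mb_ok|mb_taskmodal);",
    "Messagebox(a,b,c, mb_ok);",
    "    Messagebox(a,b,c, mb_ok);",     # indented call (added for illustration)
    "    // Messagebox(a,b,c, mb_ok);",  # indented comment (added for illustration)
]


def matching_lines(lines):
    """Return the lines on which the pattern matches at the start."""
    return [line for line in lines if PATTERN.match(line)]


for line in matching_lines(SAMPLE_LINES):
    print(repr(line))
```

The two uncommented calls without mb_taskmodal are reported, with or without indentation.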

The fourth bird
  • There is no difference between `^(?!\s*//)MessageBox` and `^MessageBox` since `(?!…)` is zero-width; it says "the next characters cannot be 0+ spaces and then two slashes" but then you specify the next characters must be "MessageBox" which already prevents the comment from matching. – Adam Katz Apr 02 '20 at 16:32
  • @AdamKatz Ah yes of course :-), I forgot to add it before the match itself. Updated. – The fourth bird Apr 02 '20 at 16:47