
I am trying to find all VBA comments using regular expressions. I have something that mostly works, but there are a few exceptions that I cannot figure out.

Expression I am using:

'(?!.*").*

Take our test code:

Working - This is a test 'This should be captured
Working - "this is a test" 'This should be captured
Not Working - "this is a test" 'This should be "captured"
Not Working - This is a test 'This should be "captured"
Working - "this is a test 'this should not capture'" 'this should capture
Working - "this isn't a test" 'this should capture

Here is a link to this example in RegExr: http://regexr.com/3f24h

For some reason the third and fourth examples are not being captured. The problem seems to be a string value inside the comment, and I cannot figure out how to fix it.

Any advice?

Chad

3 Answers


You can't find all of the comments (let alone string literals) in VBA code with regular expressions - period. Trust me, I tried during work on the Smart Indenter module of Rubberduck (in case that wasn't explicit enough - full disclosure, I'm a contributor). You'll need to actually parse the code. The first issue that you'll run into is line continuations:

'Comment with a line _
continuation

Debug.Print 'End of line comment _
with line continuation.

Debug.Print 'Multiple line continuation operators _ _
still work.

Debug.Print 'This is actually *not* a line continuation_
Debug.Print 42

This makes it difficult to identify string literals, especially if you're using line-by-line processing:

Debug.Print 42 'The next line... _
"...is not a string literal"

You also have to handle the old Rem comment syntax...

Rem old school comment

...which also supports line continuations:

Rem old school comment with line _
continuation.

You might be thinking "that can't be so bad, Rem has to start a line". If you are, you forgot about the statement separator (:)...

Debug.Print 42: Rem statement separator comment.

...or its evil twin the statement separator combined with a line continuation:

Debug.Print 42: Rem this can be _
continued too.

You covered a couple of the issues with sorting out string literals and comments like these...

Debug.Print "Unmatched double quotes." 'Comment"
Debug.Print "Interleaved single 'n double quotes." 'Comment"

...but what about bracketed identifiers like this beast (courtesy of @ThunderFrame)?

'No comments or strings in the line below.
Debug.Print [Evil:""Comment"'here] 

Note that the syntax highlighter SO uses doesn't even catch all of these bizarre corner cases.
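To make "actually parse the code" a bit more concrete, here is a rough sketch of a hand-written scanner. It is written in Python rather than VBA, it is not how Rubberduck or the VBE does it, and the names `logical_lines`, `comment_in` and `find_comments` are made up for illustration. It assumes a line continuation is whitespace followed by `_` at the very end of a line, that `""` is an escaped quote inside a string, and that `Rem` only counts at the start of a line or after a `:`. It is certainly incomplete, but it shows the idea of tracking context instead of pattern matching.

```python
import re

def logical_lines(source):
    """Join physical lines that end in whitespace + '_' (a line continuation)."""
    joined, pending = [], ""
    for line in source.splitlines():
        if re.search(r"\s_$", line):
            pending += line + "\n"       # keep collecting the continued line
        else:
            joined.append(pending + line)
            pending = ""
    if pending:                          # source ended on a continuation
        joined.append(pending.rstrip("\n"))
    return joined

def comment_in(line):
    """Return the comment in one logical line (with its ' or Rem), or None."""
    i, n = 0, len(line)
    statement_start = True               # Rem only counts where a statement may begin
    while i < n:
        ch = line[i]
        if ch == "'":
            return line[i:]
        if statement_start and re.match(r"rem(\s|$)", line[i:], re.IGNORECASE):
            return line[i:]
        if ch == '"':                    # string literal; "" is an escaped quote
            i += 1
            while i < n and (line[i] != '"' or line[i + 1:i + 2] == '"'):
                i += 2 if line[i] == '"' else 1
            i += 1                       # step past the closing quote
        elif ch == "[":                  # bracketed identifier: skip to the closing ]
            close = line.find("]", i)
            i = n if close == -1 else close + 1
        else:
            i += 1
        if ch == ":":
            statement_start = True
        elif not ch.isspace():
            statement_start = False
    return None

def find_comments(source):
    """Return every comment found in the source, one entry per logical line."""
    found = (comment_in(line) for line in logical_lines(source))
    return [c for c in found if c is not None]

if __name__ == "__main__":
    sample = 'Debug.Print "isn\'t a comment" \'Comment"\nDebug.Print 42: Rem this can be _\ncontinued too.'
    for comment in find_comments(sample):
        print(repr(comment))
```

Continued comments come back as a single entry with the line break preserved, so a caller can still map them back to physical lines.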

Comintern
  • How does the VBE syntax highlighter catch all of them? – Vityata Jan 12 '17 at 15:38
  • @Vityata - The VBE syntax highlighter doesn't use regex - it parses the code. – Comintern Jan 12 '17 at 15:39
  • Something I thought immediately after writing my comment. Btw, in your last example with the `Evil`, are you missing `"`? – Vityata Jan 12 '17 at 15:40
  • @Vityata - Anything inside a bracketed identifier is treated as part of the identifier itself - the context of the characters themselves switches inside the brackets. – Comintern Jan 12 '17 at 15:43
  • Identifier as a named range? Or something else? – Vityata Jan 12 '17 at 15:52
  • @Vityata - In Excel, it's treated as an expression (so you can use it for named ranges). It can also be used as a COM member call - i.e. `ws.[_CheckSpelling]`. I doubt that you'd run into anything in COM with a member name containing quotes, but it may be feasible in that objects are free to implement `GetIDsOfNames` however they want. – Comintern Jan 12 '17 at 16:01

Maybe something like

^(?:[^"'\n]*("(?:[^"\n]|"")*"))*[^"]*'(.*)$

It handles multiple quoted strings, as well as strings containing doubled double quotes (`""`), which I believe is VBA's way of escaping a quote inside a string.

(I guarantee it will fail in some cases, but probably will work in most ;)

Check it out here at regex101.
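For anyone who wants to try this outside regex101, here is a small Python check of the pattern above against the six lines from the question. The helper name `comment_text` is made up for this sketch, and it applies the pattern one line at a time, which is an assumption about how you would use it. The last call shows one of the cases where it goes wrong: `[^"]*` is greedy, so an apostrophe inside the comment moves the start of the capture.

```python
import re

# First pattern from this answer; group 2 holds the comment text.
PATTERN = re.compile(r'''^(?:[^"'\n]*("(?:[^"\n]|"")*"))*[^"]*'(.*)$''')

def comment_text(line):
    """Return the captured comment text (without the apostrophe), or None."""
    m = PATTERN.match(line)
    return m.group(2) if m else None

# The six test lines from the question.
SAMPLES = '''\
Working - This is a test 'This should be captured
Working - "this is a test" 'This should be captured
Not Working - "this is a test" 'This should be "captured"
Not Working - This is a test 'This should be "captured"
Working - "this is a test 'this should not capture'" 'this should capture
Working - "this isn't a test" 'this should capture
'''.splitlines()

if __name__ == "__main__":
    for line in SAMPLES:
        print(repr(comment_text(line)))
    # An apostrophe inside the comment: only the text after the last one is captured.
    print(repr(comment_text("x = 1 'don't")))
```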

Edit

Added some of Comintern's examples and adjusted the regex. It still can't handle bracketed identifiers, though (I don't even know what those are :S see the last line). But it now handles his continued-line comments.

^(?:[^"'\n]*(?:"(?:[^"\n]|"")*"))*[^']*('(?:_\n|.)*)

Check it out here at regex101.
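A quick Python sketch of how the adjusted pattern treats continued comments. `POSTED` is the pattern exactly as written above. `TWEAKED` is a variant that only accepts `_` plus newline as a continuation when whitespace comes before the underscore, which is how VBA treats it; both names and the `re.MULTILINE` flag are choices made for this sketch.

```python
import re

# The adjusted pattern from this answer; group 1 is the comment including the apostrophe.
POSTED = re.compile(
    r'''^(?:[^"'\n]*(?:"(?:[^"\n]|"")*"))*[^']*('(?:_\n|.)*)''', re.MULTILINE)

# Variant: require whitespace before the underscore (\s_\n instead of _\n).
TWEAKED = re.compile(
    r'''^(?:[^"'\n]*(?:"(?:[^"\n]|"")*"))*[^']*('(?:\s_\n|.)*)''', re.MULTILINE)

continued = "Debug.Print 'End of line comment _\nwith line continuation."
not_continued = "Debug.Print 'This is actually *not* a line continuation_\nDebug.Print 42"

if __name__ == "__main__":
    for name, pattern in (("posted", POSTED), ("tweaked", TWEAKED)):
        for text in (continued, not_continued):
            print(name, repr(pattern.search(text).group(1)))
```

With the posted pattern the second sample swallows the `Debug.Print 42` line as part of the comment; with the variant the capture stops at the end of the first line.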

SamWhan
  • One small tweak I'd suggest - the line continuation operator is only treated as a line continuation if it is preceded by a single space: `\s_\n`. I added an example at the bottom of my top code block (the one with broken syntax highlighting...). – Comintern Jan 12 '17 at 15:52
  • "Just doing my master's bidding" ;) – SamWhan Jan 12 '17 at 18:00

This should work:

("[^"]+"\s)?'.+

Tested here: https://regex101.com/r/dd60QS/1
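Here is a small Python sketch for checking the pattern locally. The helper name `matched` is made up, and using `re.search` on one line at a time is an assumption about how the pattern is meant to be applied. Two things are worth knowing before relying on it: when a string literal sits directly before the comment, the literal is part of the overall match, and a line with an apostrophe inside a string but no comment still produces a match.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r'''("[^"]+"\s)?'.+''')

def matched(line):
    """Return the full text matched on this line, or None if nothing matched."""
    m = PATTERN.search(line)
    return m.group(0) if m else None

if __name__ == "__main__":
    # Plain end-of-line comment: only the comment is matched.
    print(repr(matched("Working - This is a test 'This should be captured")))
    # String literal before the comment: the literal is included in the match.
    print(repr(matched('Working - "this is a test" \'This should be "captured"')))
    # No comment at all, but the apostrophe inside the string still matches.
    print(repr(matched('MsgBox "isn\'t a comment"')))
```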

Nick Allan