
Fellow coders!

I need a way for regex to recognize all \w word characters that are NOT located inside a single-line comment.

In my instance, I am using Asciidoc, and single-line comments begin with // at the start of a line.

To try to figure it out, I'm using regex101.com with PHP's flavor of regex.

The example text I'm using is:

foo bar baz
//bla ble blu
// mee maa moo

I need regex to return: f,o,o,b,a,r,b,a,z and ignore the rest.

I figured I should work with lookaheads and lookbehinds, but the exact formulation eludes me big time.

The best I could come up with was (?<!^\/\/.*)\w; however, it does not match all the characters I need.

Any ideas?

Wiktor Stribiżew
Jure T
  • What do you mean, it behaves strange? What is the undesirable behavior that results in, when only single line comments are considered? – CertainPerformance Feb 01 '20 at 10:27
  • At first it updated the word count only when the file was first opened. Now, seeing your reply, I restarted everything and now, it still doesn't ignore single-line comments. In other words, I'm nowhere. – Jure T Feb 01 '20 at 13:01
  • To me [it's not clear](http://idownvotedbecau.se/unclearquestion) what your input is and what your desired output is. Maybe you could tell us what to put in [https://regex101.com/](https://regex101.com/) and what you want your regex to highlight. If I put your example text and example regex into regex101 with ECMAScript selected, I get all the letters in the multi-line comment matched. You wrote _It still does not ignore single-line comments_. I'd say it's the other way around. – Enlico Feb 01 '20 at 15:55
  • You won't be able to solve this easily with lookbehind/lookahead because you can't easily tell the end of a multiline comment from the start of another one. – Wiktor Stribiżew Feb 01 '20 at 16:04
  • *I'm using regex101.com with PHP's flavor of regex* - WHY if you have a `visual-studio-code` tag? VSCode supports lookbehinds like modern JS does, `(?<!^//.*?)\w`. See https://regex101.com/r/tcTJVo/1 – Wiktor Stribiżew Feb 02 '20 at 01:19
  • Holy crap fellas. Why the hostility and downvoting? I don't know much about regex and that's why I asked for help. Isn't that what this portal is for? I guess I couldn't figure it out because I should have been testing it in VSCode directly. And Wiktor: Thanks a lot. That did it. Mods, can you please let me mark Wiktor's answer as "the answer"? – Jure T Feb 02 '20 at 11:27
  • You should not have removed the original regex attempt. I put it back into the question. Let's wait. If other users consider this question good, they will hit `reopen` link. – Wiktor Stribiżew Feb 02 '20 at 12:57
  • Got it. Thanks Wiktor! – Jure T Feb 02 '20 at 20:32

1 Answer


In VSCode, you may use a lookbehind whose pattern matches a string of unknown length, because its search and replace feature is based on the modern ECMAScript standard (see this thread).

Use

(?<!^//.*?)\w

See the regex demo.

Details

  • (?<!^//.*?) - a negative lookbehind that makes sure there is no // at the start of a line followed by any 0 or more chars other than line break chars, as few as possible, up to the closest...
  • \w - matches a letter, digit or _.
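Outside VSCode, the same pattern can be tried in any modern JavaScript engine. The snippet below is only a minimal sketch: the function name findWordChars is made up for illustration, and the g and m flags are an assumption that mirrors how an editor search treats ^ as a line start and returns every match.

```javascript
// Hypothetical helper: returns every \w character that does not sit on a
// line starting with "//". Assumes an engine with lookbehind support
// (for example a recent Node.js or Chromium-based browser).
function findWordChars(text) {
  // g: collect all matches; m: let ^ match at the start of every line.
  // The slashes are escaped only because this is a JS regex literal.
  const pattern = /(?<!^\/\/.*?)\w/gm;
  // String.prototype.match returns null when nothing matches.
  return text.match(pattern) ?? [];
}

// Sample text from the question.
const sample = "foo bar baz\n//bla ble blu\n// mee maa moo";
console.log(findWordChars(sample).join(","));
// prints "f,o,o,b,a,r,b,a,z"
```

Because the lookbehind is anchored with ^, only lines that begin with // are skipped; a // that appears later in a line does not hide the word characters after it.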

Test in VSCode:

[Screenshot: the regex tested in the VSCode search panel]

Wiktor Stribiżew