
So I'm in a situation where I must use only a regex to select everything except a specific word. For this example, the word will be foobar. Here is what should happen:

this should be highlighted, and
same with this. but any sentence
that has the word
foobar
shouldnt be, and same for any regular
sentence with foobar <-- like that
foobar beginning a sentence should invalidate
the entire sentence, same with at the end foobar
only foobar, and nothing else of the sentence
more words here more irrelevant stuff to highlight
and nothing of the key word
what about multiple foobar on the same foobar line?

And what should be matched would look something like this:

[Image: the sample text above with the expected matches highlighted]

The best I could get is /\b(?!foobar)[^\n]+\n?/g, which works if the word foobar is alone on its own line, formatted like this:

not foobar
foobar (ignored)
totallynotfoobar
nobar
foobutts
foobar (ignored)
notagain

The rest is matched, but this only works when foobar sits on its own line, which is not what I want.
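A quick way to see the problem is to run the attempt in a JavaScript engine. This is a TypeScript sketch that assumes Node.js (the `/g` syntax suggests JavaScript); the two-line input string is made up from the question's text. Once the lookahead passes at a line's first word, `[^\n]+` greedily takes the rest of the line, so a line that contains foobar after its first word is matched in full.

```typescript
// The regex from the question, unchanged.
const attempt: RegExp = /\b(?!foobar)[^\n]+\n?/g;

// Hypothetical input: a lone "foobar" line, then a line with foobar mid-sentence.
const twoLines = "foobar\nsentence with foobar <-- like that";

// With the g flag, String.prototype.match returns every full match (or null).
const attemptMatches: string[] = twoLines.match(attempt) ?? [];

console.log(attemptMatches);
// → ["sentence with foobar <-- like that"]  (foobar is still inside the match)
```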

So my question is, how would I accomplish the original example? Is it even possible?

Izzy

1 Answer


Here's one way:

\W*\b(?!foobar).+?\b\W*

The ? in .+? makes the quantifier lazy, so each match stops at the first \b it reaches; without it, .+ would run on to the last word boundary on the line and could skip over occurrences of foobar.
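To illustrate the lazy/greedy difference, here is a TypeScript sketch (assuming a JavaScript regex engine such as Node.js; `greedyRe` is a hypothetical copy with the ? removed, for comparison only) run on one line of the question's sample:

```typescript
// The answer's lazy pattern and a hypothetical greedy copy (only the ? removed).
const lazyRe: RegExp = /\W*\b(?!foobar).+?\b\W*/g;
const greedyRe: RegExp = /\W*\b(?!foobar).+\b\W*/g;

const sampleLine = "sentence with foobar <-- like that";

const lazyMatches: string[] = sampleLine.match(lazyRe) ?? [];
const greedyMatches: string[] = sampleLine.match(greedyRe) ?? [];

// Greedy .+ runs to the last word boundary, so foobar ends up inside the match.
console.log(greedyMatches); // → ["sentence with foobar <-- like that"]

// Lazy .+? stops at the first word boundary, so no match contains foobar.
console.log(lazyMatches); // → ["sentence ", "with ", " <-- like ", "that"]
```

With the greedy .+ the whole line, foobar included, becomes one match; the lazy version splits around foobar instead.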

The \W*'s are necessary to consume leading or trailing non-word characters in the string (such as spaces at the start or punctuation at the end), which no match would cover otherwise.
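A small comparison, under the same assumptions (a JavaScript engine; the input string and the edge-less variant `withoutEdges` are hypothetical), shows what the \W*'s buy. \b never matches between two non-word characters, nor at the start or end of the string next to one, so without them the leading spaces and the trailing !! are never covered by any match.

```typescript
// The answer's pattern, and a hypothetical copy without the surrounding \W*'s.
const withEdges: RegExp = /\W*\b(?!foobar).+?\b\W*/g;
const withoutEdges: RegExp = /\b(?!foobar).+?\b/g;

// Hypothetical input with leading spaces and trailing punctuation.
const padded = "  hello world!!";

const keptWith: string[] = padded.match(withEdges) ?? [];
const keptWithout: string[] = padded.match(withoutEdges) ?? [];

console.log(keptWith.join(""));    // → "  hello world!!"
console.log(keptWithout.join("")); // → "hello world" (leading spaces and "!!" never matched)
```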

Each word (together with the non-word characters around it) ends up as a separate match here, rather than one match per run of text, which might not be ideal.
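For example (TypeScript, assuming a JavaScript engine), one sentence from the question's sample comes back as five separate matches; joining them gives the sentence with foobar left out:

```typescript
const perWord: RegExp = /\W*\b(?!foobar).+?\b\W*/g;

// One sentence from the question's sample text.
const sentence = "foobar beginning a sentence should invalidate";

const pieces: string[] = sentence.match(perWord) ?? [];

console.log(pieces);
// → [" beginning ", "a ", "sentence ", "should ", "invalidate"]

// Concatenating the matches gives the sentence with foobar left out.
console.log(pieces.join("")); // → " beginning a sentence should invalidate"
```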


Full explanation:

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  \W*                      non-word characters (all but a-z, A-Z,
                           0-9, _) (0 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    foobar                   'foobar'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  .+?                      any character except \n (1 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  \W*                      non-word characters (all but a-z, A-Z,
                           0-9, _) (0 or more times (matching the
                           most amount possible))

A variation with look-behind and look-ahead (use it with the /gs or /gm flags):

(?<=^|\bfoobar\b)(?!foobar\b)(.*?)(?=\bfoobar\b|$)
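A minimal TypeScript sketch of this variant on the question's multi-foobar line. It assumes an engine with look-behind support (JavaScript gained it in ES2018, so a current Node.js works) and uses the /gm flags:

```typescript
// The look-behind variant from the answer, with the g and m flags.
// Look-behind needs an ES2018+ engine (e.g. a current Node.js).
const betweenRe: RegExp = /(?<=^|\bfoobar\b)(?!foobar\b)(.*?)(?=\bfoobar\b|$)/gm;

// The last line of the question's sample text.
const multi = "what about multiple foobar on the same foobar line?";

const segments: string[] = multi.match(betweenRe) ?? [];

console.log(segments);
// → ["what about multiple ", " on the same ", " line?"]
```

Unlike the first pattern, each match here spans everything between two occurrences of foobar (or a line edge), not a single word.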

I believe all those \b's are necessary to correctly handle every case where foobar appears as part of a word. (If foobar should also be excluded when it is part of a longer word, just removing all the \b's should work.)
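To check the note about \b, here is a hedged TypeScript comparison of the pattern above against a hypothetical \b-free copy (`anyPart`) on a made-up input where foobar is also the prefix of a longer word. Because .*? is lazy, the pattern can also produce empty matches (here, right after the trailing foobar), so the sketch's helper `nonEmptyMatches` (a name introduced only for this example) filters those out:

```typescript
// The answer's pattern, and a hypothetical copy with every \b removed.
const wholeWord: RegExp = /(?<=^|\bfoobar\b)(?!foobar\b)(.*?)(?=\bfoobar\b|$)/gm;
const anyPart: RegExp = /(?<=^|foobar)(?!foobar)(.*?)(?=foobar|$)/gm;

// Hypothetical input: foobar as the prefix of a longer word, then on its own.
const input = "foobarbaz and foobar";

// The lazy (.*?) can yield empty matches (e.g. right after a trailing foobar),
// so this helper keeps only the non-empty ones.
function nonEmptyMatches(re: RegExp, text: string): string[] {
  const found: string[] = text.match(re) ?? [];
  return found.filter((m) => m.length > 0);
}

console.log(nonEmptyMatches(wholeWord, input)); // → ["foobarbaz and "]
console.log(nonEmptyMatches(anyPart, input));   // → ["baz and "]
```

With the \b's, foobarbaz is treated as an ordinary word and kept; without them, the foobar prefix is excluded and only baz survives.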

Bernhard Barker