
So I have this simple regular expression here:


([a-zA-Z_0-9]+)

However, I don't want it to capture if the match equals a specific word. I found a workaround like this, where the match should not contain this:


([a-zA-Z_0-9]+)?\b(?<!this)

That works up to a point, but with a word like this_that it won't capture it, because this is inside that word. I want it to capture the word unless it equals EXACTLY this and nothing else. Does that make sense?

I did some research and could not find a NOT operator in regex, just negative lookaheads. Does anyone have any ideas?

Lots of stuff, as stated above.

1 Answer


If you want to capture words unless the word is exactly this, you can match this in a non-capturing first alternative, so that only other words end up in group 1:

\bthis\b|(\w+)  # or [a-zA-Z_0-9]+ for ASCII only
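
As a rough illustration of how the alternation behaves, here is a minimal Python sketch. The sample string is made up for the example, and Python's `re` module is only one of the flavors this pattern works in.

```python
import re

# Alternation trick: `\bthis\b` consumes the excluded word without capturing,
# so group 1 only participates in the match for every other word.
pattern = re.compile(r"\bthis\b|(\w+)")

text = "this this_that that"  # hypothetical sample input

# Keep only the matches where the capturing group actually took part.
captured = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(captured)  # ['this_that', 'that']
```

The bare word this is still matched (and thereby skipped), but since it never lands in group 1 it can be filtered out afterwards.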


Alternatively, you can use a negative lookahead that rejects only the whole word this:

\b(?!this\b)\w+ # or [a-zA-Z_0-9]+ for ASCII only
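
A small Python sketch of the lookahead version, again with a made-up sample string. Here there is no capturing group; the whole match is the word.

```python
import re

# `(?!this\b)` fails only when "this" is followed by a word boundary,
# i.e. when the entire word is "this". Longer words such as "this_that" pass.
pattern = re.compile(r"\b(?!this\b)\w+")

text = "this this_that that"  # hypothetical sample input

words = pattern.findall(text)
print(words)  # ['this_that', 'that']
```

Unlike the alternation approach, the excluded word is never matched at all, so no filtering step is needed afterwards.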



BTW: [a-zA-Z_0-9] is the same as \w in PCRE, Ruby, Java, ECMAScript, and other regex flavors where \w is, by default, shorthand for the ASCII characters that are legal in variable names in many programming languages.

But \w matches àbç in Python, Rust, C#, and others where the Unicode sense of a letter is used.
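
To make the difference concrete, here is a short Python 3 sketch comparing `\w` on a `str` pattern, `\w` with the `re.ASCII` flag, and the explicit character class. The sample is written with escapes so that the accented letters are definitely precomposed code points.

```python
import re

sample = "\u00e0b\u00e7"  # "àbç" using precomposed characters

# On str patterns, \w is Unicode-aware by default in Python 3.
unicode_match = re.fullmatch(r"\w+", sample)

# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_match = re.fullmatch(r"\w+", sample, re.ASCII)

# The explicit class is ASCII-only regardless of flags.
explicit_match = re.fullmatch(r"[a-zA-Z_0-9]+", sample)

print(bool(unicode_match), bool(ascii_match), bool(explicit_match))  # True False False
```

So in Python the two spellings are only interchangeable when `re.ASCII` is set or when the pattern is a `bytes` pattern.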

dawg
  • Nitpick: In modern Python, `\w` is Unicode-friendly by default, so it will capture non-ASCII alphabetic characters, where the explicit character class will not. They're only equivalent when working with `bytes` patterns, or when the `re.A`/`re.ASCII` flag modifies the regex. – ShadowRanger Aug 31 '23 at 22:50
  • Further nitpick: Python's implementation of `\w` differs considerably from Unicode's definition of `\w` and does not recognise many word-forming characters. For a definition of `\w` that conforms more closely to Unicode, you need to use the `regex` module instead of `re`. – Andj Sep 01 '23 at 22:16