
I want to select the word "hazardous" only when it is a separate word that is not preceded by "non " or "non-".

For example:

non-hazardous

non hazardous

hazardous

non agricultural hazardous

regex 1: ^(?!non[-/s]?)hazardous$
regex 2: ^(?!non-|non\s)hazardous$

I tried the two regexes above. They give the correct results for the first three lines, but neither selects "hazardous" in the fourth. I want it selected there, because it is not preceded by "non " or "non-".
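A quick check with Python's `re` module (assumed here, since the answer's `r'...'` literal is Python syntax) shows why both attempts stop at the third line. `^` and `$` force the match to span the whole line, so only a line consisting of exactly `hazardous` can match, whatever the lookahead at `^` decides. Separately, in the first attempt `[-/s]` is a character class matching `-`, `/` or `s`; `[-\s]` was probably intended.

```python
import re

# The four sample lines from the question.
samples = [
    "non-hazardous",
    "non hazardous",
    "hazardous",
    "non agricultural hazardous",
]

# The two attempts from the question, copied verbatim.
attempts = [
    r"^(?!non[-/s]?)hazardous$",
    r"^(?!non-|non\s)hazardous$",
]

for pattern in attempts:
    # ^ and $ anchor the match to the whole line, so the lookahead only
    # inspects the start of the line and "hazardous" must begin at index 0.
    matched = [s for s in samples if re.search(pattern, s)]
    print(pattern, "->", matched)
```

Both attempts end up equivalent to `^hazardous$` on these inputs, which is why the fourth line is never selected.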

Reference: Regular Expression - Match pattern that does not contain a string

martineau
ConMan77

1 Answer

You can use

r'\b(?<!\bnon[-\s])hazardous\b'

See the regex demo. The pattern matches

  • \b - a word boundary
  • (?<!\bnon[-\s]) - a negative lookbehind that fails the match if non- or non followed by a whitespace character appears immediately to the left of the current location (the \b ensures non is a whole word, so canon hazardous still matches)
  • hazardous - a literal string
  • \b - a word boundary.
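Applied to the question's sample lines, the pattern selects only the standalone "hazardous" and the one in "non agricultural hazardous". This is a minimal sketch using Python's `re.search`; the extra `nonhazardous` line and the `no_leading_b` pattern are illustrative additions, not part of the answer. They show what the leading `\b` contributes: without it, the `hazardous` inside `nonhazardous` is selected, because the lookbehind only rejects `non` followed by a hyphen or whitespace.

```python
import re

# Pattern from the answer: the whole word "hazardous", unless it is directly
# preceded by the word "non" plus a hyphen or a whitespace character.
pattern = re.compile(r"\b(?<!\bnon[-\s])hazardous\b")

# Same pattern without the leading \b (hypothetical, for comparison only).
no_leading_b = re.compile(r"(?<!\bnon[-\s])hazardous\b")

samples = [
    "non-hazardous",
    "non hazardous",
    "hazardous",
    "non agricultural hazardous",
    "nonhazardous",  # extra case, not from the question
]

for s in samples:
    m = pattern.search(s)
    # span() gives the start and end offsets of the selected word.
    print(f"{s!r:30}", m.span() if m else None,
          "| without leading \\b:", bool(no_leading_b.search(s)))
```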
Wiktor Stribiżew
  • Why is there a word boundary in front of everything? Can you explain in detail? – ConMan77 Mar 10 '22 at 18:27
  • @ConMan77 In order to [match the whole word](https://stackoverflow.com/q/1324676/3832970). – Wiktor Stribiżew Mar 10 '22 at 18:27
  • Can `(?<!\bnon-|non\s)hazardous\b` be used? – ConMan77 Mar 10 '22 at 18:28
  • @ConMan77 Yes, if it works for you, but alternation inside an unanchored lookbehind is resource-consuming. Using character classes is more efficient when possible. – Wiktor Stribiżew Mar 10 '22 at 18:30
  • I often see `\s` used in place of a space where the only relevant whitespace character is a space. Your regex does not match `"One is non\nhazardous"`, but arguably should. That's a contrived example, of course (and here it's hard to see how the use of `\s` could cause a problem), but it seems to me that, as a rule, one should always use a space rather than `\s` when a space is the only whitespace character that is to be matched or not matched. Your opinion? – Cary Swoveland Mar 10 '22 at 20:25
  • @CarySwoveland What about non-breaking spaces? The `\x20` won't match them. `\s` is default for any space in the regex world. Unless it is stated that only a literal space is meant, I'd use `\s`. – Wiktor Stribiżew Mar 10 '22 at 20:41
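The comments weigh three ideas: the answer's pattern, the alternation form ConMan77 asks about, and excluding only a literal space instead of `\s`. The sketch below (Python `re`; the test strings and the `space only` variant are illustrative assumptions, not from the thread) shows where they diverge. In the alternation form, `\b` belongs only to the first branch, so `canon hazardous` is rejected there but accepted by the answer's pattern. In Python 3, `\s` in a `str` pattern also matches Unicode whitespace such as the non-breaking space U+00A0, which is the trade-off debated in the last two comments.

```python
import re

variants = {
    # Pattern from the answer.
    "answer": r"\b(?<!\bnon[-\s])hazardous\b",
    # Alternation form from the comments. \b binds only to the first
    # branch, so "non\s" is checked without a word boundary before "non".
    "alternation": r"(?<!\bnon-|non\s)hazardous\b",
    # Hypothetical variant that excludes only a literal space (\x20).
    "space only": r"\b(?<!\bnon[- ])hazardous\b",
}

tests = [
    "non agricultural hazardous",
    "canon hazardous",   # "non" is the tail of a longer word
    "non\nhazardous",    # newline instead of a space
    "non\xa0hazardous",  # non-breaking space (U+00A0)
]

for name, pat in variants.items():
    hits = [t for t in tests if re.search(pat, t)]
    print(f"{name:12}", [repr(t) for t in hits])
```

Which behavior is "right" for the newline and non-breaking-space cases depends on the input, as the last two comments note.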