6

I can't figure out negative lookbehind. Suppose I have the text

qwe  abc
qwe abc
abc

and I want to find every abc that does not come after qwe, where qwe may be followed by any number of spaces.

(?<!qwe)\s*?(abc)

matches all three lines. I assumed it would mean something like "match an arbitrary number of spaces followed by abc if there's no qwe in front of them".

I also tried the

qwe|(abc)

approach, but it does not work for me. Although the group is empty in the cases where I do not want a match, I don't see how to use it with the re.sub function (which I need to use). Even though the group is empty, re.sub still replaces the string.

Env: python 3

Arvind Kumar Avinash
Roman
  • You could do it like this with the regex module instead of re `(?<!qwe\s*)abc` and place the `\s*` in the lookbehind. Else you can use `qwe\s*abc|(abc)` with a capture group. https://regex101.com/r/JNzeGi/1 – The fourth bird Apr 09 '21 at 13:25
  • You can use this module https://pypi.org/project/regex/ – The fourth bird Apr 09 '21 at 13:27
  • This would be a variable-length lookbehind, which `re` does not support. Otherwise @Thefourthbird is right, `(?<!qwe\s*)abc` would do the trick. Trying to come up with a workaround. – Tamas Rev Apr 09 '21 at 13:29
  • Another approach is https://regex101.com/r/JNzeGi/1 – The fourth bird Apr 09 '21 at 13:31
  • Here https://stackoverflow.com/q/31564195/187808 there was a comment by @jonrsharpe to use the `regex` module (instead of the `re`), so you'll have variable-length lookbehind. And then `(?<!qwe\s*)abc` will work. – Tamas Rev Apr 09 '21 at 13:34

3 Answers

2

You don't need a lookbehind here. A negative lookahead is enough, because a lookahead in re may contain a variable-length pattern:

^(?!.*qwe\s+abc).*abc

Or with word boundaries, to make sure qwe and abc are complete words:

^(?!.*\bqwe\s+abc\b).*\babc\b

RegEx Demo

RegEx Explanation:

  • ^: Start of the line (use re.MULTILINE when the text contains several lines)
  • (?!.*qwe\s+abc): Negative lookahead that fails the match if qwe, followed by 1+ whitespace characters and then abc, is found anywhere in the line
  • .*: Match 0 or more of any characters
  • abc: Match abc
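
Since the question is about re.sub, here is a minimal sketch of how the word-boundary pattern above could be used for replacing. One change is my own assumption and not part of the answer: the leading `.*` is wrapped in a capture group so the text before abc can be put back. The helper name `replace_abc` is hypothetical.

```python
import re

# The word-boundary pattern from above. Assumption (not in the answer):
# ".*" is captured as group 1 so the prefix survives the substitution.
PATTERN = re.compile(r"^(?!.*\bqwe\s+abc\b)(.*)\babc\b", re.MULTILINE)

def replace_abc(text, replacement):
    """Replace abc on every line that contains no 'qwe<spaces>abc'."""
    # A function is used so the replacement is inserted literally.
    return PATTERN.sub(lambda m: m.group(1) + replacement, text)

text = "qwe  abc\nqwe abc\nabc"
print(replace_abc(text, "XYZ"))
```

Keep in mind that the lookahead rejects the whole line, so a line containing both "qwe abc" and a separate abc is left untouched, and because `.*` is greedy only the last abc on an accepted line is replaced.
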
anubhava
  • Just one thing I want to mention: As far as I understand, lookahead demands something to be matched before actual lookahead. E.g. (?<!qwe) did not seem to work, while ^(?<!qwe) did. Could you please clarify that? – Roman Apr 11 '21 at 12:57
  • Please understand that `(?<!...)` is a lookbehind and `(?!...)` is a lookahead. What I suggested in my answer is a `lookahead`. I would suggest [this very good tutorial on look-arounds](https://www.regular-expressions.info/lookaround.html) – anubhava Apr 11 '21 at 14:24
1

You can find an interesting article on "The Best Regex Trick" here where you would first have to match what you don't want using alternations. Then capture what you do want inside a capture group.

The syntax would be: MatchWhatYouDon'tWant|(MatchWhatYouDoWant). In your particular case we can use some extra syntax using word-boundaries and a non-capturing group to nest the alternation in:

\b(?:qwe\b\s+abc|(abc))\b

See the online demo

  • \b - Word-boundary.
  • (?: - Open non-capturing group:
    • qwe\b\s+abc - Match "qwe" literally followed by a word-boundary, 1+ whitespace characters and "abc".
    • | - Or:
    • (abc) - Match "abc" within the 1st capturing group.
    • ) - Close non-capturing group.
  • \b - Word-boundary.
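
To address the re.sub part of the question: re.sub replaces every overall match, whether or not group 1 took part, which is why the string was still being replaced. One way around it is to pass a function as the replacement and return the match unchanged when group 1 is empty. A minimal sketch, where the helper name `replace_standalone` is hypothetical:

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"\b(?:qwe\b\s+abc|(abc))\b")

def replace_standalone(text, replacement):
    """Replace abc only where it was captured by group 1."""
    def _swap(match):
        # group(1) is None when the "qwe ... abc" branch matched,
        # so return the matched text as it was.
        if match.group(1) is None:
            return match.group(0)
        return replacement
    return PATTERN.sub(_swap, text)

text = "qwe  abc\nqwe abc\nabc"
print(replace_standalone(text, "XYZ"))
```

Unlike a whole-line check, this works per occurrence, so in "abc qwe abc" only the first abc is replaced.
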
JvdV
1

The reason you match abc in group 1 for all 3 examples is that your pattern (?<!qwe)\s*?(abc) asserts at the current position that what is directly to the left is not qwe, and then matches optional whitespace chars.

This assertion is true for the first 2 examples at the position after the first space that follows qwe. The engine can start the match at that position, because what is directly to the left there ends in a whitespace char rather than qwe, and \s*? then matches any remaining whitespace before abc.

The third example gets a match as there is no qwe present at the left.

Note that for example there will be no match for qweabc as there is no room for a whitespace char to be matched making the assertion true.
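
A small sketch to make this visible: it prints where the original pattern starts matching in each example. The helper name `first_match` is hypothetical, and the pattern is the one from the question.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"(?<!qwe)\s*?(abc)")

def first_match(line):
    """Return (start, matched_text) for the first match, or None."""
    m = PATTERN.search(line)
    return (m.start(), m.group(0)) if m else None

for line in ["qwe  abc", "qwe abc", "abc", "qweabc"]:
    # The start index shows the match beginning after a space, not after qwe.
    print(repr(line), "->", first_match(line))
```

For the first two lines the match starts at index 4, one whitespace char past qwe, which is exactly where the lookbehind no longer sees qwe.
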


re does not support variable-length lookbehinds, but the PyPI regex module does.

(?<!qwe\s*)abc
  • (?<!qwe\s*) Negative lookbehind to assert that directly to the left is not qwe followed by optional whitespace chars.
  • abc Match literally (You don't need the group anymore)
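
A minimal sketch of using this with sub, assuming the third-party regex package is installed (`pip install regex`). The helper name `replace_unprefixed` is hypothetical, and the import guard is only there so the snippet fails with a clear message when the package is missing.

```python
try:
    import regex  # third-party package, not the standard-library re
except ImportError:
    regex = None

def replace_unprefixed(text, replacement):
    """Replace abc unless it is preceded by qwe plus optional whitespace."""
    if regex is None:
        raise RuntimeError("the 'regex' package is required: pip install regex")
    # The variable-length lookbehind is accepted by regex, but not by re.
    return regex.sub(r"(?<!qwe\s*)abc", replacement, text)

if regex is not None:
    print(replace_unprefixed("qwe  abc\nqwe abc\nabc", "XYZ"))
```

Note that \s also matches newlines, so a qwe at the very end of the previous line would block a match as well.
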

Regex demo | Python demo

The fourth bird