
I am trying to write a regular expression that matches a line when it contains a number followed by the word ok.

For example:

10 ok

But if the ok is followed by another number and the word nok, the line should not match. For example:

10 ok    2 nok

I am using the following regular expression to achieve this:

[0-9]+\s+ok\s+(?!([0-9]+\s+nok))

I based the negative lookahead on the 4th answer to "Which regular expression operator means 'Don't' match this character?" to get a "not" behaviour in my regex.

Here is my code:

import re

# Raw string so that \s reaches the regex engine unchanged
prog = re.compile(r'[0-9]+\s+ok\s+(?!([0-9]+\s+nok))')
result=prog.search('108601                  ABC_kill                            11 ok  3 nok        95m   25_KLPO   casdas5  dus41  fdd     tm500  sdfsd1010_1014             2m    2016-02-11 02:30:50  2016-02-11 08:53:59')

print(result)

But my pattern still matches a line that contains nok.

John Rambo

2 Answers


You can use this regex:

\d+\s+ok(?!\s+\d+\s+nok)


It is important to keep \s+ inside the negative lookahead so that the match fails for the second case.
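
As a quick check, here is a minimal Python sketch that runs this pattern against the two sample lines from the question. The names PATTERN and matches_ok are illustrative only and do not come from the original post.

```python
import re

# Pattern from the answer: the whitespace before the "<number> nok" part
# sits inside the negative lookahead, so nothing outside it can backtrack.
PATTERN = re.compile(r'\d+\s+ok(?!\s+\d+\s+nok)')

def matches_ok(line):
    """Return True if line has '<number> ok' not followed by '<number> nok'."""
    return PATTERN.search(line) is not None

print(matches_ok('10 ok'))            # True
print(matches_ok('10 ok    2 nok'))   # False
```

Because the pattern ends right after ok, a line that stops at "10 ok" with no trailing whitespace still matches.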

anubhava
  • But can you explain why it should be inside the negative lookahead? – John Rambo Feb 12 '16 at 19:08
  • If it is outside the lookahead, `\s+` can backtrack and give back the last space after `ok`; the negative lookahead then succeeds, since there is a space rather than a digit before the next number. Besides, `10 ok` doesn't even have a space after `ok` – anubhava Feb 12 '16 at 19:12
  • In PCRE you can make `\s*` possessive by using `\d+\s+ok\s*+(?!\d+\s+nok)`, but that regex won't work with Python's `re` module before version 3.11 – anubhava Feb 12 '16 at 19:15
1
import re

s='10 ok  2 nok'
      # ^---- two spaces here
re.search(r'[0-9]+\s+ok\s+(?![0-9]+\s+nok)', s)

will succeed. Let's see what happens:

[0-9]+\s+ok\s+ first matches '10 ok  ' (with both spaces), but then (?![0-9]+\s+nok) fails, because '2 nok' follows.

At this point, the regex engine backtracks: \s+ gives back one character (the last space), so [0-9]+\s+ok\s+ now matches '10 ok ' (only one space). The remaining text is ' 2 nok', which starts with a space rather than a digit, so (?![0-9]+\s+nok) succeeds and the overall match is reported.
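
The backtracking described above can be observed directly by inspecting what the match actually consumed. This is only a small illustrative sketch; the variable names s and m are arbitrary.

```python
import re

s = '10 ok  2 nok'  # two spaces after "ok"
m = re.search(r'[0-9]+\s+ok\s+(?![0-9]+\s+nok)', s)

# The match stops after a single space: the trailing \s+ gave one back
# so that the negative lookahead would see ' 2 nok' instead of '2 nok'.
print(repr(m.group()))  # '10 ok '
print(m.span())         # (0, 6)
```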

To avoid the backtracking, you can emulate an atomic group (?>...), which forbids backtracking into it once it is closed, with (?=(...))\1 (a lookaround is naturally atomic):

(?=([0-9]+\s+ok\s+))\1(?![0-9]+\s+nok)
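
A short sketch of how this emulated atomic group behaves in Python follows. The name ATOMIC and the test string '11 ok  95m' are made up for illustration.

```python
import re

# (?=(...))\1 emulates an atomic group: the lookahead captures the greedy
# match once, and \1 then consumes exactly that text and cannot shrink it.
ATOMIC = re.compile(r'(?=([0-9]+\s+ok\s+))\1(?![0-9]+\s+nok)')

print(ATOMIC.search('10 ok  2 nok'))              # None
print(repr(ATOMIC.search('11 ok  95m').group()))  # '11 ok  '
```

Note that this pattern still requires at least one whitespace character after ok, so a line that ends in "10 ok" does not match it. On Python 3.11 and later, the re module also accepts atomic groups and possessive quantifiers directly, so the emulation is mainly needed on older versions.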
Casimir et Hippolyte