
I am extracting text from a PDF and want to search for an expression like P50+P60, but the text also contains terms like P50+P40+P30. How can I match only the structure Pxx+Pxx (x = digit), so that nothing is found for Pxx+Pxx+Pxx?

I tried it like this:

List = re.findall('(P\d\d+P\d\d[^\+P\d\d])', String)

but this also returns P50+P40 from the term P50+P40+P30. I have tried many variations but couldn't fix the problem.

    As per the documentation, you could use [negative lookahead assertions](https://docs.python.org/3/howto/regex.html#lookahead-assertions). – 9769953 Feb 18 '21 at 22:19
    This is not a duplicate of [Understanding negative lookahead](https://stackoverflow.com/questions/27691225/understanding-negative-lookahead), negative lookahead alone would not help, `P\d\d+` needed excluding, too. – Ryszard Czech Mar 06 '21 at 23:25
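To illustrate the second comment's point, here is a minimal sketch (the sample string and the variable name `lookahead_only` are assumptions for illustration). With only a negative lookahead, the engine rejects P50+P40 but then moves on and matches the trailing pair of the three-term chain:

```python
import re

text = "P50+P40+P30"

# Negative lookahead only: P50+P40 is rejected because '+P30' follows it,
# but the search resumes and P40+P30 (followed by nothing) is accepted.
lookahead_only = re.findall(r"P\d\d\+P\d\d(?!\+P\d\d)", text)
print(lookahead_only)  # ['P40+P30']
```

This is why a negative lookbehind for `P\d\d\+` is needed as well.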



Use a negative lookbehind together with a negative lookahead:

re.findall(r'(?<!P\d\d\+)P\d\d\+P\d\d(?!\+P\d\d)', String)
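A small runnable sketch of the pattern above, assuming plain strings as input; the names `PATTERN` and `find_pairs` and the sample strings are hypothetical. Python's `re` only accepts fixed-width lookbehinds, and `(?<!P\d\d\+)` is always exactly four characters, so it compiles as-is.

```python
import re

# The answer's pattern: exactly two Pxx terms joined by '+',
# not preceded by 'Pxx+' and not followed by '+Pxx'.
PATTERN = re.compile(r"(?<!P\d\d\+)P\d\d\+P\d\d(?!\+P\d\d)")


def find_pairs(text):
    """Return every standalone Pxx+Pxx pair found in text."""
    return PATTERN.findall(text)


print(find_pairs("P50+P60"))                      # ['P50+P60']
print(find_pairs("P50+P40+P30"))                  # []
print(find_pairs("A: P50+P60, B: P50+P40+P30"))   # ['P50+P60']
```

Only the two-digit shape is checked at the edges: for a string like P50+P601 the pattern still returns P50+P60. If that can occur in the extracted PDF text, adding `\b` after the final `\d\d` is one possible tightening, which goes beyond the original requirement.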


EXPLANATION

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    P                        'P'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \+                       '+'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  P                        'P'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  P                        'P'
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  \d                       digits (0-9)
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \+                       '+'
--------------------------------------------------------------------------------
    P                        'P'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-ahead
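The explanation above can also be carried inside the code itself. A sketch assuming `re.VERBOSE` is acceptable in your codebase (the name `PAIR` and the sample string are illustrative); it compiles to the same matching behaviour as the one-line pattern, since verbose mode only ignores the whitespace and `#` comments:

```python
import re

# Same pattern as the answer, laid out with re.VERBOSE so each piece
# carries the comment from the explanation.
PAIR = re.compile(
    r"""
    (?<!P\d\d\+)   # not preceded by 'P', two digits, '+'
    P\d\d          # 'P' followed by two digits (0-9)
    \+             # a literal '+'
    P\d\d          # 'P' followed by two digits (0-9)
    (?!\+P\d\d)    # not followed by '+', 'P', two digits
    """,
    re.VERBOSE,
)

print(PAIR.findall("P50+P60 and P50+P40+P30"))  # ['P50+P60']
```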
Ryszard Czech