1

I want to capture tokens in a text in the following pattern:

The First 2 characters are alphabets and necessary, ends with [A-Z] or [A-Z][0-9] this is optional anything can come in between.

example:

AA123123A1

AA123123A

AA123123123

i want to match and capture start with ([A-Z][A-Z]) in group 1, end with [A-Z] or [A-Z][0-9] in group 3 and everything else between then in group2

Example:

AA123123A1 => [AA,123123,A1]

AA123123A. => [AA,123123,A]

AA123123123 => [AA,123123123,'']

the following regex is working in python but not in postgres.

regex='^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))([A-Za-z][0-9]{0,1}){0,1}$'

In Postgressql

    select regexp_matches('AA2311121A1',
    '^([A-Za-z]{2})((?:.+)(?=[A-Za-z][0-9]{0,1})|(?:.*))(.*)$','x');

result:

{AA,2311121A1,""}

I am trying to explore why positive lookahead behavior is not the same as python, and how to take make positive lookahead in Postgres work in this case.

1 Answers1

1

You can use

^([A-Za-z]{2})(.*?)([A-Za-z][0-9]?)?$

See the regex demo and a DB fiddle online:

enter image description here

Details:

  • ^ - start of string
  • ([A-Za-z]{2}) - two ASCII letters
  • (.*?) - Group 2: any zero or more chars as few as possible
  • ([A-Za-z][0-9]?)? - Group 3: an optional sequence of an ASCII letter and then an optional digit
  • $ - end of string.
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
  • Thanks @Wiktor This is working good, but the motive of asking this is more on understanding the behaviour of positive lookahead in Postgres and how to make it work like it is working in python. – YOGENDRA SONI Oct 12 '21 at 07:45
  • @YOGENDRASONI So, all you need to know is how lookaheads work? They work the same anywhere where they are supported. – Wiktor Stribiżew Oct 12 '21 at 07:47