I am trying to parse the "name" and "address" from a string. I have written the regex pattern which works perfectly fine (I verified in regex101.com) with the 'ungreedy/lazy' feature of regex but not with the greedy. Here is my snowflake query:
select
TRIM(REGEXP_SUBSTR(column1,'(^\\D*)((\\bP[OST]*[ .]*O[FFICE]*[ .]*B[OX]*[ .]*\\d+.*)|(\\d+.*))[,.\\s]+([a-zA-Z]{2})[,.\\s]+(\\d{5}|\\d{5}-\\d{4})$',1,1,'is',1)) as test
from values(TRIM('FIRST SECOND THIRD PO BOX 123 DUMMY XX 12345-6789'));
--please ignore the latter part of regex as I am fetching territory code and zip code also and they are working fine.
The above query is returning me "FIRST SECOND THIRD PO BOX" And, if I return the 2nd group it returns me "123 DUMMY"
What I want:
case 1 - when my string is 'FIRST SECOND THIRD PO BOX 123 DUMMY XX 12345-6789'
output of 1st group: "FIRST SECOND THIRD"
output of 2nd group: "PO BOX 123 DUMMY"
case 2 - WHEN my string is 'FIRST SECOND THIRD FOURTH FIFTH 123 DUMMY XX 12345-6789'
output of 1st group: "FIRST SECOND THIRD FOURTH FIFTH"
output of 2nd group: "123 DUMMY"
Please suggest workaround here in snowflake since it doesn't have lazy feature.
PS. If you want to verify in regex101, paste the below code and test string. You will see the result when you switch to Ungreedy. (^\D*)((\bP[OST][ .]O[FFICE][ .]B[OX][ .]\d+.)|(\d+.))[,.\s]+([a-zA-Z]{2})[,.\s]+(\d{5}|\d{5}-\d{4})$
Test String: FIRST SECOND THIRD PO BOX 123 DUMMY XX 12345-6789
Thanks