I need help creating the best possible regular expression for this problem.
I have pairs of start and end delimiters, and I need to extract ALL the substrings that run from a start delimiter up to its matching end delimiter.
Assume this table of delimiters:
START | END
CAT | DOG
APPLE | ORANGE
LION | ZEBRA
PANDA | CAT
Sample input:
substring1 CAT substring2 substring3 DOG substring4 substring5 CAT substring6
APPLE substring7 substring8 ORANGE ORANGE substring9 DOG substring10 PANDA
substring11 CAT substring12 DOG substring13 LION substring10 substring11 ZEBRA substring12
CAT substring13 substring14 APPLE substring15 substring16 ORANGE
The output must be:
- CAT substring2 substring3 DOG
- APPLE substring7 substring8 ORANGE
- PANDA substring11 CAT
- LION substring10 substring11 ZEBRA
- APPLE substring15 substring16 ORANGE
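As far as the expected output shows, the rules are: scan left to right; from each start delimiter, take everything up to its end delimiter unless another start delimiter appears first; and a matched end delimiter is consumed (which would explain why "CAT substring12 DOG" is not listed: that CAT already closed "PANDA substring11 CAT"). A minimal non-regex sketch of those inferred rules in Python, useful as a reference to check any regex against. The names PAIRS, SAMPLE and scan are hypothetical, and delimiters are assumed to be whitespace-separated words:

```python
# Hypothetical names: PAIRS, SAMPLE, scan. Delimiter table copied from the question.
PAIRS = {"CAT": "DOG", "APPLE": "ORANGE", "LION": "ZEBRA", "PANDA": "CAT"}

SAMPLE = (
    "substring1 CAT substring2 substring3 DOG substring4 substring5 CAT substring6\n"
    "APPLE substring7 substring8 ORANGE ORANGE substring9 DOG substring10 PANDA\n"
    "substring11 CAT substring12 DOG substring13 LION substring10 substring11 ZEBRA substring12\n"
    "CAT substring13 substring14 APPLE substring15 substring16 ORANGE"
)


def scan(text, pairs):
    """Return 'START ... END' spans whose interior contains no start delimiter."""
    tokens = text.split()  # split() treats newlines as separators too
    results = []
    i = 0
    while i < len(tokens):
        end = pairs.get(tokens[i])
        if end is not None:
            for j in range(i + 1, len(tokens)):
                # Check the end delimiter first, so PANDA ... CAT works
                # even though CAT is also a start delimiter.
                if tokens[j] == end:
                    results.append(" ".join(tokens[i : j + 1]))
                    i = j  # resume scanning after the consumed end delimiter
                    break
                if tokens[j] in pairs:  # another start delimiter intervenes
                    break
        i += 1
    return results


for span in scan(SAMPLE, PAIRS):
    print("-", span)
```

Run as-is, it prints the five expected lines.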
My regular expression:
CAT (.)*? DOG|APPLE (.)*? ORANGE|LION (.)*? ZEBRA|PANDA (.)*? CAT
My problem is with strings in which another starting delimiter occurs between a start delimiter and its end delimiter.
Take, for example:
CAT word1 word2 word3 word4 APPLE word5 word6 word7 DOG
I know that CAT (.)*? DOG will match this, but that match is wrong because the text between CAT and DOG contains another starting delimiter (APPLE).
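The lazy quantifier does not help here: *? only makes the match as short as possible from a given starting position; it never forbids anything in between. A small reproduction, assuming Python's re module (the question does not name a regex flavor):

```python
import re

text = "CAT word1 word2 word3 word4 APPLE word5 word6 word7 DOG"
naive = re.compile(r"CAT (.)*? DOG")  # the CAT/DOG alternative from the question

m = naive.search(text)
print(m.group(0))  # the whole string: APPLE is not excluded
print(m.group(1))  # '7': a repeated group only keeps its last iteration
```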
I need a regex that matches from a starting delimiter up to its matching end delimiter, but only if the text in between contains no other starting delimiter.
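One common way to express "no other start delimiter in between" in a single regex is a tempered token, (?:(?!X).)*?, which consumes one character at a time only where X does not begin. A sketch in Python's re, assuming delimiters are whole words (hence \b) and that an end delimiter should win when it is also a start delimiter (PANDA ... CAT). PAIRS, build_pattern and extract are hypothetical names:

```python
import re

# Hypothetical names: PAIRS, build_pattern, PATTERN, extract. Table from the question.
PAIRS = [("CAT", "DOG"), ("APPLE", "ORANGE"), ("LION", "ZEBRA"), ("PANDA", "CAT")]


def build_pattern(pairs):
    # Any start delimiter, as a whole word.
    starts = r"\b(?:" + "|".join(re.escape(s) for s, _ in pairs) + r")\b"
    # Tempered token: consume one character only if no start delimiter begins there.
    # Because it is lazy, the END delimiter is tried first at every position,
    # so PANDA ... CAT still matches although CAT is also a start delimiter.
    body = r"(?:(?!" + starts + r").)*?"
    alternatives = [
        r"\b" + re.escape(s) + r"\b" + body + r"\b" + re.escape(e) + r"\b"
        for s, e in pairs
    ]
    # DOTALL lets '.' cross line breaks, since a span may wrap onto the next line.
    return re.compile("|".join(alternatives), re.DOTALL)


PATTERN = build_pattern(PAIRS)


def extract(text):
    # finditer yields non-overlapping matches from left to right;
    # whitespace is collapsed so a wrapped match reads as one line.
    return [" ".join(m.group(0).split()) for m in PATTERN.finditer(text)]


print(extract("CAT word1 word2 word3 word4 APPLE word5 word6 word7 DOG"))  # []
print(extract("CAT word1 word2 DOG"))  # ['CAT word1 word2 DOG']
```

For the CAT/DOG pair the generated alternative is \bCAT\b(?:(?!\b(?:CAT|APPLE|LION|PANDA)\b).)*?\bDOG\b. The same construct should carry over to other backtracking engines with lookahead support (PCRE, Java, .NET, JavaScript), though the "dot matches newline" flag is spelled differently in each.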
Any suggestions? Thanks.