
I have this string:

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"

I would like to match and capture all substrings that appear between START and FINISH, but only if the word "poison" does NOT appear in that substring. How do I exclude this word and capture only the desired substrings?

re.findall(r'START(.*?)FINISH', string)
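For reference, here is a small runnable check of what that attempt returns on the string above: the lazy `.*?` captures every substring, including the three that contain "poison". The second step shows one possible non-regex workaround (a plain Python substring filter applied after `findall`); it is a swapped-in technique for comparison, not the regex-only answer the question asks for.

```python
import re

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"

# The plain lazy pattern captures everything between START and FINISH,
# including the substrings that contain "poison".
all_matches = re.findall(r'START(.*?)FINISH', string)
print(all_matches)  # ['candy', 'sugar', 'poison', 'Blobpoison', 'poisonBlob']

# Non-regex workaround: keep the simple pattern and drop unwanted
# captures afterwards with an ordinary substring test.
filtered = [m for m in all_matches if "poison" not in m]
print(filtered)  # ['candy', 'sugar']
```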

Desired captured groups:

candy
sugar
Sraw
etayluz
@Sraw I'm not sure this is a duplicate: my question is about excluding a whole word, not just a single character (please correct me if I'm wrong, thank you). – etayluz Jan 01 '20 at 08:57
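The distinction in this comment can be checked directly. A negated character class such as `[^poison]` excludes individual characters (p, o, i, s, n), not the word "poison", so it even rejects "candy" (which contains "n") and "sugar" (which contains "s"). The sketch below contrasts it with a tempered-dot pattern that excludes the whole word; the variable names are illustrative only.

```python
import re

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"

# [^poison] means "any character except p, o, i, s or n": a set of letters, not a word.
char_class = re.findall(r'START([^poison]*?)FINISH', string)
print(char_class)  # [] because every substring contains at least one of those letters

# A negative lookahead checks for the whole word at each position instead.
tempered = re.findall(r'START((?:(?!poison).)*?)FINISH', string)
print(tempered)  # ['candy', 'sugar']
```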

1 Answer


Using a tempered dot, we can try:

import re

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"
matches = re.findall(r'START((?:(?!poison).)*?)FINISH', string)
print(matches)

This prints:

['candy', 'sugar']
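One caveat, stated as an assumption about inputs the question does not show: the lazy tempered dot stops at the first FINISH it can reach, but nothing stops it from running past another START. On a hypothetical string where a START has no FINISH of its own, the capture spans two markers. A possible variant (not part of the answer above) also tempers against START and FINISH, so each capture stays inside a single START...FINISH pair.

```python
import re

string = "STARTcandyFINISH  STARTsugarFINISH STARTpoisonFINISH STARTBlobpoisonFINISH STARTpoisonBlobFINISH"

# Hypothetical input: the first START is never closed by its own FINISH.
tricky = "STARTcandy STARTsugarFINISH"

# The answer's pattern: the tempered dot may consume a second START.
loose = re.findall(r'START((?:(?!poison).)*?)FINISH', tricky)
print(loose)  # ['candy STARTsugar']

# Variant (assumption): refuse to cross "poison", START or FINISH.
# Because FINISH is excluded, a greedy * can no longer overrun the closing marker.
STRICT = r'START((?:(?!poison|START|FINISH).)*)FINISH'
strict = re.findall(STRICT, tricky)
print(strict)  # ['sugar']

# On the original string both patterns agree.
strict_original = re.findall(STRICT, string)
print(strict_original)  # ['candy', 'sugar']
```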

To see how the pattern works, take a closer look at this part:

(?:(?!poison).)*?

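This is the tempered dot trick. Before consuming each character, the negative lookahead (?!poison) checks that "poison" does not start at the current position; only then does the dot consume that one character. Because every character between START and FINISH must pass this check, any substring containing poison cannot match, and the lazy *? makes the group stop at the first FINISH.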
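The one-character-at-a-time behaviour described above can be mirrored in plain Python. The sketch below uses a hypothetical helper, `tempered_dot_accepts`, that walks the text position by position, rejects it as soon as the word starts at the current position, and otherwise consumes one character (treating a newline as unmatchable, as `.` does without `re.DOTALL`). It then checks that the helper agrees with `re.fullmatch` of the tempered token on the five substrings from the question. It is an illustration of the idea, not how the `re` engine is implemented.

```python
import re


def tempered_dot_accepts(text: str, word: str) -> bool:
    """Hypothetical helper: mimic re.fullmatch(r'(?:(?!word).)*', text) for a literal word."""
    for i, ch in enumerate(text):
        # (?!word): the lookahead fails if `word` starts at this position.
        if text.startswith(word, i):
            return False
        # `.` without re.DOTALL does not match a newline.
        if ch == "\n":
            return False
        # Otherwise `.` consumes this one character and the loop moves on.
    return True


candidates = ["candy", "sugar", "poison", "Blobpoison", "poisonBlob"]
results = []
for s in candidates:
    by_regex = re.fullmatch(r"(?:(?!poison).)*", s) is not None
    by_loop = tempered_dot_accepts(s, "poison")
    results.append((s, by_regex, by_loop))
    print(s, by_regex, by_loop)
```

Only "candy" and "sugar" are accepted by both checks; the three substrings containing "poison" are rejected at the position where the word begins.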

Tim Biegeleisen