How to extract substring between two keywords with exceptional cases?

Question

I want to extract the substring between apple and each in a string. However, if each is followed by box, I want the result to be an empty string.

In detail, this means:

1) apple costs 5 dollars each -> costs 5 dollars

2) apple costs 5 dollars each box -> ``

I tried re.findall('(?<=apple)(.*?)(?=each)', s), where s is the input string.

It handles 1) but not 2).

How can I solve this?

Thanks.

If you want it with regex, I will delete my answer :) – Alok Mishra Dec 27 '19 at 11:40

The fourth bird · Accepted Answer · 2019-12-27T11:15:57.847

2

You could add a negative lookahead asserting that each is not directly followed by box. If you only need the match, you can omit the capturing group.

(?<=apple).*?(?=each(?! box))


If you don't want to match the leading and trailing spaces, you could add them to the lookarounds:

import re
s = "apple costs 5 dollars each"
print(re.findall(r'(?<=apple ).*?(?= each(?! box))', s))

Output

['costs 5 dollars']
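To check the exceptional case as well, here is a minimal sketch that runs the same pattern against both inputs from the question. The helper name `between` is only an illustration and not part of the original answer; it returns an empty string when the pattern does not match.

```python
import re

# Lookbehind for "apple ", lazy match, then " each" that is not followed by " box".
PATTERN = re.compile(r'(?<=apple ).*?(?= each(?! box))')

def between(text):
    """Return the text between 'apple' and 'each', or '' if there is no valid match."""
    match = PATTERN.search(text)
    return match.group(0) if match else ''

print(repr(between("apple costs 5 dollars each")))      # 'costs 5 dollars'
print(repr(between("apple costs 5 dollars each box")))  # ''
```

Using `re.search` instead of `re.findall` gives a single string rather than a list, which maps more directly onto the "empty string" requirement.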

You can also use a capturing group instead of the positive lookarounds and keep only the negative lookahead. The value is then in the first capturing group.

You could make use of word boundaries \b to prevent the keywords from being part of a larger word.

\bapple\b(.*?)\beach\b(?! box)

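As a rough sketch of how the capturing-group variant could be used: the function name `between_words` is hypothetical, and stripping the surrounding spaces from group 1 is an assumption about the desired result, since the group itself includes them.

```python
import re

# Word boundaries keep "apple" and "each" from matching inside larger words.
PATTERN = re.compile(r'\bapple\b(.*?)\beach\b(?! box)')

def between_words(text):
    """Return group 1 without surrounding spaces, or '' if there is no match."""
    match = PATTERN.search(text)
    return match.group(1).strip() if match else ''

print(repr(between_words("apple costs 5 dollars each")))      # 'costs 5 dollars'
print(repr(between_words("apple costs 5 dollars each box")))  # ''
print(repr(between_words("pineapple costs 5 dollars each")))  # ''
```

The last call shows the effect of \b: pineapple contains apple, but there is no word boundary before it, so the pattern does not match.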

edited Dec 27 '19 at 11:15

answered Dec 27 '19 at 11:05


It works. Thank you, The fourth bird. BTW, why are you called the fourth bird? – Chan Dec 27 '19 at 11:15
@Chan An old colleague of mine chose `three little birds` as a name for one of his projects. – The fourth bird Dec 27 '19 at 11:26

score 2 · Answer 2 · answered Dec 27 '19 at 11:27

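Try this without using regex:

myString = "apple costs 5 dollars each box"

myList = myString.split(" ")

storeString = []

for i, x in enumerate(myList):
    if x == "apple":
        continue
    elif x == "each":
        # exceptional case: "each" followed by "box" gives an empty result
        if myList[i + 1:i + 2] == ["box"]:
            storeString = []
        break
    else:
        storeString.append(x)

# join the collected words back into a single string
listToStr = ' '.join(storeString)

print(listToStr)

Output:

(an empty line, because each is followed by box)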
