Python: extract sentences containing a list of specific words

Question

I am trying to extract sentences from a large set of text that contain words from a given list.

For example, searching for "noodl", "vege" and "meat":

str1 = "My new noodles are great\n vegetables. Not \nthis noodle sentence though.\n Nor this vege sentences."
results = re.findall(regex, str1)

This should return "My new noodles are great\n vegetables." as the only match, because it is the only sentence containing at least two of the search words.

Based on (Python extracting sentence containing 2 words), I was able to come up with the following regex:

import re

regex = re.compile(
    r"""
    (                           # Group 1: the whole sentence
        [^.]*?                  # Starting with anything but '.'
        (                       # Group 2 start
            (noodl|vege|meat)   # Group 3: contains one of these words
            [^.]*               # with anything but '.' in between
        ){2,}                   # At least 2 times
        [^.]*\.                 # Followed by anything but '.', then a '.'
    )
    """,
    re.MULTILINE | re.IGNORECASE | re.VERBOSE)

But this results in

for x in results:
    print(x)
#My new noodles are great\n vegetables.
#vegetables
#vege

This is unexpected. How should my regex be changed so that it matches only the whole sentences? The sentences found are processed further. The natural language being processed is not English, but the results are the same as with the demo sentences.

Convert all capturing groups into non-capturing, see [demo](https://ideone.com/ODxAEE). — Wiktor Stribiżew, Jun 18 '19 at 11:21
Do you want to add your demo as an answer? It seems correct. — vahvero, Jun 18 '19 at 11:52
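
The extra values in the output come from how re.findall treats capturing groups: when a pattern contains groups, it returns the groups rather than the whole match. Below is a minimal sketch of the suggestion in the comment above, assuming the pattern from the question with every group turned into a non-capturing (?:...) group. It has not been checked against the linked demo, so treat it as an illustration rather than the commenter's exact code.

```python
import re

# Pattern from the question, with all groups made non-capturing so that
# re.findall returns the whole match instead of the individual groups.
regex = re.compile(
    r"""
    [^.]*?                      # Anything but '.', as little as possible
    (?:                         # Non-capturing group
        (?:noodl|vege|meat)     # One of the search words
        [^.]*                   # Anything but '.' in between
    ){2,}                       # At least 2 times
    [^.]*\.                     # Rest of the sentence, then the closing '.'
    """,
    re.IGNORECASE | re.VERBOSE,
)

str1 = ("My new noodles are great\n vegetables. Not \nthis noodle sentence "
        "though.\n Nor this vege sentences.")

results = regex.findall(str1)
for sentence in results:
    print(repr(sentence))
```

Note that, as in the original pattern, the same word occurring twice in one sentence also counts as two hits, and a sentence is assumed to end at the next '.' character.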
