r"\."+".+"+"apple"+".+"+"\."
This line is a bit odd; why concatenate so many separate strings? You could just use r'\..+apple.+\.'.
Anyway, the problem with your regular expression is its greediness. By default, x+ will match x as often as it possibly can. So your .+ will match as many characters (any characters) as possible, including dots and apples.
What you want to use instead is a non-greedy expression; you can usually do this by adding a ? after the quantifier, which turns .+ into .+? here.
That change gives you the following result:
['.I like to eat apple. Me too.']
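As you can see, you no longer get both apple sentences in one match, but you still get the Me too. sentence. That is because .+? needs at least one character, and the character right after apple is the dot itself. The pattern therefore consumes that dot and has to run on to the next one, making it impossible not to capture the following sentence as well.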
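To see the difference side by side, here is a small sketch comparing the greedy and non-greedy versions. The input string is an assumption, reconstructed from the outputs quoted in this answer; your actual text may differ.

```python
import re

# Assumed input, reconstructed from the results quoted in this answer.
text = ".I like to eat apple. Me too. Let's go buy some apples."

# Greedy: each .+ grabs as much as it can, so everything ends up in one match.
greedy = re.findall(r'\..+apple.+\.', text)

# Non-greedy: each .+? grabs as little as it can, but the second one still
# has to consume the dot right after "apple" and run on to the next dot.
lazy = re.findall(r'\..+?apple.+?\.', text)

print(greedy)  # the whole text as a single match
print(lazy)    # ['.I like to eat apple. Me too.']
```

On this input the greedy pattern returns the whole text as one match, while the non-greedy one stops earlier but still drags the Me too. sentence along.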
A working regular expression would be this: r'\.[^.]*?apple[^.]*?\.'
Here you don’t match just any character, but only characters that are not dots ([^.]). The * quantifier also allows matching no characters at all, which is needed because in the first sentence nothing but the dot follows apple. Using that expression results in this:
['.I like to eat apple.', ". Let's go buy some apples."]
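For completeness, a minimal runnable sketch of that expression, again using an input string assumed from the outputs above (the variable names are just for illustration):

```python
import re

# Assumed input, reconstructed from the results quoted in this answer.
text = ".I like to eat apple. Me too. Let's go buy some apples."

# [^.] matches any character except a dot, so a match can never cross a
# sentence boundary; * (instead of +) allows zero characters after "apple".
pattern = re.compile(r'\.[^.]*?apple[^.]*?\.')
matches = pattern.findall(text)

print(matches)  # ['.I like to eat apple.', ". Let's go buy some apples."]
```

Note that each match still includes the dot that ended the previous sentence, because the pattern starts with \. as in your original expression.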