You are using .*, which is greedy. Add a ? after it (.*?) to make it non-greedy:
>>> re.findall(r'<i>(.*?)</i>', 'test <i>abc</i> <i>def</i>')
['abc', 'def']
From the re documentation:

The *, +, and ? qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
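
To see the difference side by side, here is a small sketch that runs both forms against the documentation's example string and against the string from the example earlier in this answer. The variable names are only illustrative; the only library calls used are re.match and re.findall.

```python
import re

# The example string from the re documentation.
text = '<H1>title</H1>'

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.match(r'<.*>', text).group()

# Non-greedy: .*? stops at the first '>' that lets the pattern succeed.
lazy = re.match(r'<.*?>', text).group()

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>

# The same effect with the <i> tags from the example above.
sample = 'test <i>abc</i> <i>def</i>'
greedy_items = re.findall(r'<i>(.*)</i>', sample)
lazy_items = re.findall(r'<i>(.*?)</i>', sample)

print(greedy_items)  # ['abc</i> <i>def']
print(lazy_items)    # ['abc', 'def']
```

With the greedy pattern, findall returns a single item spanning from the first <i> to the last </i>, which is usually the symptom that prompts this question.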