I have the string
"<whateverblahblah> <separatethis>"
I want to use re.findall to return [<whateverblahblah>, <dontincludethis>]
and not ["<whateverblahblah> <dontincludethis>"]
. This doesn't happen when I do
re.findall("<.*>")
I have the string
"<whateverblahblah> <separatethis>"
I want to use re.findall to return [<whateverblahblah>, <dontincludethis>]
and not ["<whateverblahblah> <dontincludethis>"]
. This doesn't happen when I do
re.findall("<.*>")
Explanation about the regex engine might help you to better understand what's going on here.
What happens is that <
matches the first <
, then the next token is .
, it matches any character (except newlines), it'll be repeated by the *
token.
*
is greedy, the engine repeats it as many times as it can, so the regex continues to try to match the .
with next characters, resulting with matching
whateverblahblah> <separatethis>
But now the >
won't be match as we reached end of the string, so the engine will go back (backtracking) and match only up to the s
, and finally the >
will match the >
in your regex.
As already stated in the comments, to solve this you need to:
<.*?>
↑
Which means laziness instead.