I have the string

"<whateverblahblah> <separatethis>"

I want to use re.findall to return ["<whateverblahblah>", "<separatethis>"] and not ["<whateverblahblah> <separatethis>"]. This doesn't happen when I do

re.findall("<.*>", "<whateverblahblah> <separatethis>")
Maroun
han Park

1 Answer

An explanation of how the regex engine works might help you better understand what's going on here.

What happens is that < matches the first <. The next token is ., which matches any character (except newlines), and it is repeated by the * token.

* is greedy: the engine repeats it as many times as it can, so the regex keeps matching . against the following characters until it has consumed the rest of the string:

whateverblahblah> <separatethis>

But now the > can't be matched, because we have reached the end of the string. So the engine goes back one character (backtracking), leaving .* matching only up to the s, and finally the > in your regex matches the last > in the string. The whole string ends up as a single match.
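To see the greedy behaviour described above, here is a minimal sketch using the string from the question. The variable names text and greedy are only for illustration.

```python
import re

text = "<whateverblahblah> <separatethis>"

# Greedy: .* first consumes everything up to the end of the string,
# then backtracks just far enough for the final ">" to match.
greedy = re.findall(r"<.*>", text)
print(greedy)  # ['<whateverblahblah> <separatethis>']
```

The result is a list with one element that spans both tags, which is exactly what the question is trying to avoid.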

As already stated in the comments, to solve this you need to make the quantifier lazy:

<.*?>
   ↑

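The ? after the * means laziness instead of greediness: the engine repeats . as few times as possible, so each match stops at the first > it finds.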
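As a quick check of the lazy pattern, here is a small sketch run against the string from the question. The variable names text and lazy are only for illustration.

```python
import re

text = "<whateverblahblah> <separatethis>"

# Lazy: .*? expands one character at a time and stops at the first ">",
# so each tag is returned as its own match.
lazy = re.findall(r"<.*?>", text)
print(lazy)  # ['<whateverblahblah>', '<separatethis>']
```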

Maroun