Regex to find only when there is or isn't a specified character in between

Question

I have the string

"<whateverblahblah> <separatethis>"

I want to use re.findall to return ["<whateverblahblah>", "<separatethis>"] and not ["<whateverblahblah> <separatethis>"]. This doesn't happen when I do

re.findall("<.*>", "<whateverblahblah> <separatethis>")

You need to make it less greedy: `<.*?>`. – jonrsharpe Dec 15 '14 at 10:25
You can also use `<[^>]+>` as your pattern. – hjpotter92 Dec 15 '14 at 10:28

score 0 · Accepted Answer · answered Dec 15 '14 at 10:29

An explanation of how the regex engine works might help you better understand what's going on here.

What happens is that < matches the first <. The next token is ., which matches any character (except a newline), and it is repeated by the * quantifier.

* is greedy: the engine repeats it as many times as it can, so . keeps consuming the following characters until it has matched

whateverblahblah> <separatethis>

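But now the > in the regex can't match, because we've reached the end of the string. So the engine goes back one character (backtracking), leaving .* matching only up to the s, and the > in your regex finally matches the last > in the string. That is why the whole string comes back as a single match.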
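
To see the greedy behaviour concretely, here is a minimal sketch using the string from the question. The variable names text and greedy are only illustrative and are not from the question.

```python
import re

# The string from the question; `text` is an illustrative name.
text = "<whateverblahblah> <separatethis>"

# Greedy: .* first runs to the end of the string, then backtracks
# just far enough for the final > in the pattern to match.
greedy = re.findall(r"<.*>", text)
print(greedy)  # ['<whateverblahblah> <separatethis>']
```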

As already stated in the comments, to solve this you need to make the quantifier lazy:

<.*?>
   ↑

The ? after * makes it lazy instead of greedy: it matches as few characters as possible, so each match stops at the first >.
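
As a quick check, here is a sketch that runs the lazy pattern on the string from the question, along with the negated character class suggested in the comments. The variable names are only illustrative.

```python
import re

# The string from the question; `text` is an illustrative name.
text = "<whateverblahblah> <separatethis>"

# Lazy: .*? grows one character at a time and stops at the first >.
lazy = re.findall(r"<.*?>", text)
print(lazy)  # ['<whateverblahblah>', '<separatethis>']

# Alternative from the comments: [^>]+ can never run past a >,
# so greediness is not a problem here.
negated = re.findall(r"<[^>]+>", text)
print(negated)  # ['<whateverblahblah>', '<separatethis>']
```

The two patterns are not fully equivalent: `<[^>]+>` needs at least one character between the brackets and can match across newlines, while `<.*?>` also matches `<>` and, without re.DOTALL, stops at a newline.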
