The 'side effect' you are seeing is that re.split() will only split on matches that are longer than 0 characters.
The \s*|: pattern matches either zero or more whitespace characters or a literal colon (:), whichever alternative succeeds first. But \s* can match zero characters, so it succeeds at every position. Only at those locations where more than zero whitespace characters matched is the split made.
Because the \s* alternative succeeds at every position that is considered for splitting, even if only with an empty match, the next alternative (the colon) is never tried.
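A minimal sketch of why the colon alternative is skipped, using re.match() at a single position rather than re.split(). The sample string ':str' is made up for illustration and is not taken from the question:

```python
import re

# At a position holding ':', \s* is tried first and succeeds with an
# empty match, so the engine never moves on to the ':' alternative.
space_first = re.match(r'\s*|:', ':str')
print(repr(space_first.group()))   # ''

# With ':' listed first, it is tried (and matches) before \s*.
colon_first = re.match(r':|\s*', ':str')
print(repr(colon_first.group()))   # ':'
```

The regex engine takes the first alternative that succeeds, not the longest one, which is why the order of the alternatives matters here.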
Splitting only on non-empty matches is called out explicitly in the re.split() documentation:
Note that split will never split a string on an empty pattern match.
If you reverse the alternatives, the colon is tried first and so gets a chance to match:
>>> import re
>>> re.split(r':|\s*', 'find a str:s2')
['find', 'a', 'str', 's2']
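If the goal is simply to split on runs of whitespace or on a colon (an assumption about the intent, not something the question states), one option is to require at least one whitespace character with \s+, so that neither alternative can match the empty string. A sketch of that approach, which does not depend on the order of the alternatives and should also behave the same on Python 3.7 and later, where re.split() does split on empty matches:

```python
import re

# \s+ needs at least one whitespace character, so it fails at ':' and
# the engine falls through to the ':' alternative.
parts = re.split(r'\s+|:', 'find a str:s2')
print(parts)  # ['find', 'a', 'str', 's2']
```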