Your current expression could work with a small bugfix: \S is non-whitespace, while \s is whitespace. You're looking for non-whitespace in both cases, so you shouldn't use \s anywhere:
>>> re.findall(r'\d+\S|\S', '2a3bc')
['2a', '3b', 'c']
However, this expression could be shorter: instead of using + for one or more digits, use * for zero or more, since a character might not be preceded by any digits. You can then get rid of the alternation:
>>> re.findall(r'\d*\S', '2a3bc')
['2a', '3b', 'c']
Again, though, note that \S is simply non-whitespace: that includes letters, digits, and even punctuation. \D, non-digits, has a similar problem: it excludes digits, but includes punctuation (and whitespace). The shortest, clearest regex for this, then, would replace the \S with \w, which matches alphanumeric characters and the underscore:
>>> re.findall(r'\d*\w', '2a3bc')
['2a', '3b', 'c']
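To make the difference between the three classes concrete, here is a small sketch that runs each pattern over the same string. The input '2a 3b-c' is a made-up example (it is not from the question), chosen because it contains a space and a hyphen:

```python
import re

# Hypothetical input containing a space and a hyphen, to expose the differences.
sample = '2a 3b-c'

non_space = re.findall(r'\d*\S', sample)  # \S also accepts punctuation
non_digit = re.findall(r'\d*\D', sample)  # \D also accepts whitespace and punctuation
word_char = re.findall(r'\d*\w', sample)  # \w accepts only letters, digits and underscore

print(non_space)  # ['2a', '3b', '-', 'c']
print(non_digit)  # ['2a', ' ', '3b', '-', 'c']
print(word_char)  # ['2a', '3b', 'c']
```

Only the \w version skips both the space and the hyphen.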
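Since the \d* in front of it greedily consumes any digits, this particular \w will match a letter whenever one follows the digits, as in your input. It falls back to matching a digit (or an underscore) only when no letter is there to match.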
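One edge case may be worth checking if your input can end in digits: when no letter follows a run of digits, the regex engine backtracks and lets \w take the last digit. If that matters, a stricter class such as [^\W\d_] (a word character that is neither a digit nor an underscore) is one possible alternative. The names WORD_GROUP and LETTER_GROUP below are just illustrative:

```python
import re

# \w can still match a digit or an underscore when no letter is available.
WORD_GROUP = re.compile(r'\d*\w')
# [^\W\d_]: a word character that is not a digit and not an underscore, i.e. a letter.
LETTER_GROUP = re.compile(r'\d*[^\W\d_]')

# On the input from the question, both patterns agree.
print(WORD_GROUP.findall('2a3bc'))    # ['2a', '3b', 'c']
print(LETTER_GROUP.findall('2a3bc'))  # ['2a', '3b', 'c']

# On digits alone, \w backtracks and matches the final digit.
print(WORD_GROUP.findall('12'))    # ['12']
print(LETTER_GROUP.findall('12'))  # []
```

For input shaped like '2a3bc', the shorter \d*\w is enough.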