
I'm trying to decode a string that looks like this "2a3bc" into "aabbbc" in Python. So the first thing I need to do is to split it up into a list with groups that make sense. In other words: ['2a','3b','c'].

Essentially, match either (1) a number and a letter or (2) just a letter.

I've got this:

re.findall('\d+\S|\s', '2a3bc')

and it returns:

['2a', '3b']

So it's actually missing the c.

Perhaps my regex skills are lacking here; any help is appreciated.

adrianmcli
  • http://stackoverflow.com/questions/26006949/python-expanding-a-string-of-variables-with-integers and http://stackoverflow.com/questions/35003123/fairly-basic-string-expansion-in-python – TessellatingHeckler Mar 29 '17 at 01:56
  • @TessellatingHeckler Thanks, I never thought to use the word "expansion" or "expanding". But I did search for almost half an hour already. – adrianmcli Mar 29 '17 at 01:59

1 Answer


Your current expression could work with a small bugfix: \S is non-whitespace, while \s is whitespace. You're looking for non-whitespace in both cases, so you shouldn't use \s anywhere:

>>> re.findall(r'\d+\S|\S', '2a3bc')
['2a', '3b', 'c']

However, this expression could be shorter: instead of using + for one or more digits, use * for zero or more, since the group might not be preceded by any digits, and you can then get rid of the alternation.

>>> re.findall(r'\d*\S', '2a3bc')
['2a', '3b', 'c']

Again, though, note that \S is simply non-whitespace - that includes letters, digits, and even punctuation. \D, non-digits, has a similar problem: it excludes digits, but includes punctuation. A short, clear regex for this, then, would replace the \S with \w, which matches word characters (letters, digits, and the underscore):

>>> re.findall(r'\d*\w', '2a3bc')
['2a', '3b', 'c']

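For input like this, where every run of digits is followed by a letter, the \d* consumes the digits and the \w ends up matching the letter. Keep in mind that \w can still match a digit or an underscore (a trailing '12' with no letter after it would come back as one group), so use [a-zA-Z] in place of \w if only letters should be accepted.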
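To go from the split groups to the decoded string the question asks for, here is a minimal sketch. The function name expand is made up for illustration, and it assumes every group is an optional count followed by a single ASCII letter (hence [a-zA-Z] rather than \w). Capturing the count and the letter separately avoids re-parsing each group afterwards:

```python
import re

def expand(encoded):
    # Hypothetical helper: (\d*) captures an optional repeat count,
    # ([a-zA-Z]) captures the single letter to repeat.
    pairs = re.findall(r'(\d*)([a-zA-Z])', encoded)
    # An empty count string means the letter appears once.
    return ''.join(letter * int(count or 1) for count, letter in pairs)

print(expand('2a3bc'))  # aabbbc
```

With two capture groups, re.findall returns a list of (count, letter) tuples instead of whole matches, so '2a3bc' yields [('2', 'a'), ('3', 'b'), ('', 'c')]. Characters that do not fit the pattern are silently skipped.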

TigerhawkT3