For instance, I'd like to convert "91234 5g5567\t7₇89^" into ["9","1","2","3","4 5g55","67\t7₇8","9^"]. Of course, this can be done in a for loop without any regular expressions, but I want to know whether it can be done with a single regular expression. So far I have found two ways to do it:
>>> import re
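>>> def way0(char: str):
...     # Grow a run of spaces until it no longer occurs in the input, so it is safe as a delimiter.
...     delimiter = ""
...     while True:
...         delimiter += " "
...         if delimiter not in char:
...             # Two adjacent digits where the second differs from the first.
...             substitution = re.compile("([0-9])(?!\\1)([0-9])")
...             replacement = "\\1"+delimiter+"\\2"
...             cin = [char]
...             # Matches cannot overlap, so repeat until no piece changes any more.
...             while True:
...                 cout = []
...                 for term in cin:
...                     cout.extend(substitution.sub(replacement,term).split(delimiter))
...                 if cout == cin:
...                     return cin
...                 else:
...                     cin = cout
...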
>>> way0("91234 5g5567\t7₇89^")
['9', '1', '2', '3', '4 5g55', '67\t7₇8', '9^']
>>> import itertools
>>> way1 = lambda w: ["".join(list(y)) for x, y in itertools.groupby(re.split("(0+|1+|2+|3+|4+|5+|6+|7+|8+|9+)", w), lambda z: z != "") if x]
>>> way1("91234 5g5567\t7₇89^")
['9', '1', '2', '3', '4 5g55', '67\t7₇8', '9^']
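To make the idea behind way1 easier to follow, here is a minimal sketch of its two steps on a shorter, made-up input ("112a" is only an illustration and does not come from my real data). re.split with a capturing group keeps the digit runs and leaves empty strings wherever two runs touch, and groupby then glues together everything between those empty strings:

```python
import itertools
import re

# Same alternation as in way1, built programmatically: (0+|1+|...|9+)
pattern = "(" + "|".join(d + "+" for d in "0123456789") + ")"

# Step 1: split on runs of one repeated digit, keeping the runs (capturing group).
parts = re.split(pattern, "112a")
print(parts)  # ['', '11', '', '2', 'a']

# Step 2: an empty string marks a boundary between two different digit runs,
# so join every maximal group of non-empty parts.
joined = ["".join(group)
          for non_empty, group in itertools.groupby(parts, lambda z: z != "")
          if non_empty]
print(joined)  # ['11', '2a']
```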
However, neither way0 nor way1 is concise (or ideal). I have read the help page of re.split; unfortunately, the following code does not return the desired output:
>>> re.split(r"(\d)(?!\1)(\d)","91234 5g5567\t7₇89^")
['', '9', '1', '', '2', '3', '4 5g5', '5', '6', '7\t7₇', '8', '9', '^']
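As far as I understand the documentation, this happens for two reasons: re.split removes the matched text (both digits) from the pieces, and it inserts the text of every capturing group into the result. Because the second digit is consumed by the match, it also cannot start the next match. A small sketch on made-up inputs ("a12b" and "123" are only illustrations):

```python
import re

pattern = r"(\d)(?!\1)(\d)"

# The match "12" is cut out, and groups 1 and 2 are inserted as separate items.
result = re.split(pattern, "a12b")
print(result)  # ['a', '1', '2', 'b']

# "12" is consumed first, so the pair "23" is never seen as a match.
consumed = re.split(pattern, "123")
print(consumed)  # ['', '1', '2', '3']
```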
Can re.split solve this problem directly (that is, without extra conversions)? (Note that I am not concerned with efficiency here.)
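For reference, the loop-based version without regular expressions that I mentioned at the start could look roughly like the sketch below. The name split_distinct_digits is made up for this post, and I am assuming that only the ASCII digits 0-9 count as digits (so a character such as "₇" is treated as ordinary text):

```python
def split_distinct_digits(text: str) -> list:
    """Split text between two adjacent ASCII digits that differ."""
    digits = "0123456789"
    pieces = []
    start = 0
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        # Cut between position i-1 and i when both are digits and they differ.
        if prev in digits and cur in digits and prev != cur:
            pieces.append(text[start:i])
            start = i
    # Whatever is left after the last cut is the final piece.
    pieces.append(text[start:])
    return pieces


print(split_distinct_digits("1123"))    # ['11', '2', '3']
print(split_distinct_digits("4 5g55"))  # ['4 5g55']
```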
There are earlier questions on this topic (for example, Regular expression of two digit number where two digits are not same, Regex to match 2 digit but different numbers, and Regular expression to match sets of numbers that are not equal nor reversed), but they are all about matching ("RegMatch"). My question is about splitting ("RegSplit"), not matching or replacing ("RegMatch" or "RegReplace").