4

Hello, I'm trying to split a string without removing the delimiter, and the string can contain several different delimiters.

The delimiters can be 'D', 'M' or 'Y'. For example:

>>> string = '1D5Y4D2M'
>>> re.split(someregex, string)  # should ideally return
['1D', '5Y', '4D', '2M']

To keep the delimiter, I use the approach from "Python split() without removing the delimiter":

>>> re.split('([^D]+D)', '1D5Y4D2M')
['', '1D', '', '5Y4D', '2M']

For multiple delimiters, I use the approach from "In Python, how do I split a string and keep the separators?":

>>> re.split('(D|M|Y)', '1D5Y4D2M')
['1', 'D', '5', 'Y', '4', 'D', '2', 'M', '']

Combining both doesn't quite work:

>>> re.split('([^D]+D|[^M]+M|[^Y]+Y)', string)
['', '1D', '', '5Y4D', '', '2M', '']

Any ideas?

Hasan Haghniya

5 Answers

4

I'd use findall() in your case. How about:

re.findall(r'\d+[DYM]', string)

Which will result in:

['1D', '5Y', '4D', '2M']
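
As a quick check of the findall() approach, here is a minimal sketch. The helper name split_keep_delims is hypothetical and only used for illustration. Keep in mind that findall() returns only the parts that match, so any text that is not digits followed by a delimiter is silently dropped.

```python
import re

def split_keep_delims(text):
    # Each match is one or more digits followed by a single D, Y or M
    return re.findall(r'\d+[DYM]', text)

print(split_keep_delims('1D5Y4D2M'))  # ['1D', '5Y', '4D', '2M']
print(split_keep_delims('10D25Y'))    # ['10D', '25Y']
```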
JvdV
2
(?<=(?:D|Y|M))

You need to split on a zero-width assertion. This can be done with the regex module in Python.

See demo.

https://regex101.com/r/aKV13g/1
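
With the standard re module, splitting on an empty match like this works from Python 3.7 onward. A minimal sketch, assuming Python 3.7 or later: because the lookbehind also matches at the very end of the string, the result carries a trailing empty string that has to be filtered out.

```python
import re

text = '1D5Y4D2M'

# Split at every position that is immediately preceded by D, Y or M
parts = re.split(r'(?<=(?:D|Y|M))', text)
print(parts)    # ['1D', '5Y', '4D', '2M', '']

# Drop the empty piece produced by the match at the end of the string
cleaned = [p for p in parts if p]
print(cleaned)  # ['1D', '5Y', '4D', '2M']
```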

vks
1

You can split at the locations right after D, Y or M but not at the end of the string with

re.split(r'(?<=[DYM])(?!$)', text)

See the regex demo. Details:

  • (?<=[DYM]) - a positive lookbehind that matches a location immediately preceded by D, Y or M
  • (?!$) - a negative lookahead that fails the match if the current position is the end of the string.

Note

In the current scenario, (?<=[DYM]) can be used instead of the more verbose (?<=D|Y|M) since all alternatives are single characters. If you have multi-character delimiters of different lengths, you would have to use a non-capturing group, (?:...), containing one lookbehind per alternative, because Python's re requires each lookbehind to have a fixed width. For example, to split right after Y, DX and MZB you would use (?:(?<=Y)|(?<=DX)|(?<=MZB)). See the question Python Regex Engine - "look-behind requires fixed-width pattern" Error.
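
Both patterns from this answer can be tried with a minimal sketch like the one below. The sample string for the multi-character case is made up for illustration, and the snippet assumes Python 3.7 or later, where re.split() can split on empty matches.

```python
import re

# Single-character delimiters: split after D, Y or M, but not at the end
single = re.split(r'(?<=[DYM])(?!$)', '1D5Y4D2M')
print(single)  # ['1D', '5Y', '4D', '2M']

# Multi-character delimiters Y, DX and MZB: one fixed-width lookbehind each
# ('1Y2DX3MZB4Y' is a hypothetical input)
multi_pattern = r'(?:(?<=Y)|(?<=DX)|(?<=MZB))(?!$)'
multi = re.split(multi_pattern, '1Y2DX3MZB4Y')
print(multi)   # ['1Y', '2DX', '3MZB', '4Y']
```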

Wiktor Stribiżew
0

I think this works fine without regex or split, with O(n) time complexity:

string = '1D5Y4D2M'
temp = ''
res = []
for x in string:
    temp += x
    # A delimiter closes the current chunk
    if x in 'DMY':
        res.append(temp)
        temp = ''
print(res)
0

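Using translate:

string = '1D5Y4D2M'

delimiters = ['D', 'Y', 'M']
# Append a '*' marker after each delimiter, then split on the marker
# (assumes '*' never occurs in the input)
result = string.translate({ord(c): f'{c}*' for c in delimiters}).strip('*').split('*')
print(result)

['1D', '5Y', '4D', '2M']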
Ramesh