I have two strings:

/some/path/to/sequence2.1001.tif

and

/some/path/to/sequence_another_u1_v2.tif

I want to write a function so that both strings can be split up into a list by some regex and joined back together, without losing any characters.

For example:

def split_by_group(path, re_compile):
    # ...
    return ['the', 'parts', 'here']

split_by_group('/some/path/to/sequence2.1001.tif', re.compile(r'\.(\d+)\.'))
# Result: ['/some/path/to/sequence2.', '1001', '.tif']

split_by_group('/some/path/to/sequence_another_u1_v2.tif', re.compile(r'_[uv](\d+)'))
# Result: ['/some/path/to/sequence_another_u', '1', '_v', '2', '.tif']

It's less important that the regex be exactly what I wrote above (but ideally, I'd like the accepted answer to use both). My only criteria are that the split string must be recombinable without losing any characters, and that each of the groups is split in the way I showed above (where the split occurs right at the start/end of the capture group rather than of the whole match).

I made something with finditer but it's horribly hacky and I'm looking for a cleaner way. Can anyone help me out?

ColinKennedy
  • Possible duplicate of [In Python, how do I split a string and keep the separators?](https://stackoverflow.com/questions/2136556/in-python-how-do-i-split-a-string-and-keep-the-separators) – szabadkai May 31 '17 at 04:36
  • If these are paths you might consider `os.path` – pylang May 31 '17 at 04:38

1 Answer

I changed your regexes a little, if you don't mind. I'm not sure whether this works for your other cases.

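import re

def split_by_group(path, re_compile):
    # re.split keeps the text of every capture group; drop the empty
    # strings it produces between adjacent matches.
    parts = [s for s in re_compile.split(path) if s]
    # Glue the first separator back onto the leading path segment.
    parts[0:2] = [''.join(parts[0:2])]
    return parts

split_by_group('/some/path/to/sequence2.1001.tif', re.compile(r'(\.)(\d+)'))
# Result: ['/some/path/to/sequence2.', '1001', '.tif']

split_by_group('/some/path/to/sequence_another_u1_v2.tif', re.compile(r'(_[uv])(\d+)'))
# Result: ['/some/path/to/sequence_another_u', '1', '_v', '2', '.tif']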
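
To see why the first two elements get joined, it may help to look at what `re.split` returns before any post-processing. This is only an illustrative sketch; the variable names (`raw`, `raw_uv`, and so on) are made up for the example.

```python
import re

path = '/some/path/to/sequence2.1001.tif'
raw = re.compile(r'(\.)(\d+)').split(path)
# raw == ['/some/path/to/sequence2', '.', '1001', '.tif']

path_uv = '/some/path/to/sequence_another_u1_v2.tif'
raw_uv = re.compile(r'(_[uv])(\d+)').split(path_uv)
# raw_uv == ['/some/path/to/sequence_another', '_u', '1', '', '_v', '2', '.tif']

# Because every matched character sits inside a capture group,
# joining the raw pieces restores the original string.
assert ''.join(raw) == path
assert ''.join(raw_uv) == path_uv
```

The empty string in the second list is the zero-length text between the two adjacent matches, which is why falsy entries are filtered out. Only the first separator is merged back into the leading segment; later ones such as `'_v'` stay as their own elements, which happens to be the requested output here.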
Y. Luo
  • Works perfectly. Accepting your answer. Question though: is there any way to get the same effect without capturing every piece into separate groups? In my case, my regex is flexible, but I'd like to know in case it comes up in some other scenario – ColinKennedy Jun 01 '17 at 03:32
  • I can't be sure, but I don't think there is an easy way to do that. It is documented for `re.split` that: "If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list." In other words, any piece of the match you don't capture will be dropped. So you can't use `re.split`; you'd have to use `re.search` or `re.match` in a loop instead. That won't require capturing every piece, but it doesn't seem to be the "cleaner way" you asked for. – Y. Luo Jun 01 '17 at 03:58
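
For what it's worth, the loop-based alternative mentioned in the last comment could look roughly like the following. This is a minimal sketch, not a tested general solution: it uses `finditer` and `Match.span` to cut at the start and end of each capture group, and it assumes the groups are not nested and appear left to right in the pattern. The parameter name `pattern` is just a rename for the example.

```python
import re

def split_by_group(path, pattern):
    """Split path at the start and end of every capture group that matched."""
    parts = []
    last = 0  # index just past the previous cut
    for match in pattern.finditer(path):
        for index in range(1, pattern.groups + 1):
            start, end = match.span(index)
            if start == -1:  # this group did not take part in the match
                continue
            parts.append(path[last:start])  # text before the group
            parts.append(path[start:end])   # the group itself
            last = end
    parts.append(path[last:])  # whatever follows the last group
    # Dropping empty strings does not lose any characters.
    return [p for p in parts if p]

first = split_by_group('/some/path/to/sequence2.1001.tif',
                       re.compile(r'\.(\d+)\.'))
# first == ['/some/path/to/sequence2.', '1001', '.tif']

second = split_by_group('/some/path/to/sequence_another_u1_v2.tif',
                        re.compile(r'_[uv](\d+)'))
# second == ['/some/path/to/sequence_another_u', '1', '_v', '2', '.tif']
```

With this approach the original patterns from the question can be used unchanged, since uncaptured parts of a match are simply kept with the surrounding text instead of being dropped.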