
I want to split a string by a list of indices, where the split segments begin with one index and end before the next one.

Example:

s = 'long string that I want to split up'
indices = [0,5,12,17]
parts = [s[index:] for index in indices]
for part in parts:
    print(part)

This prints:

long string that I want to split up
string that I want to split up
that I want to split up
I want to split up

I'm trying to get:

long
string
that
I want to split up

smci
Yarin

4 Answers

s = 'long string that I want to split up'
indices = [0,5,12,17]
parts = [s[i:j] for i,j in zip(indices, indices[1:]+[None])]

returns

['long ', 'string ', 'that ', 'I want to split up']

which you can print using:

print('\n'.join(parts))

Another possibility (without copying indices) would be:

s = 'long string that I want to split up'
indices = [0,5,12,17]
indices.append(None)
parts = [s[indices[i]:indices[i+1]] for i in range(len(indices)-1)]
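
For readers on Python 3, the same idea can be sketched as a small function. The name `split_at` is hypothetical, and the sketch assumes `indices` is sorted in ascending order. It builds a separate list of end points rather than appending `None` to the caller's list.

```python
def split_at(s, indices):
    """Split s into segments that start at each index and end before the next.

    Hypothetical helper; assumes indices is sorted in ascending order.
    """
    # Pair every start with the following index. None as the final end
    # point makes the last slice run to the end of the string.
    ends = indices[1:] + [None]
    return [s[i:j] for i, j in zip(indices, ends)]


s = 'long string that I want to split up'
parts = split_at(s, [0, 5, 12, 17])
print('\n'.join(parts))
```

Unlike the `indices.append(None)` variant, this leaves the list passed in unchanged.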
eumiro
  • Another way is, `[s[i:j] for i,j in izip_longest(indices,indices[1:])]` but I like your way better! – jamylak Jun 01 '12 at 13:51
  • This copies the indices list with `indices[1:]` and creates a new list with double size by the `zip` function -> Bad performance and memory consumption. – schlamar Jun 01 '12 at 13:58
  • @ms4py This is fine, performance is not an issue in this case, this is a very readable solution. If performance is an issue my suggestion can be used. – jamylak Jun 01 '12 at 14:01
  • eumiro- thank you, this works great. Can you explain how the +[None] part works? – Yarin Jun 01 '12 at 14:06
  • @ms4py - ok, there's an updated version without copying the list and without zip. Although your `itertools` version is probably more performant. – eumiro Jun 01 '12 at 14:06
  • @Yarin - `indices[1:] + [None]` copies the list without the first element and adds a `None` at the end. So for your `indices` it looks like `[5,12,17,None]`. I am using it to be able to access the last part of the string with `s[17:None]` (the same as `s[17:]`, just using two variables I have anyway). – eumiro Jun 01 '12 at 14:08
  • @Yarin `[1:None]` for example is the same as `[1:]` – jamylak Jun 01 '12 at 14:08
  • @ms4py What do you mean by that? – jamylak Jun 01 '12 at 14:11
  • Not sure it's your forte, but how would one do this in NodeJs? – lonewarrior556 Apr 23 '20 at 15:35
  • This had been hectic for me for an hour and a half. Thanks @eumiro – Siva Sankar Apr 17 '22 at 13:29
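
The point about `None` made in the comments can be checked directly. A minimal Python 3 sketch, using the string and indices from the question:

```python
s = 'long string that I want to split up'

# A slice bound of None means "no bound", so these pairs are equivalent.
assert s[17:None] == s[17:]
assert s[None:5] == s[:5]

# What zip() pairs up once None is appended to the shifted list.
indices = [0, 5, 12, 17]
pairs = list(zip(indices, indices[1:] + [None]))
print(pairs)  # [(0, 5), (5, 12), (12, 17), (17, None)]
```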

Here is a short solution that relies heavily on the itertools module. The tee function is used to iterate pairwise over the indices. See the Recipes section of the itertools documentation for more help.

>>> from itertools import tee, izip_longest
>>> s = 'long string that I want to split up'
>>> indices = [0,5,12,17]
>>> start, end = tee(indices)
>>> next(end)
0
>>> [s[i:j] for i,j in izip_longest(start, end)]
['long ', 'string ', 'that ', 'I want to split up']

Edit: This is a version that does not copy the indices list, so it should be faster.
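
In Python 3, `izip_longest` is named `zip_longest`. A sketch of the same session as a script, where the default fill value `None` serves as the open end of the last slice:

```python
from itertools import tee, zip_longest

s = 'long string that I want to split up'
indices = [0, 5, 12, 17]

# tee() gives two independent iterators over the same indices.
start, end = tee(indices)
# Advance the second iterator by one so it yields the end points.
next(end, None)

# zip_longest pads the shorter iterator (end) with None, so the final
# pair is (17, None) and the last slice runs to the end of the string.
parts = [s[i:j] for i, j in zip_longest(start, end)]
print(parts)
```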

flywire
schlamar
  • Thanks for the alt approach- ill have to check out itertools sometime – Yarin Jun 01 '12 at 14:10
  • Neat approach, learned something new. Is there an easy way to get rid of the extra blank at the end of the first 3 strings inside the expression? I tried `s[i:j].strip()` but that didn't work at all (not sure why not) – Levon Jun 01 '12 at 14:11
  • If you are gonna use this you may as well use the pairwise function straight from the itertools docs. Also using `next(end)` is preferred to `end.next()` for python 3 compatibility. – jamylak Jun 01 '12 at 14:34
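
Two of the comments above can be sketched together: the `pairwise` recipe from the itertools documentation, and stripping the trailing blank from each part. This is a Python 3 sketch. `pairwise` is written out as in the classic recipe (Python 3.10 and later also provide `itertools.pairwise`), and `split_stripped` is a hypothetical name.

```python
from itertools import tee


def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


def split_stripped(s, indices):
    # Add len(s) as a final end point so the last segment is included;
    # pairwise alone would stop at the last index.
    bounds = list(indices) + [len(s)]
    # strip() removes the blank left at the end of each segment.
    return [s[i:j].strip() for i, j in pairwise(bounds)]


s = 'long string that I want to split up'
print(split_stripped(s, [0, 5, 12, 17]))
```

In this sketch, calling `.strip()` on each slice inside the comprehension yields the parts without the trailing blank.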

You can write a generator if you don't want to make any modifications to the list of indices:

>>> def split_by_idx(S, list_of_indices):
...     left, right = 0, list_of_indices[0]
...     yield S[left:right]
...     left = right
...     for right in list_of_indices[1:]:
...         yield S[left:right]
...         left = right
...     yield S[left:]
... 
>>> s = 'long string that I want to split up'
>>> indices = [5,12,17]
>>> list(split_by_idx(s, indices))
['long ', 'string ', 'that ', 'I want to split up']
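
The generator above reads `list_of_indices[0]`, so it raises `IndexError` when the list is empty, and `list_of_indices[1:]` makes a copy. A possible Python 3 variant that avoids both, under the same convention that the leading `0` is implied (the name `iter_split` is hypothetical):

```python
def iter_split(s, cut_points):
    """Yield the pieces of s between consecutive cut points.

    Hypothetical variant; assumes cut_points is sorted and that the
    leading 0 is implied, as in the generator above.
    """
    left = 0
    for right in cut_points:
        yield s[left:right]
        left = right
    # Whatever remains after the last cut point.
    yield s[left:]


s = 'long string that I want to split up'
print(list(iter_split(s, [5, 12, 17])))
```

With an empty list of cut points, this yields the whole string as a single piece.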
Zhou Shao

Another solution (a bit more readable):

parts = []
i2 = len(s)  # i1 and i2 are the start index and end index of each part

for i1 in reversed(indices):
    parts.append(s[i1:i2])
    i2 = i1

parts.reverse()

This iterates over the indices in reverse, so splitting starts at the last index position and runs up to the end index i2, which is updated on every iteration.

Of course, the parts are then collected in the wrong order. That's why I reverse the result list at the end.

I think for beginners this is a bit more readable than the accepted answer.
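
The same loop can be wrapped in a function for Python 3. The name `split_from_end` is hypothetical, and the logic is the same walk from the last index backwards:

```python
def split_from_end(s, indices):
    parts = []
    end = len(s)  # end of the segment currently being cut
    for start in reversed(indices):
        parts.append(s[start:end])
        end = start  # the next (earlier) segment stops where this one began
    parts.reverse()  # segments were collected back to front
    return parts


s = 'long string that I want to split up'
print(split_from_end(s, [0, 5, 12, 17]))
```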