I'm trying to split a string at the positions given in a list and collect the parts in a new list. I start with:

seq = 'ATCGATCGATCG'
seq_new = []
seq_cut = [2, 8, 10]

I would like to get:

seq_new = ['AT', 'CGATCG', 'AT', 'CG'] 

The list of positions varies in both length and values. How can I split the string this way?

davidism

2 Answers

7

Use zip to create indexes for slicing:

seq_new = [seq[start:end] for start, end in zip([None] + seq_cut, seq_cut + [None])]

This zips together [None, 2, 8, 10] and [2, 8, 10, None] to produce the index pairs [(None, 2), (2, 8), (8, 10), (10, None)]. None as the start index defaults to zero; None as the end index defaults to the length of the sequence being sliced.
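As a quick check of the explanation above, here is a minimal sketch that prints the intermediate index pairs before slicing. The name `bounds` is introduced only for illustration and is not part of the original answer.

```python
seq = 'ATCGATCGATCG'
seq_cut = [2, 8, 10]

# Pair each cut position with the next one; None marks the two open ends.
bounds = list(zip([None] + seq_cut, seq_cut + [None]))
print(bounds)   # [(None, 2), (2, 8), (8, 10), (10, None)]

# Slice the string once per (start, end) pair.
seq_new = [seq[start:end] for start, end in bounds]
print(seq_new)  # ['AT', 'CGATCG', 'AT', 'CG']

# A None bound behaves like an omitted bound in slice syntax.
assert seq[None:2] == seq[:2]
assert seq[10:None] == seq[10:]
```

Note that `[None] + seq_cut` builds a new list each time, so `seq_cut` itself is left unchanged.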

Steven Rumbalski
  • Darn it - was just going to copy/paste from my editor almost exactly the same :p – Jon Clements Mar 13 '15 at 17:20
  • For added symmetry you can use `None` on the first one (like I just did! :P) – DSM Mar 13 '15 at 17:20
  • This can be made even more elegant by using a modified version of the "pairwise" recipe (left as exercise to the reader) described in [this post](http://stackoverflow.com/a/21303303/4621513)! The resulting expression would be `[seq[start:end] for start, end in pairwise(seq_cut)]` – mkrieger1 Mar 13 '15 at 17:52
  • @mkrieger1: An implementation of `pairwise` can be found in the recipes section of the `itertools` docs. I don't think it's worth it here. The code is roughly the same: `seq_new = [seq[start:end] for start, end in pairwise([None] + seq_cut + [None])]` – Steven Rumbalski Mar 13 '15 at 18:01
  • Yes, I also think it's not worth it unless it's going to be used more than only a few times. And I meant to append `None` in the front and back inside a modified version of `pairwise`, to hide that little detail from the user. – mkrieger1 Mar 13 '15 at 21:18
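For readers curious about the `pairwise` variant discussed in these comments, here is a rough sketch. The `pairwise` function follows the well-known `itertools` recipe built on `tee`; the wrapper name `split_at` is hypothetical and only illustrates the idea of hiding the `None` padding from the caller.

```python
from itertools import tee

def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one element
    return zip(a, b)

def split_at(seq, cuts):
    """Hypothetical helper: split seq at the given positions.

    Pads the cut list with None on both sides so callers
    never have to deal with the open-ended first and last slices.
    """
    return [seq[start:end] for start, end in pairwise([None, *cuts, None])]

print(split_at('ATCGATCGATCG', [2, 8, 10]))  # ['AT', 'CGATCG', 'AT', 'CG']
```

With an empty list of cuts, the padded list is `[None, None]`, so the helper returns the whole string as a single part.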
4

Use slicing:

seq = "ATCGATCGATCG"
seq_new = []
seq_cut = [2, 8, 10]

last = 0
for idx in seq_cut:
    seq_new.append(seq[last:idx])
    last = idx
seq_new.append(seq[last:])
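If the loop above is needed in more than one place, it could be wrapped in a function along these lines. This is only a sketch: the name `split_at_positions` is made up for illustration, and it assumes the cut positions are sorted in ascending order, as they are in the question.

```python
def split_at_positions(seq, cuts):
    """Hypothetical wrapper around the loop above.

    Assumes cuts is sorted ascending; unsorted positions
    would produce empty or overlapping pieces.
    """
    parts = []
    last = 0
    for idx in cuts:
        parts.append(seq[last:idx])  # piece between the previous cut and this one
        last = idx
    parts.append(seq[last:])         # remainder after the final cut
    return parts

print(split_at_positions("ATCGATCGATCG", [2, 8, 10]))  # ['AT', 'CGATCG', 'AT', 'CG']
```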
orlp
  • Python slicing syntax can be daunting to people unfamiliar with it, but it's super powerful (+1) https://docs.python.org/2.3/whatsnew/section-slices.html –  Mar 13 '15 at 17:14