63

What is the best way to split a list into parts based on an arbitrary number of indexes? E.g. given the code below

indexes = [5, 12, 17]
list = range(20)

return something like this

part1 = list[:5]
part2 = list[5:12]
part3 = list[12:17]
part4 = list[17:]

If there are no indexes it should return the entire list.

drjeep
  • 1,019
  • 2
  • 9
  • 15
  • I'm mildly interested in your answer-selection criteria ... simpler and faster are not "Pythonic"? – John Machin Aug 01 '09 at 01:46
  • Pretty much a combination of highest number of votes and my familiarity with the code. I'm no guru so can't speak for which is faster. Your solution does look interesting although I don't completely understand how it works. I'm open to revising my selection if a few people here care to confirm your solution is indeed better. After all, isn't wisdom of the crowds what Stack Overflow is all about :) – drjeep Aug 03 '09 at 08:23
  • You don't need to be a guru; use the timeit module. Understanding: (1) replace yield by print (2) read this: `http://stackoverflow.com/questions/231767/can-somebody-explain-me-the-python-yield-statement` – John Machin Aug 06 '09 at 03:58

9 Answers9

58

This is the simplest and most pythonic solution I can think of:

def partition(alist, indices):
    return [alist[i:j] for i, j in zip([0]+indices, indices+[None])]

if the inputs are very large, then the iterators solution should be more convenient:

from itertools import izip, chain
def partition(alist, indices):
    pairs = izip(chain([0], indices), chain(indices, [None]))
    return (alist[i:j] for i, j in pairs)

and of course, the very, very lazy guy solution (if you don't mind to get arrays instead of lists, but anyway you can always revert them to lists):

import numpy
partition = numpy.split
fortran
  • 74,053
  • 25
  • 135
  • 175
  • partition is a misleading name, as in many languages the partition function splits a list into two lists - of items passing and items failing the criteria (which is passed as a A -> Boolean function), e.g. partition(even, [1, 2, 3, 4, 5]) = ([2, 4], [1, 3, 5]) – Velizar Hristov Nov 18 '15 at 13:36
13

I would be interested in seeing a more Pythonic way of doing this also. But this is a crappy solution. You need to add a checking for an empty index list.

Something along the lines of:

indexes = [5, 12, 17]
list = range(20)

output = []
prev = 0

for index in indexes:
    output.append(list[prev:index])
    prev = index

output.append(list[indexes[-1]:])

print output

produces

[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
gonidelis
  • 885
  • 10
  • 32
kjfletch
  • 5,394
  • 3
  • 32
  • 38
  • 6
    Not very happy with 2 down votes for throwing ideas in the pot. Especially with no comments. I stated the rough nature of the solution. – kjfletch Jul 29 '09 at 08:13
  • 3
    This site shouldn't allow downvoting without an explanation. It's not useful at all. – Glenn Maynard Jul 29 '09 at 08:20
  • This answer is perfect, and what I or anyone would write if faced with this problem in the real world. The question was asking specifically for a "Pythonic" way, which I think some of the other answers adress better than this one. – anthony Jul 29 '09 at 08:28
  • 3
    That's why I hate the buzzword "Pythonic". It's as if everything written in Python should be written in some special Python-specific way, preferably forcibly crushed onto a single line for some reason. – Glenn Maynard Jul 29 '09 at 09:25
  • 2
    In my opinion, "pythonic" just means good, idiomatic style. It does not mean hypercondensed one-line solutions that show off every python feature. This looks perfectly pythonic to me. It uses slices appropriately, uses range when range is more appropriate than xrange, and iterates directly over the list rather than looping through the indices. Pythonic? Check. Comprehensible? Check. Accurate? Check. +1 – jcdyer Jul 29 '09 at 20:54
  • 1
    Oh, and in python 2, you can use prev after the for loop exits, so you can replace `output.append(list[indexes[-1]:])` with `output.append(list[prev:])`. – jcdyer Jul 29 '09 at 20:56
8

My solution is similar to Il-Bhima's.

>>> def parts(list_, indices):
...     indices = [0]+indices+[len(list_)]
...     return [list_[v:indices[k+1]] for k, v in enumerate(indices[:-1])]

Alternative approach

If you're willing to slightly change the way you input indices, from absolute indices to relative (that is, from [5, 12, 17] to [5, 7, 5], the below will also give you the desired output, while it doesn't create intermediary lists.

>>> from itertools import islice
>>> def parts(list_, indices):
...     i = iter(list_)
...     return [list(islice(i, n)) for n in chain(indices, [None])]
André Eriksson
  • 4,296
  • 2
  • 19
  • 16
  • +1 for shortness and for reusing values (`enumerate` iterates the existing array, as opposed to `zip`) – Blixt Jul 29 '09 at 07:43
  • As Blixt pointed out I think you meant indices instead of indicies. Then you have the minor issue when passing indices like [0,5,12,17], in which case your result will contain the empty list list_[0:0] – Il-Bhima Jul 29 '09 at 07:44
  • 1
    @Il-Bhima: Which could be argued is correct, since the first part is then of length 0, which is consistent with the OP's example. – André Eriksson Jul 29 '09 at 07:47
  • 1
    @Il-Bhima, I think the intended behavior of passing in "split at index 0" would be to get an empty array as the first split value. Personally, I hate "magic" behavior that changes based on the parameters. – Blixt Jul 29 '09 at 07:48
  • I updated this version to not create new lists (other than the slices necessary) while improving simplicity. – André Eriksson Jul 29 '09 at 08:30
  • This got downvoted why? Please leave comments if you think this is wrong or otherwise inappropriate, thanks :). I tested it pretty thoroughly so I think it covers all the edge cases as well. – André Eriksson Jul 29 '09 at 09:12
  • This is unintuitive code, and it doesn't work (compare the output to the other answers). Also, don't replace your answer with a completely different one after people have already voted and commented on the original. – Glenn Maynard Jul 29 '09 at 09:15
  • You might want to give people a chance to *write* a comment before you complain about not receiving one. – Glenn Maynard Jul 29 '09 at 09:15
  • @Glenn Maynard: Thank you; you're perfectly right. I'll keep that in mind from now on. – André Eriksson Jul 29 '09 at 09:48
  • @Cide: Why did you decide against this if you don't mind me asking? http://stackoverflow.com/revisions/0a518729-d726-47b5-b62b-c674e37849ef/view-source – drjeep Jul 29 '09 at 10:47
  • It's there now, but its functionality is slightly different from whta the OP asked for. It's detailed in the "Alternative approach" in the post now. – André Eriksson Jul 29 '09 at 20:18
6
>>> def burst_seq(seq, indices):
...    startpos = 0
...    for index in indices:
...       yield seq[startpos:index]
...       startpos = index
...    yield seq[startpos:]
...
>>> list(burst_seq(range(20), [5, 12, 17]))
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16], [17, 18, 19]]
>>> list(burst_seq(range(20), []))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]]
>>> list(burst_seq(range(0), [5, 12, 17]))
[[], [], [], []]
>>>

Maxima mea culpa: it uses a for statement, and it's not using whizzbang stuff like itertools, zip(), None as a sentinel, list comprehensions, ...

;-)

John Machin
  • 81,303
  • 11
  • 141
  • 189
  • 1
    The reason to use "whizzbang stuff" is not because it's trendy or to make people look smarter, it's because it is closer to a declarative specification of the solution. A mutable-state constructive approach is usually more error prone too... – fortran May 20 '13 at 10:03
3
indices = [5, 12, 17]
input = range(20)
output = []

reduce(lambda x, y: output.append(input[x:y]) or y, indices + [len(input)], 0)
print output
anthony
  • 40,424
  • 5
  • 55
  • 128
0

This is all that I could think of

def partition(list_, indexes):
    if indexes[0] != 0:
        indexes = [0] + indexes
    if indexes[-1] != len(list_):
        indexes = indexes + [len(list_)]
    return [ list_[a:b] for (a,b) in zip(indexes[:-1], indexes[1:])]
Il-Bhima
  • 10,744
  • 1
  • 47
  • 51
  • Clever solution, very concise =) I'll give my +1 to kjfletch's answer though, because it reuses existing values, while this method does a lot of list creation/modification. – Blixt Jul 29 '09 at 07:40
  • It'd be more consistent to drop the conditionals--if the first index is 0, the first item should be empty. Just use `indexes = [0] + indexes + [None]`. – Glenn Maynard Jul 29 '09 at 07:41
  • Also, it's probably better off with itertools.izip instead of zip, and itertools.islice instead of direct slicing. – Glenn Maynard Jul 29 '09 at 07:44
  • @Glenn Hmm, I was actually trying to avoid having those empty lists at the start and end. Not sure if that was want the original poster wanted – Il-Bhima Jul 29 '09 at 07:48
  • don't overwrite a built-in variable (list) – dalloliogm Jul 29 '09 at 08:09
0

Cide's makes three copies of the array: [0]+indices copies, ([0]+indices)+[] copies again, and indices[:-1] will copy a third time. Il-Bhima makes five copies. (I'm not counting the return value, of course.)

Those could be reduced (izip, islice), but here's a zero-copy version:

def iterate_pairs(lst, indexes):
    prev = 0
    for i in indexes:
        yield prev, i
        prev = i
    yield prev, len(lst)

def partition(lst, indexes):
    for first, last in iterate_pairs(lst, indexes):
        yield lst[first:last]

indexes = [5, 12, 17]
lst = range(20)

print [l for l in partition(lst, indexes)]

Of course, array copies are fairly cheap (native code) compared to interpreted Python, but this has another advantage: it's easy to reuse, to mutate the data directly:

for first, last in iterate_pairs(lst, indexes):
    for i in range(first, last):
        lst[i] = first
print lst
# [0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12, 12, 17, 17, 17]

(That's why I passed indexes to iterate_pairs. If you don't care about that, you can remove that parameter and just have the final line be "yield prev, None", which is all partition() needs.)

Glenn Maynard
  • 55,829
  • 10
  • 121
  • 131
0

Here's yet another answer.

def partition(l, indexes):
    result, indexes = [], indexes+[len(l)]
    reduce(lambda x, y: result.append(l[x:y]) or y, indexes, 0)
    return result

It supports negative indexes and such.

>>> partition([1,2,3,4,5], [1, -1])
[[1], [2, 3, 4], [5]]
>>> 
Steve Losh
  • 19,642
  • 2
  • 51
  • 44
-1

The plural of index is indices. Going for simplicity/readability.

indices = [5, 12, 17]
input = range(20)
output = []

for i in reversed(indices):
    output.append(input[i:])
    input[i:] = []
output.append(input)

while len(output):
    print output.pop()
anthony
  • 40,424
  • 5
  • 55
  • 128
  • 2
    "Indexes" and "indices" are both correct. "Indexes" is a pluralization of "index" and is more common in the US, while "indices" is derived from Latin and is more common in the UK. – Blixt Jul 29 '09 at 07:45
  • 1
    Did anyone want to discuss the viability of working right to left? or is this GrammarOverflow now? – anthony Jul 29 '09 at 07:52
  • Working right to left would imply you either insert at other locations than the end or reverse the list before returning it. Either case is less than ideal. – André Eriksson Jul 29 '09 at 07:58