
I have a Python list that runs into the thousands. Something like:

data = ["I", "am", "a", "python", "programmer", ...]

where len(data) = 1003, say.

I would now like to create subsets of this list (data) by splitting the original list into chunks of 100. So, at the end, I'd like to have something like:

data_chunk1 = [...]   # first 100 items of list data
data_chunk2 = [...]   # second 100 items of list data
.
.
.
data_chunk11 = [...]  # remainder of the entries; its len <= 100, here len(data_chunk11) == 3

Is there a pythonic way to achieve this task? Obviously I can use data[0:100] and so on, but I am assuming that is terribly non-pythonic and very inefficient.

Many thanks.

JohnJ
    You could use [numpy's array_split function](https://docs.scipy.org/doc/numpy/reference/generated/numpy.array_split.html#numpy.array_split) e.g., `np.array_split(np.array(data), 20)` to split into 20 nearly equal size chunks. To make sure chunks are exactly equal in size use `np.split`. – Alex Nov 20 '16 at 04:25
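As a minimal sketch of that suggestion (assuming numpy is installed; the sizes in the comments follow from a stand-in list of 1003 integers):

import numpy as np

data = list(range(1003))

# Split into 11 nearly equal parts: sizes differ by at most one element.
parts = np.array_split(np.array(data), 11)
print([len(p) for p in parts])   # two parts of 92, nine of 91

# For fixed-size chunks of 100, pass explicit split indices instead.
chunks = np.array_split(np.array(data), range(100, len(data), 100))
print([len(c) for c in chunks])  # ten chunks of 100, then one of 3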

3 Answers


I'd say

chunks = [data[x:x+100] for x in range(0, len(data), 100)]

If you are using Python 2.x instead of 3.x, you can be more memory-efficient by using xrange(), changing the above code to:

chunks = [data[x:x+100] for x in xrange(0, len(data), 100)]
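A quick check of the boundary behaviour, using the Python 3 form above with a stand-in list of 1003 integers:

data = list(range(1003))
chunks = [data[x:x+100] for x in range(0, len(data), 100)]
print(len(chunks))      # 11
print(len(chunks[-1]))  # 3 -- the final chunk simply holds the remainder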
DanRedux
    I'd go with that too. You might be able to do it in a more 'pythonic way' with itertools, but it will be ugly as sin! –  Mar 12 '12 at 17:04
    If you have a list and want a list, there's no reason to bother with itertools. They only make sense if you want to split up a stream of data without ever creating the entire thing. – alexis Mar 12 '12 at 17:59
    Using itertools would actually be the less pythonic way to do it, wouldn't it? – Pastafarian Mar 10 '15 at 21:08

Actually I think using plain slices is the best solution in this case:

for i in range(0, len(data), 100):
    chunk = data[i:i + 100]
    ...

If you want to avoid copying the slices, you could use itertools.islice(), but it doesn't seem to be necessary here.
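A sketch of what that could look like (iter_chunks is a hypothetical helper name, not part of this answer; it works on any iterable, not just lists):

from itertools import islice

def iter_chunks(iterable, size):
    """Yield lists of up to `size` items without slicing the source."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            break
        yield chunk

print([len(c) for c in iter_chunks(range(1003), 100)])
# [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 3]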

The itertools documentation also contains the famous "grouper" pattern:

from itertools import izip_longest  # zip_longest in Python 3

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

You would need to modify it to treat the last chunk correctly, so I think the straightforward solution using plain slices is preferable.
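For completeness, one possible modification, sketched here with a sentinel fill value (the sentinel approach is an assumption of this sketch, not from the answer above):

from itertools import zip_longest  # izip_longest on Python 2

_SENTINEL = object()  # a fill value that cannot collide with real data

def grouper_trimmed(n, iterable):
    # Like grouper(), but the last chunk keeps only the real items.
    args = [iter(iterable)] * n
    for group in zip_longest(*args, fillvalue=_SENTINEL):
        yield [item for item in group if item is not _SENTINEL]

print(list(grouper_trimmed(3, 'ABCDEFG')))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]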

Sven Marnach
    thanks for the reply. I did think of your first plain-slice solution, but then thought it was maybe too inefficient and too naive of me. I am a bit surprised that there isn't a pythonic way (one-liner) to achieve this task :( – JohnJ Mar 12 '12 at 16:56
chunks = [data[100*i:100*(i+1)] for i in range(len(data)//100 + 1)]  # use // so this also works on Python 3

This is equivalent to the accepted answer, except that it appends one empty chunk when len(data) is an exact multiple of the chunk size. For example, shortening to batches of 10 for readability (Python 2 syntax):

data = range(35)
print [data[x:x+10] for x in xrange(0, len(data), 10)]
print [data[10*i:10*(i+1)] for i in range(len(data)/10 + 1)]

Outputs:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34]]
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [20, 21, 22, 23, 24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34]]
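For reference, the same demo on Python 3 needs floor division and print() calls (a straightforward port, not part of the original answer):

data = list(range(35))
print([data[x:x+10] for x in range(0, len(data), 10)])
print([data[10*i:10*(i+1)] for i in range(len(data)//10 + 1)])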
inspectorG4dget