splitting a python list in two without needing additional memory

Question

I would like to split an int list l in two small lists l1, l2 (I know the split point n).

I am already able to perform the splitting by copying l2 elements in another list and then removing them from l, but this requires to have space for at least n + n/2 elements in memory and this is not affordable since l is big.

Does someone has a solution?

Possible duplicate of [Split list into smaller lists](http://stackoverflow.com/questions/752308/split-list-into-smaller-lists) — asongtoruin, Feb 08 '17 at 11:22
Why splitting anyway? You could construct some sort of "view" that treats a part of the list as if it was a full list. — Willem Van Onsem, Feb 08 '17 at 11:22
@asongtoruin: This question is more specific about how to do this without having a huge memory burden. — Willem Van Onsem, Feb 08 '17 at 11:23
How about `a = l[:n]` and `b = l[n:]`? Or simply just refer to `l[:n]` and `l[n:]` in your code. — barak manos, Feb 08 '17 at 11:26
@WillemVanOnsemusing for a view, [**`itertools.islice`**](https://docs.python.org/2/library/itertools.html#itertools.islice) would seem a good option — Peter Wood, Feb 08 '17 at 11:27
how big are these lists that you need to care about this? if they truly are that big, you should be using an array or pandas or numpy or something. — acushner, Feb 08 '17 at 15:50

score 1 · Answer 1 · answered Feb 08 '17 at 11:54

You can slice the list with itertools.islice at n. The slice objects are lazy and are only loaded into memory when you iterate on them:

from itertools import islice

def split_list(lst, n):
    return islice(lst, n), islice(lst, n, None)

A, B = split_list(range(10), 5)

print(list(A))
# [0, 1, 2, 3, 4]
print(list(B))
# [5, 6, 7, 8, 9]

tobias_k · Accepted Answer · 2017-02-09T11:40:03.560

If you do not want to spend additional memory on the smaller lists, you have two possibilities:

Either you can destroy/reduce the original list as you create the smaller lists. You could use collections.deque, providing O(1) removal and insertion at both ends:

>>> from collections import deque
>>> deq = deque(range(20))
>>> front = deque(deq.popleft() for _ in range(10))
>>> front
deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> deq  # original list reduced, can be used as 2nd list
deque([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

Or you can create two views on parts of the smaller lists, meaning that the original list would be altered if the smaller lists are modified, and vice versa. For instance, use numpy.array for your numbers and create slices (being views) on that array:

>>> import numpy as np
>>> arr = np.array(range(20))
>>> front = arr[:10]
>>> back = arr[10:]
>>> front[3] = 100
>>> arr  # original list modified
array([  0,   1,   2, 100,   4,   5,   6,   7,   8,   9,  10,  11,  12,
        13,  14,  15,  16,  17,  18,  19])

If you have to use plain Python list, you could also use list.pop. However, as explained in the documentation for deque, you should not use pop(0), as this will have to re-organize the entire list each time you pop an element, giving you O(n²) for extracting half of the list. Instead, use pop() to pop from the end of the list. To restore the original order, you could first pop into a temporary list, and then pop from that list, reversing it twice.

>>> lst = list(range(10))
>>> tmp = [lst.pop() for _ in range(5)]
>>> front, back = lst, [tmp.pop() for _ in range(len(tmp))]
>>> front, back
([0, 1, 2, 3, 4], [5, 6, 7, 8, 9])

score 0 · Answer 3 · answered Feb 08 '17 at 11:34

0

How about trying this

class mylist(list):
    def split_by_position_n(self, n):
        return self[:int(n)], self[int(n):]

then, l = mylist(range(1,10))

l.split_by_position_n(4)
>>>([1, 2, 3, 4], [5, 6, 7, 8, 9])

answered Feb 08 '17 at 11:34

Vatsal Parekh

248
4
12

score 0 · Answer 4 · answered Feb 08 '17 at 15:25

0

I came with a simple solution. What about

l1 = [l.pop(0) for i in range(n)]

It seems to work and it should not require additional memory.

answered Feb 08 '17 at 15:25

aretor

2,379
2
22
38

This is similar to my first solution, but note that on regular lists, `pop(0)` is _very_ wasteful. You do not need additional memory, but with each `pop`, all the elements have to be repositioned. Better use a `deque`. In fact, the [documentation on `deque`](https://docs.python.org/3/library/collections.html#collections.deque) even mentions O(n) _memory_ cost for `pop(0)` on lists, which is exactly what you are trying to avoid. – tobias_k Feb 08 '17 at 20:00
(Misread the documentation, it's O(n) cost for moving memory, not O(n) memory cost, but still quadratic complexity for popping half the list, which is quite bad if the list is long.) – tobias_k Feb 08 '17 at 20:09
@tobias_k what about `pop()` only? Does it require O(n) cost for moving elements? I don't really need to keep the elements ordered. – aretor Feb 08 '17 at 20:57
`pop()` should not be as bad; it will probably still do some under-the-hood data reorganization from time to time as the list shrinks, and the elements in `li` will be reversed, but otherwise... to be sure you'll just have to try with your big list. Still, what's wrong with using a `deque`? – tobias_k Feb 08 '17 at 21:27
well `deque` is not `list` or a subclass of `list` I would be bound to use `deque` class without all the features of the `list` class, or do something like `deq = deque(l)` and then `l = list(deq)`. Would it be helpful to put my option as the 3rd choice in your answer, along with these considerations? – aretor Feb 09 '17 at 13:34

Ika8 · Answer 5 · 2017-02-09T08:44:28.703

Helo AreTor,

This is my solution:

list = [1,2,3,4,5,6,7,8,9]
l1 = list[0:n]
l2 = list[n:]

list is the list of elements.
n is the number of elements you want in the first list (the split point index).

list[0:n] will return the first n items of you list.

list[n:] will return the rest of the list.

Hope it helps you

EDIT: If you worry about memory problems, you can delete the list once you split it.. Even you can use list[0:n] / list[n:] in your code and you dont use more memory..

splitting a python list in two without needing additional memory

5 Answers5