109

I have the following problem.

Given a list of integers L, I need to generate all of the sublists L[k:] for k in [0, len(L) - 1], without generating copies.

How do I accomplish this in Python? With a buffer object somehow?

nbro
  • 15,395
  • 32
  • 113
  • 196
Chris
  • 3,109
  • 7
  • 29
  • 39

5 Answers5

178

The short answer

Slicing lists does not generate copies of the objects in the list; it just copies the references to them. That is the answer to the question as asked.

The long answer

Testing on mutable and immutable values

First, let's test the basic claim. We can show that even in the case of immutable objects like integers, only the reference is copied. Here are three different integer objects, each with the same value:

>>> a = [1000 + 1, 1000 + 1, 1000 + 1]

They have the same value, but you can see they are three distinct objects because they have different ids:

>>> map(id, a)
[140502922988976, 140502922988952, 140502922988928]

When you slice them, the references remain the same. No new objects have been created:

>>> b = a[1:3]
>>> map(id, b)
[140502922988952, 140502922988928]

Using different objects with the same value shows that the copy process doesn't bother with interning -- it just directly copies the references.

Testing with mutable values gives the same result:

>>> a = [{0: 'zero', 1: 'one'}, ['foo', 'bar']]
>>> map(id, a)
[4380777000, 4380712040]
>>> map(id, a[1:]
... )
[4380712040]

Examining remaining memory overhead

Of course the references themselves are copied. Each one costs 8 bytes on a 64-bit machine. And each list has its own memory overhead of 72 bytes:

>>> for i in range(len(a)):
...     x = a[:i]
...     print('len: {}'.format(len(x)))
...     print('size: {}'.format(sys.getsizeof(x)))
... 
len: 0
size: 72
len: 1
size: 80
len: 2
size: 88

As Joe Pinsonault reminds us, that overhead adds up. And integer objects themselves are not very large -- they are three times larger than references. So this saves you some memory in an absolute sense, but asymptotically, it might be nice to be able to have multiple lists that are "views" into the same memory.

Saving memory by using views

Unfortunately, Python provides no easy way to produce objects that are "views" into lists. Or perhaps I should say "fortunately"! It means you don't have to worry about where a slice comes from; changes to the original won't affect the slice. Overall, that makes reasoning about a program's behavior much easier.

If you really want to save memory by working with views, consider using numpy arrays. When you slice a numpy array, the memory is shared between the slice and the original:

>>> a = numpy.arange(3)
>>> a
array([0, 1, 2])
>>> b = a[1:3]
>>> b
array([1, 2])

What happens when we modify a and look again at b?

>>> a[2] = 1001
>>> b
array([   1, 1001])

But this means you have to be sure that when you modify one object, you aren't inadvertently modifying another. That's the trade-off when you use numpy: less work for the computer, and more work for the programmer!

Community
  • 1
  • 1
senderle
  • 145,869
  • 36
  • 209
  • 233
  • 3
    In an immutable object (such as a tuple), the references are immutable but the items they refer to can be mutable. So a tuple of 3 lists cannot be changed, it will always refer to the same 3 lists, but the contents of each list can change just like in any list. – trichoplax is on Codidact now Mar 18 '14 at 22:56
  • 8
    While the answer is correct, the example does not actually demonstrate it because small integers are interned; try doing `id(2)` or even `id(1+1)`. A better example would be to use `a = [[], [], []]`. – Exp HP Jan 07 '16 at 19:00
  • 1
    Or actually, upon further reading, the question actually *specifies* that the list is made of integers, so I find it quite curious that the author is even worried about item copies to begin with! (I would sooner think the OP did not fully understand your request for clarification, and actually wanted to obtain "views" into the original list) – Exp HP Jan 07 '16 at 19:14
  • 2
    This answer is correct, but I think it's worth pointing out that copying arrays of pointers can still be expensive if you have very large arrays – Joe Pinsonault Dec 10 '16 at 08:42
  • @ExpHP, that might be true. I was trying not to be long-winded, but I guess that's impossible for a question like this! Edited. – senderle Dec 11 '16 at 20:38
  • Reading this in mid-2019, @senderle looks like large integers are also interned. Moreover, in Python 3, looks like we need `list(map(id,a))` or `[id(x) for x in a]`. But it's great to see the concepts of your answer still hold today. – flow2k Jul 09 '19 at 01:55
  • I did an experiment that looked at the id of the list container itself. Although the id's of the list items themselves did not change, the resulting list after a slice operation is a new list container that has its own id. Deleting elements from the new container does not affect the original list container nor does modifying an immutable object in the new container affect the same object in the original list container. – John Moore Nov 10 '22 at 10:45
30

Depending on what you're doing, you might be able to use islice.

Since it operates via iteration, it won't make new lists, but instead will simply create iterators that yield elements from the original list as requested for their ranges.

Cristian Ciupitu
  • 20,270
  • 7
  • 50
  • 76
Amber
  • 507,862
  • 82
  • 626
  • 550
  • 11
    The bad thing here is thata islice doesn't take advantage of an object implementing the __getitem__ method, treating everything as an iterator, so it will iterate always from the first element of the list until it reach the first position of the list to begin yielding the values in the range. – jgomo3 Mar 09 '17 at 03:36
8

A simple alternative to islice that doesn't iterate through list items that it doesn't need to:

def listslice(xs, *args):
    for i in range(len(xs))[slice(*args)]:
        yield xs[i]

Usage:

>>> xs = [0, 2, 4, 6, 8, 10]

>>> for x in listslice(xs, 2, 4):
...     print(x)
4
6
Mateen Ulhaq
  • 24,552
  • 19
  • 101
  • 135
7

Generally, list slicing is the best option.

Here is a quick performance comparison:

from timeit import timeit
from itertools import islice

for size in (10**4, 10**5, 10**6):
    L = list(range(size))
    S = size // 2
    def sum_slice(): return sum(L[S:])
    def sum_islice(): return sum(islice(L, S, None))
    def sum_for(): return sum(L[i] for i in range(S, len(L)))

    assert sum_slice() == sum_islice()
    assert sum_slice() == sum_for()

    for method in (sum_slice, sum_islice, sum_for):
        print(f'Size={size}, method={method.__name__}, time={timeit(method, number=1000)} ms')

Results:

Size=10000,   method=sum_slice,  time=0.0298 ms
Size=10000,   method=sum_islice, time=0.0449 ms
Size=10000,   method=sum_for,    time=0.2500 ms
Size=100000,  method=sum_slice,  time=0.3262 ms
Size=100000,  method=sum_islice, time=0.4492 ms
Size=100000,  method=sum_for,    time=2.4849 ms
Size=1000000, method=sum_slice,  time=5.4092 ms
Size=1000000, method=sum_islice, time=5.1139 ms
Size=1000000, method=sum_for,    time=26.198 ms
gatopeich
  • 3,287
  • 31
  • 26
4

I wrote a ListView class that avoids copying even the spine of the list:

https://gist.github.com/3noch/b5f3175cfe39aea71ca4d07469570047

This supports nested slicing so that you can continue slicing into the view to get narrower views. For example: ListView(list(range(10)))[4:][2:][1] == 7.

Note that this is not fully baked and deserves a good bit more error checking for when the underlying list is mutated along with a test suite.

Elliot Cameron
  • 5,235
  • 2
  • 27
  • 34