16

I have a question about how to create a sublist (I hope this is the right term to use) from a given list without copying.

It seems that slicing can create sublists, but does it with copying. Here is an example.

In [1]: a = [1,2,3]

In [2]: id(a)
Out[2]: 4354651128

In [3]: b = a[0:2]

In [4]: b
Out[4]: [1, 2]

In [5]: id(b)
Out[5]: 4354621312

In [6]: id(a[0:2])
Out[6]: 4354620880

See here the id of b and a[0:2] are different, although their values are the same. To double check, change the value in a, the value in b does not change.

In [7]: a[1] = 4

In [8]: a
Out[8]: [1, 4, 3]

In [9]: b
Out[9]: [1, 2]

So to get back to my question, how can I create sublists but without copying? I mean, when value of a[1] is set to 4, b will be [1, 4].

I searched around and did not find much help (maybe I am not using the right keywords). Thank you!


Edits:

Thank you all for your comments and answers! Here is what I have learned.

  • There is no built-in way in Python to create a view of a list (or to create a sublist without copying).
  • The easiest way to do this is to use the numpy array.
  • Although numpy array has limitations on data type compared with list, it does serve my purpose (to implement quicksort with no extra memory)

Here is the same process with numpy array.

In [1]: import numpy as np

In [2]: a = np.arange(1,4)

In [3]: a
Out[3]: array([1, 2, 3])

In [4]: b = a[0:2]

In [5]: b
Out[5]: array([1, 2])

In [6]: id(b)
Out[6]: 4361253952

In [7]: id(a[0:2])
Out[7]: 4361254032

In [8]: a[1] = 4

In [9]: a
Out[9]: array([1, 4, 3])

In [10]: b
Out[10]: array([1, 4])
Ethan
  • 445
  • 4
  • 9
  • 1
    The problem with this kind of sharing is memory leaks: Say you represent a slice list[a:b] using a reference to list and the values a and b. Then even if the slice is very small, it prevents the list from being garbage collected, which might be very costly. But of course you can define a custom class for "symbolic" list slices with the above representation. – Niklas B. Feb 05 '15 at 22:07
  • why do you want to do this? – Padraic Cunningham Feb 05 '15 at 22:47
  • 1
    I think what you're describing is very close to the concept of views on `numpy` arrays. See [this SO post and answer](http://stackoverflow.com/questions/4370745/view-onto-a-numpy-array) for some discussion on the topic. Be warned though that `numpy` arrays are less flexible about the types of data they can contain, compared to typical Python lists, so they may not fit your use case depending on what data you want to include. – zehnpaard Feb 06 '15 at 01:30
  • @zehnpaard: I like your answer very much. "View", compared with "sublist", is a more accurate description of what I want. I have tried numpy array, and slicing with numpy array does not make copies. – Ethan Feb 06 '15 at 14:20
  • @PadraicCunningham: I learned quicksort and was trying to implement it, especially its no-extra-memory feature. After choosing pivot and partitioning, I need to sort the left side and right side respectively (and in memory). So when using the recursion, I want to pass it a view (or sublist) without making a copy. – Ethan Feb 06 '15 at 14:28
  • @zehnpaard: I think your answer is the best one, although I do not know how to accept a comment. If you can submit an answer, I would accept it. – Ethan Feb 06 '15 at 14:38
  • @Ethan - ah glad the suggestion was useful, I added the answer below. – zehnpaard Feb 06 '15 at 15:08

4 Answers4

9

numpy's array objects support this notion of creating interdependent sub-lists, by having slicing return views rather than copies of the data.

Altering the original numpy array will alter the views created from the array, and changes to any of the views will also be reflected in the original array. Especially for large data sets, views are a great way of cutting data in different ways, while saving on memory.

>>> import numpy as np
>>> array1 = np.array([1, 2, 3, 4])
>>> view1 = array1[1:]
>>> view1
array([2, 3, 4])
>>> view1[1] = 5
>>> view1
array([2, 5, 4])
>>> array1
array([1, 2, 5, 4]) # Notice that the change to view1 has been reflected in array1

For further reference, see the numpy documentation on views as well as this SO post.

Community
  • 1
  • 1
zehnpaard
  • 6,003
  • 2
  • 25
  • 40
2

There is no way to do this with built in Python data structures. However, I created a class that does what you need. I don't guarantee it to be bug-free, but it should get you started.

from itertools import islice

class SubLister(object):
    def __init__(self, base=[], start=0, end=None):
        self._base = base
        self._start = start
        self._end = end

    def __len__(self):
        if self._end is None:
            return len(self._base) - self._start
        return self._end - self._start

    def __getitem__(self, index):
        self._check_end_range(index)
        return self._base[index + self._start]

    def __setitem__(self, index, value):
        self._check_end_range(index, "list assignment index out of range")
        self._base[index + self._start] = value

    def __delitem__(self, index):
        self._check_end_range(index, "list assignment index out of range")
        del self._base[index + self._start]

    def __iter__(self):
        return islice(self._base, self._start, self._end)

    def __str__(self):
        return str(self._base[self._start:self._end])

    def __repr__(self):
        return repr(self._base[self._start:self._end])

    # ...etc...

    def get_sublist(self, start=0, end=None):
        return SubLister(base=self._base, start=start, end=end)

    def _check_end_range(self, index, msg="list index out of range"):
        if self._end is not None and index >= self._end - self._start:
            raise IndexError(msg)

Example:

>>> from sublister import SubLister
>>> base = SubLister([1, 2, 3, 4, 5])
>>> a = base.get_sublist(0, 2)
>>> b = base.get_sublist(1)

>>> base
[1, 2, 3, 4, 5]
>>> a
[1, 2]
>>> b
[2, 3, 4, 5]
>>> len(base)
5
>>> len(a)
2
>>> len(b)
4

>>> base[1] = 'ref'
>>> base
[1, 'ref', 3, 4, 5]
>>> a
[1, 'ref']
>>> b
['ref', 3, 4, 5]
Dunes
  • 37,291
  • 7
  • 81
  • 97
mVChr
  • 49,587
  • 11
  • 107
  • 104
  • It's a nice implementation, but a couple of the methods still copy the list, which is undesirable (such as len and iter). – Dunes Feb 06 '15 at 15:49
  • Thanks for the fixes in your edit @Dunes, turns out it was indeed buggy like I mentioned. I just wanted to give the guy a start that he could work with. – mVChr Feb 06 '15 at 17:35
  • I wouldn't say the changes I made were bug fixes. The code was functionally correct. It was more just to improve the efficiency of the class which was meant to minimise copying of the list. – Dunes Feb 07 '15 at 01:06
1

you can't if you slice a to get b.

All slice operations return a new list containing the requested elements. This means that the following slice returns a new (shallow) copy of the list [1]

[1] https://docs.python.org/2/tutorial/introduction.html

Kamyar Ghasemlou
  • 859
  • 2
  • 9
  • 24
1

There is no built-in way to do this. You could create your own list-like class that takes a reference to a list and reimplements all of the list accessor methods to operate on it.

tdelaney
  • 73,364
  • 6
  • 83
  • 116