How to implement the lazy slicing of a class instance?

Question

I am building a class which has underpinning it a list of ints, its __getitem__ method is then a reasonably computationally involved function outputting a list of potentially large size. Here is the code:

Class A(list):

    def __init__(self, n):
        self.li = [i for i in range(0, n)]
        super().__init__(self.li)
        self._ind = 0

    def __getitem__(self, item):

        assert (type(item) is int and item >= 0) or type(item) is slice

        if type(item) is slice:
            start, stop, step = item.start, item.stop, item.step
            if stop is None:
                stop = len(self)
            if step is not None:
                for i in range(start, stop, step):
                    yield self[i]
            else:
                for i in range(start, stop):
                    yield self[i]

        if item == 0 or item == 1:
            return []

        out = []
        while item != 1:
            if super().__getitem__(item):
                p = super().__getitem__(item)
            else:
                out.append(item)
                return out

            if p not in out:
                out.append(p)
            item //= p
        return out

My code is currently not working as it always returns a generator, and this messes up iterating over an instance of the class. However upon slicing, I do not wish it to do all the computations and store it, as this would make it consume a lot more memory. An example of what I wish to do:

from math import prod

test = A(10**7)
out = 0
for i in test[10**6:10**7]:
    out += prod(i)

How can I make slices work efficiently?

Why do you have that specific check for `item == 0 or item == 1` in your `__getitem__()`? — martineau, Jan 24 '22 at 02:24
In the original version of the code, self.li is not the range(0,n) but is an array of the smallest prime factor of k. So self.li[15] = 3. In the case of 0 and 1 this is not existent and is instead set to zero. So in these cases it would return an error. The short answer is it's a legacy of the purpose I was originally writing it for and I did not remove it when I was condensing my code. — Anon 100100, Jan 24 '22 at 16:32

score 1 · Answer 1 · answered Jan 23 '22 at 23:36

I'm not totally sure what you're asking. Does this work for your use case? Items will be generated lazily and not stored (unless the caller stores them).

class LazyCollection:

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.get_single_item(key)

        elif isinstance(key, slice):
            def my_generator():
                for index in slice_to_range(key):
                    yield self.get_single_item(index)
            return my_generator()


    def get_single_item(self, index):
        # (Example! Your logic here.)
        return index * 10


def slice_to_range(my_slice):
    '''
    More options for implementing this:

    https://stackoverflow.com/questions/13855288/turn-slice-into-range
    '''
    start = my_slice.start if my_slice.start is not None else 0
    stop = my_slice.stop
    step = my_slice.step if my_slice.step is not None else 1
    return range(start, stop, step)


coll = LazyCollection()

# Get a single item
print(coll[9999])

# Get a slice
for x in coll[2:10]:
    print(x)

Output:

Thank you, this solves my prolem perfectly! – Anon 100100 Jan 24 '22 at 00:22 — Anon 100100, Jan 24 '22 at 00:22

Karl Knechtel · Accepted Answer · 2022-01-24T06:59:49.810

Instead of returning a generator, return a view. This is an object that conceptually represents the sequence of elements in question. We can do this by storing a reference to the original A instance, plus a range that encodes the instances. When we index into the view, we can ask the range which original index (or indices) are involved.

Supposing we set up the general structure:

class A(list):
    def __init__(self, n):
        super().__init__()
        self[:] = [i for i in range(0, n)]

    def _at(self, idx):
        # custom logic here to return the correct value for self[idx]

    def __getitem__(self, idx):
        if isinstance(idx, int):
            return self._at(idx)
        elif isinstance(idx, slice):
            # The `indices` method of a slice object converts to
            # start, stop, step values which we can use to construct a range.
            return A_view(self, range(*idx.indices(len(self))))
        else:
            raise TypeError # this is not an appropriate use for `assert`

Then our view could look like:

class A_view:
    def __init__(self, original, indices):
        self._original, self._indices = original, indices


    def __getitem__(self, idx):
        if isinstance(idx, int):
            return self._original[self._indices[idx]]
        elif isinstance(idx, slice):
            return A_view(self._original, self._indices[idx])
        else:
            raise TypeError

    def __len__(self):
        return len(self._indices)

The idea is that if we receive an integer index, we translate it through the range object into an index for the original A, and call back to its __getitem__ (with an integer this time). If we receive another slice, we use it to slice our range into a sub-range, and make another view.

Note that your A class should already be iterable, due to inheriting from list. To make the view iterable (and also get the in operator, forward and reverse iteration, .index and .count methods for free), you can inherit from collections.abc.Sequence. You just need a __getitem__ and __len__ - both easy to implement, as above - and the base class will do the rest (while also advertising to isinstance that your class has these properties).

You left out the definition of the `self.li` attribute, which — if you do it like the OP did — will result in a potentially *huge* (as in having 10**7 elements) list being created. This means the cleverness of returning an `A_view` instance might be for naught… — martineau, Jan 24 '22 at 00:48
I omitted it because, as `A` is already a subclass of `list`, it makes more sense to actually use the storage provided that way (`self[:] = ...`). I assumed that the initial values there were in some way necessary to the calculation. Of course, that may not be necessary; but in that case, it could very well be that there is no good reason to inherit from `list`, either. — Karl Knechtel, Jan 24 '22 at 00:50
It hard to image that the calculation of the value at a random index would depend on the sequential integer values coming before or following it — but I suppose anything's possible. Regardless, if you're going to leave it in, then you need to add the code that defines its value before it's used. — martineau, Jan 24 '22 at 01:12
Oh, I see what you mean. I forgot to fix the `super` call accordingly. The intent is to use the `self`'s own contained elements, rather than those of an attribute. Should be fixed now. — Karl Knechtel, Jan 24 '22 at 06:59
Thank you, this seems like a very neat solution, I will try it out. — Anon 100100, Jan 25 '22 at 16:12

How to implement the lazy slicing of a class instance?

2 Answers2