I have written a data container class which essentially contains a numpy ndarray member along with methods to generate time_series masks/cross-sectional masks, fetch the date index (row#) in ring-buffer mode, handle resizing keeping in mind that the data may be a ring buffer, and implement restrictions on the shape/dimensions, etc.

As a result of this design, I have to access the wrapped data by explicitly referring to the .data member. This is cumbersome, and I'd like to implement the [] operator in my class so that, when called on an instance of my class, it performs the same operation on the underlying ndarray object. How can I achieve this?

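import numpy

class MyArray(object):
    def __init__(self, shape, fill_value, dtype):
        self.shape = shape
        self.fill_value = fill_value
        self.dtype = dtype
        # numpy.empty() takes no fill_value; numpy.full() allocates and fills
        self.data = numpy.full(shape, fill_value, dtype=dtype)

    def reset(self, fill_value=None):
        # test against None so that a fill_value of 0 is honoured
        self.data.fill(self.fill_value if fill_value is None else fill_value)

    def resize(self, shape):
        if self.data.ndim != len(shape): raise Exception("dim error")
        # compare per dimension; plain tuple comparison is lexicographic
        if any(new < old for new, old in zip(shape, self.data.shape)):
            raise Exception("sizing down not permitted")
        # do resizing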

Now, if I'd like to use this container elsewhere, I have to use it as such:

arr = MyArray(shape=(10000,20), fill_value=numpy.nan, dtype='float')
arr.data[::10] = numpy.nan
msk = numpy.random.randn(10000,20)<.5
arr.data[~msk] = -1.

The fact that I need to explicitly refer to arr.data every time I use this is too cumbersome and error-prone (I'm forgetting the .data suffix in so many places).

Is there any way I can add a few operators such that slicing and indexing on arr actually operates on arr.data implicitly?

Jonathan Leffler
Mindstorm

1 Answer

You need to implement the __getitem__ and __setitem__ magic methods, which Python calls for arr[key] and arr[key] = value respectively.

A complete overview for the magic methods can be found here.

import numpy as np

class MyArray():
    def __init__(self):
        self.data = np.zeros(10)

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value

    def __repr__(self):
        return 'MyArray({})'.format(self.data)


a = MyArray()

print(a[9])
print(a[1:5])
a[:] = np.arange(10)
print(a)

Which will give you this result:

0.0
[ 0.  0.  0.  0.]
MyArray([ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.])
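The same delegation idea can be extended to in-place operators such as += and *=, which [] forwarding alone does not cover. The following is only a sketch: the helper _make_inplace and the constructor defaults are hypothetical names chosen for illustration, and only four operators are wired up.

```python
import operator

import numpy as np


class MyArray:
    """Thin wrapper that forwards indexing to the wrapped ndarray."""

    def __init__(self, shape, fill_value=0.0, dtype=float):
        self.data = np.full(shape, fill_value, dtype=dtype)

    def __getitem__(self, key):
        return self.data[key]

    def __setitem__(self, key, value):
        self.data[key] = value


def _make_inplace(op):
    # Hypothetical helper: builds one in-place method from an operator function.
    def method(self, other):
        if isinstance(other, MyArray):
            other = other.data
        op(self.data, other)  # ndarray in-place operators modify self.data
        return self           # keep the name bound to the wrapper itself
    return method


# Attach __iadd__, __isub__, __imul__ and __itruediv__ without writing each out.
for _name in ("iadd", "isub", "imul", "itruediv"):
    setattr(MyArray, "__{}__".format(_name),
            _make_inplace(getattr(operator, _name)))


arr = MyArray(shape=(4, 3), fill_value=1.0)
arr += 2            # forwarded to arr.data
arr *= 2            # forwarded to arr.data
arr[::2] = np.nan   # strided assignment through __setitem__
```

Returning self from each method matters: Python rebinds the name to whatever the in-place method returns, so returning the raw ndarray would silently replace the wrapper.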

Inheritance

If you just want to modify or extend the behaviour of np.ndarray, you could inherit from it. This is a little more complicated than for normal Python classes, but implementing your case should not be that hard:

import numpy as np


class MyArray(np.ndarray):

    def __new__(cls, shape, fill_value=0, dtype=float):
        data = np.full(shape, fill_value, dtype)
        obj = np.asarray(data).view(cls)
        obj.fill_value = fill_value
        return obj

    def reset(self, fill_value=None):
        if fill_value is not None:
            self.fill_value = fill_value

        self.fill(self.fill_value)

For more info, see here.
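One detail the subclass needs in practice is __array_finalize__, because slices and views of a subclass instance are created without going through __new__. The sketch below is one possible way to carry fill_value over to such views; the default of 0 for arrays that have no fill_value is an assumption made here.

```python
import numpy as np


class MyArray(np.ndarray):

    def __new__(cls, shape, fill_value=0, dtype=float):
        obj = np.full(shape, fill_value, dtype=dtype).view(cls)
        obj.fill_value = fill_value
        return obj

    def __array_finalize__(self, obj):
        # Runs for view casting and slicing as well as explicit construction.
        # obj is None only when the instance comes from ndarray.__new__ directly.
        if obj is None:
            return
        # Assumed default of 0 when the source array carries no fill_value.
        self.fill_value = getattr(obj, "fill_value", 0)

    def reset(self, fill_value=None):
        if fill_value is not None:
            self.fill_value = fill_value
        self.fill(self.fill_value)


a = MyArray((4, 2), fill_value=-1.0)
a[:] = 5.0
part = a[:2]    # a view produced by slicing; still a MyArray
part.reset()    # uses the fill_value inherited from a
```

Without __array_finalize__, part would have no fill_value attribute and part.reset() would raise AttributeError.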

Neuron
MaxNoe
  • Excellent. One question though: would I have to write down every method to fully delegate calls on arr to arr.data? There's one method for a += b (__iadd__), one for a *= b (__imul__). Isn't there a more concise way than enumerating all these methods in my wrapper class? – Mindstorm Nov 23 '15 at 22:41
  • You could inherit from array and implement or override the methods you need. – MaxNoe Nov 23 '15 at 22:42
  • However, this is a little complicated; it is covered in depth here: http://docs.scipy.org/doc/numpy/user/basics.subclassing.html – MaxNoe Nov 23 '15 at 22:57
  • I added an inheritance solution to my answer. – MaxNoe Nov 23 '15 at 23:15
  • My implementation of the class as an ndarray subclass ran into issues when I try to resize: `>>> a.resize((10,2)) Traceback (most recent call last): File "", line 1, in ValueError: cannot resize an array references or is referenced by another array in this way. Use the resize function. >>> a.view(np.ndarray).resize((10,2)) Traceback (most recent call last): File "", line 1, in ValueError: cannot resize this array: it does not own its data` – Mindstorm Nov 24 '15 at 12:41
  • check out `def __array_finalize__(self, obj)` so you don't get caught out in more exotic circumstances. – Alexander McFarlane Jun 27 '16 at 19:42
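
Regarding the resize errors quoted in the comments: a wrapper that holds the array in .data can sidestep ndarray.resize entirely by allocating a new buffer and copying the old values across. This is a rough sketch under stated assumptions: it only grows the array, keeps existing values in the leading corner, and ignores the ring-buffer ordering mentioned in the question.

```python
import numpy as np


class MyArray:
    def __init__(self, shape, fill_value=np.nan, dtype=float):
        self.fill_value = fill_value
        self.data = np.full(shape, fill_value, dtype=dtype)

    def resize(self, shape):
        """Grow to `shape`, keeping existing values in the leading corner."""
        if len(shape) != self.data.ndim:
            raise ValueError("dim error")
        if any(new < old for new, old in zip(shape, self.data.shape)):
            raise ValueError("sizing down not permitted")
        grown = np.full(shape, self.fill_value, dtype=self.data.dtype)
        # Copy the old block into the leading corner of the new buffer.
        old_region = tuple(slice(0, n) for n in self.data.shape)
        grown[old_region] = self.data
        self.data = grown


arr = MyArray(shape=(2, 2), fill_value=np.nan)
arr.data[:] = 1.0
arr.resize((3, 4))   # old 2x2 block is kept, new cells hold fill_value
```

Because a fresh array is assigned to self.data, any views taken before the resize keep pointing at the old buffer.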