8

Original problem description

The problem arose while I was implementing a machine learning algorithm with numpy. I want a new class ludmo that works the same as numpy.ndarray but has a few more properties, for example a new property ludmo.foo. I've tried the methods below, but none is satisfactory.

1. Wrapper

First I created a wrapper class for numpy.ndarray, as

import numpy as np

class ludmo(object):
    def __init__(self):
        self.foo = None
        self.data = np.array([])

But when I use a function (from scikit-learn, which I cannot modify) that operates on a list of np.ndarray instances, I first have to extract the data field of each ludmo object and collect them into a list. The function then sorts the list, and I lose the correspondence between the data and the original ludmo objects.

2. Inheritance

Then I tried to make ludmo a subclass of numpy.ndarray, as

import numpy as np

class ludmo(np.ndarray):
    def __init__(self, shape, dtype=float, buffer=None, offset=0, strides=None, order=None):
        super().__init__(shape, dtype, buffer, offset, strides, order)
        self.foo = None

But then another problem arises: the most common way to create a numpy.ndarray object is numpy.array(some_list), which returns a numpy.ndarray object that I have to convert to a ludmo object. So far I have found no good way to do this; simply changing the __class__ attribute results in an error.

I'm new to Python and numpy, so there must be some elegant way that I don't know. Any advice is appreciated.

It would be even better if someone could give a generic solution that applies not only to numpy.ndarray but to all kinds of classes.

Saullo G. P. Castro
xyguo
    There's [a page in the numpy docs about that](http://docs.scipy.org/doc/numpy/user/basics.subclassing.html). – Phillip Mar 02 '15 at 07:31

2 Answers

4

As explained in the docs, you can add your own methods by subclassing np.ndarray:

import numpy as np

class Ludmo(np.ndarray): 
    def sumcols(self):
        return self.sum(axis=1)

    def sumrows(self):
        return self.sum(axis=0)

    def randomize(self):
        self[:] = np.random.rand(*self.shape)

Then create instances using the np.ndarray.view() method:

a = np.random.rand(4,5).view(Ludmo)

To define new attributes, add an __array_finalize__() method to the class body:

def __array_finalize__(self, arr):
    self.foo = 'foo'
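
Putting the pieces together, a minimal sketch along the lines of the NumPy subclassing guide could look like this. The `__new__` signature with a `foo` keyword and the `None` default are assumptions for illustration; they are not part of the answer above.

```python
import numpy as np

class Ludmo(np.ndarray):
    def __new__(cls, input_array, foo=None):
        # View the input data as a Ludmo instance, then attach the extra attribute
        obj = np.asarray(input_array).view(cls)
        obj.foo = foo
        return obj

    def __array_finalize__(self, obj):
        # Runs on explicit construction, view casting and slicing.
        # obj is None only when the array is built via ndarray.__new__ directly.
        if obj is None:
            return
        # Copy foo from the source array, or fall back to None
        self.foo = getattr(obj, 'foo', None)

a = Ludmo([[1.0, 2.0], [3.0, 4.0]], foo='bar')
b = a[0]                       # slicing keeps the subclass and carries foo along
c = np.arange(3).view(Ludmo)   # view casting a plain ndarray: foo is None
```

With this in place, `np.array(some_list).view(Ludmo)` and `Ludmo(some_list)` both yield objects that scikit-learn can treat as ordinary arrays.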
Saullo G. P. Castro
2

Since you asked for a generic solution, here is a generic wrapper class that you can use (from http://code.activestate.com/recipes/577555-object-wrapper-class/ ):

class Wrapper(object):
    '''
    Object wrapper class.
    This is a wrapper for objects. It is initialised with the object to wrap
    and then proxies the unhandled getattribute methods to it.
    Other classes are to inherit from it.
    '''
    def __init__(self, obj):
        '''
        Wrapper constructor.
        @param obj: object to wrap
        '''
        # wrap the object
        self._wrapped_obj = obj

    def __getattr__(self, attr):
        # see if this object has attr
        # NOTE do not use hasattr, it goes into
        # infinite recursion
        if attr in self.__dict__:
            # this object has it
            return getattr(self, attr)
        # proxy to the wrapped object
        return getattr(self._wrapped_obj, attr)

The way this works is:

When, for example, scikit-learn accesses ludmo.data, Python first calls type(ludmo).__getattribute__(ludmo, 'data'). If that lookup does not find a 'data' attribute, Python falls back to ludmo.__getattr__('data').

By overriding __getattr__ you intercept this fallback, check whether your ludmo object itself has the attribute (the check reads self.__dict__ directly, because hasattr would recurse), and otherwise forward the lookup to your internal object. This covers every explicit attribute access on your internal numpy object.

Update: you would also have to implement __setattr__ the same way; otherwise assignments land on the wrapper instead of the wrapped object, as shown here:

>>> class bla(object):
...  def __init__(self):
...   self.a = 1
...  def foo(self):
...   print(self.a)
...
>>> d = Wrapper(bla())
>>> d.a
1
>>> d.foo()
1
>>> d.a = 2
>>> d.a
2
>>> d.foo()
1
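
One way to add that forwarding is sketched below. The rule used here (update the wrapped object when it already has the attribute, otherwise store the value on the wrapper) is an assumption about the desired behaviour, not something prescribed by the recipe; `Bla` mirrors the class from the session above but returns the value instead of printing it.

```python
class Wrapper(object):
    def __init__(self, obj):
        # Bypass our own __setattr__ so the reference is stored on the wrapper
        object.__setattr__(self, "_wrapped_obj", obj)

    def __getattr__(self, attr):
        # Reached only for names the wrapper itself does not have
        return getattr(self._wrapped_obj, attr)

    def __setattr__(self, attr, value):
        # Existing attributes of the wrapped object are updated there;
        # anything else becomes an extra attribute of the wrapper.
        if hasattr(self._wrapped_obj, attr):
            setattr(self._wrapped_obj, attr, value)
        else:
            object.__setattr__(self, attr, value)

class Bla(object):
    def __init__(self):
        self.a = 1

    def foo(self):
        return self.a

d = Wrapper(Bla())
d.a = 2           # forwarded to the wrapped Bla instance
d.extra = "x"     # Bla has no 'extra', so it is stored on the wrapper
print(d.foo())    # 2
```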

You probably also want to set a new metaclass that intercepts calls to the magic methods of new-style classes. For the full class see https://github.com/hpcugent/vsc-base/blob/master/lib/vsc/utils/wrapper.py ; for background see How can I intercept calls to python's "magic" methods in new style classes? However, this is only needed if you still want to access attributes such as x.__name__ or x.__file__ and get the magic attribute of the wrapped class rather than that of your own class.

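    # create proxies for wrapped object's double-underscore attributes
    # (Python 2 syntax: a nested __metaclass__ class has no effect in Python 3)
    class __metaclass__(type):
        def __init__(cls, name, bases, dct):

            def make_proxy(name):
                def proxy(self, *args):
                    # this fragment expects the wrapped object in self._obj,
                    # whereas the recipe above stores it as self._wrapped_obj
                    return getattr(self._obj, name)
                return proxy

            type.__init__(cls, name, bases, dct)
            # __wraps__ (the wrapped class) and __ignore__ (space-separated names
            # to skip) are class attributes that the enclosing class must define
            if cls.__wraps__:
                ignore = set("__%s__" % n for n in cls.__ignore__.split())
                for name in dir(cls.__wraps__):
                    if name.startswith("__"):
                        if name not in ignore and name not in dct:
                            # a property makes e.g. len(x) look up __len__ on the wrapped object
                            setattr(cls, name, property(make_proxy(name)))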
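
For Python 3, the same idea might be sketched as follows. This is an adaptation under stated assumptions, not the code from the linked file: the names `WrapperMeta` and `ListWrapper` are hypothetical, and the contents of `__ignore__` are a guess at which names should keep their normal behaviour on the wrapper.

```python
class WrapperMeta(type):
    """Adds a property proxy for each double-underscore attribute of __wraps__."""

    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        if cls.__wraps__ is None:
            return
        ignore = {"__%s__" % n for n in cls.__ignore__.split()}

        def make_proxy(attr):
            def proxy(self):
                # Return the wrapped object's attribute (often a bound method)
                return getattr(self._obj, attr)
            return proxy

        for attr in dir(cls.__wraps__):
            if attr.startswith("__") and attr not in ignore and attr not in dct:
                setattr(cls, attr, property(make_proxy(attr)))


class Wrapper(metaclass=WrapperMeta):
    __wraps__ = None
    # Names left untouched on the wrapper (assumed list)
    __ignore__ = "class mro new init setattr getattr getattribute init_subclass subclasshook"

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Ordinary (non-magic) attributes are forwarded here
        return getattr(self._obj, name)


class ListWrapper(Wrapper):
    __wraps__ = list


w = ListWrapper([1, 2, 3])
w.append(4)      # ordinary attribute, forwarded by __getattr__
print(len(w))    # len() finds the __len__ property on the class
print(w[0])      # indexing finds the __getitem__ property
```

The metaclass is needed because implicit special-method calls such as `len(w)` look up `__len__` on the type of `w`, not on the instance, so `__getattr__` alone never sees them. Installing a property on the class makes that type-level lookup return the wrapped object's bound method.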
Community
Jens Timmerman
  • This solution is fine when I only access `ludmo.data` but do not modify it. I referred to the documentation, which says that when I assign to a nonexistent attribute, Python silently creates a new one. So when I write `ludmo.data = 2`, does Python set `ludmo.obj.data` to 2, or does it just create a new attribute `ludmo.data`? Maybe I should define `__setattr__` as well? – xyguo Mar 04 '15 at 13:43
  • I would accept your answer, though I still can't understand how the last piece of code (the `__metaclass__`) works. If you could explain a bit more about it or give a reference link, that would be better. – xyguo Mar 10 '15 at 15:09