
I am working with two data sets on the order of 100,000 values. Both data sets are simply lists, and each item in a list is an instance of a small class.

class Datum(object):
    def __init__(self, value, dtype, source, index1=None, index2=None):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2

For each datum in one list, there is a matching datum in the other list with the same dtype, source, index1, and index2; I use these attributes to sort the two data sets so that they align. I then do various work with the matching data points' values, which are always floats.
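The matching step can also be done without relying on sort order. Sorting on these attribute tuples can raise TypeError in Python 3 if index1/index2 are None for some items and numbers for others, whereas a dictionary lookup only needs the tuples to be hashable. A minimal sketch, assuming each (dtype, source, index1, index2) combination occurs exactly once per list; `key` and `align` are hypothetical helpers, and the class is the original Datum from above:

```python
class Datum(object):
    def __init__(self, value, dtype, source, index1=None, index2=None):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2


def key(datum):
    # Hypothetical helper: the attributes that identify a matching pair
    return (datum.dtype, datum.source, datum.index1, datum.index2)


def align(data_a, data_b):
    """Pair each item of data_a with the item of data_b that has the same key.

    A key missing from data_b raises KeyError; duplicate keys in data_b
    silently keep the last one, so keys are assumed unique.
    """
    lookup = {key(d): d for d in data_b}
    return [(a, lookup[key(a)]) for a in data_a]


first = [Datum(1.0, "t", "x", 0), Datum(2.0, "t", "y", None, 3)]
second = [Datum(5.0, "t", "y", None, 3), Datum(4.0, "t", "x", 0)]
pairs = align(first, second)
print([(a.value, b.value) for a, b in pairs])  # [(1.0, 4.0), (2.0, 5.0)]
```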

Currently, if I want to determine the relative values of the floats in one data set, I do something like this:

minimum = min([x.value for x in data])
for datum in data:
    datum.value -= minimum

However, it would be nice to have my custom class inherit from float, so that it could be used like this:

minimum = min(data)
data = [x - minimum for x in data]

I tried the following.

class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

However, doing

data = [x - minimum for x in data]

returns plain float objects, so all of the extra attributes (dtype, source, index1, index2) are lost.
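A quick check reproduces this with the float subclass above: min() hands back one of the Datum objects unchanged, but the subtraction goes through the inherited float.__sub__, which builds a plain float. The sample values and attribute strings here are made up for illustration.

```python
class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new


data = [Datum(3.0, "t", "s"), Datum(1.0, "t", "s")]
minimum = min(data)                     # one of the Datum objects
shifted = [x - minimum for x in data]   # uses the inherited float.__sub__

print(type(minimum).__name__)           # Datum
print(type(shifted[0]).__name__)        # float
print(hasattr(shifted[0], "dtype"))     # False
```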

How should I set up a class that functions like a float, but holds onto the extra data that I instantiate it with?

UPDATE: I do many types of mathematical operations beyond subtraction, so rewriting all of the methods that work with a float would be very troublesome, and frankly I'm not sure I could rewrite them properly.

Asked by Eric Hansen; edited by Rick
  • You probably need to define all the maths methods, so that any math operation results in a Datum and not a float. – Tony Suffolk 66 Nov 07 '14 at 20:51
  • Assuming that dtype, source, index1 & index2 don't change when you do the calculation, my answer should work, and the extra methods aren't complex in your case. – Tony Suffolk 66 Nov 07 '14 at 21:25

2 Answers

I suggest subclassing float and using a couple of decorators to "capture" the float output from any method (except __new__, of course) and return a Datum object instead of a float object.

First we write the method decorator (which isn't really used as a decorator below; it's just a function that modifies the output of another function, AKA a wrapper function):

def mydecorator(f, cls):
    # f is the method being modified, cls is its class (in this case, Datum)
    def func_wrapper(*args, **kwargs):
        # *args and **kwargs are all the arguments that were passed to f
        newvalue = f(*args, **kwargs)
        # newvalue now contains the output float would normally produce.
        # Now get the cls instance passed as args[0] (we need one
        # if we're going to reattach its instance information later):
        try:
            self = args[0]
            # Check that newvalue is numeric (ints included, so that things
            # like modulo and round work right), but NOT a bool and not
            # already a cls instance (which might lead to recursion)
            if isinstance(newvalue, (float, int)) and not isinstance(newvalue, bool) and type(newvalue) is not cls:
                # Build a new cls instance from newvalue, copying the other
                # fields from the original instance (args[0])
                return cls(newvalue, self.dtype, self.source, self.index1, self.index2)
        # IndexError is raised if no args were provided; AttributeError is
        # raised if self isn't a cls instance (it has no dtype, source, ...)
        except (IndexError, AttributeError):
            pass
        # If newvalue isn't numeric, or we don't have a self, just return
        # what float would normally return
        return newvalue
    # Return the modified function, to be used in place of the original f
    return func_wrapper

The first decorator only applies to the single method it wraps. But we want to apply it to all (well, almost all) of the methods inherited from float, or at least those that appear in float's __dict__. This second decorator applies our first decorator to every callable in float.__dict__ except those listed as exceptions (see this answer):

def for_all_methods_in_float(decorator, *exceptions):
    def decorate(cls):
        # Wrap every callable attribute defined directly on float,
        # except the names passed in as exceptions
        for attr in float.__dict__:
            if callable(getattr(float, attr)) and attr not in exceptions:
                setattr(cls, attr, decorator(getattr(float, attr), cls))
        return cls
    return decorate
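To see which names for_all_methods_in_float actually walks over, you can list the callable entries of float.__dict__ directly. The exact list varies between Python versions, but it includes not only the arithmetic and comparison operators but also things like __hash__, __int__ and __repr__, and it does not include __init__, which float inherits from object.

```python
# Every callable attribute defined directly on float: the names the class
# decorator would wrap (exact contents depend on the Python version)
wrapped = sorted(name for name in float.__dict__ if callable(getattr(float, name)))
print(wrapped)
```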

Now we write the subclass much as you had it before, but decorated, and with __new__ excluded from decoration. (__init__ needs no exclusion: float doesn't define its own __init__, so it never appears in float.__dict__ and the Datum.__init__ below is left alone.)

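# __hash__ and __int__ are also left undecorated: hash() and int() insist on a
# real int result, and the wrapper would turn that int into a Datum
@for_all_methods_in_float(mydecorator, '__new__', '__hash__', '__int__')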
class Datum(float):
    def __new__(klass, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        return super(Datum,klass).__new__(klass,value)
    def __init__(self, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2
        super(Datum,self).__init__()

Here are some tests; iteration seems to work correctly:

d1 = Datum(1.5)
d2 = Datum(3.2)
d3 = d1 + d2
assert d3.source == 'source'
L = [d1, d2, d3]
d4 = max(L)
assert d4.source == 'source'
L = [i for i in L]
assert L[0].source == 'source'
assert type(L[0]) == Datum
minimum = min(L)
assert [x - minimum for x in L][0].source == 'source'

Notes:

  • I am using Python 3; I'm not certain whether that will make a difference for you.
  • This approach effectively overrides EVERY method of float other than the exceptions, even the ones whose results aren't modified. There may be side effects to this (subclassing a built-in and then overriding all of its methods), e.g. a performance hit; I really don't know.
  • Only names found in float.__dict__ are wrapped, so methods you add to Datum yourself (and any nested classes) are left alone, unless they share a name with a float method, in which case they are replaced by the wrapped float version.
  • This same approach could also be implemented using a metaclass.
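One concrete side effect of wrapping everything in float.__dict__: some methods must return a real int, and the wrapper converts that int into the float subclass. A minimal sketch of the problem, stripped down to a subclass with no extra fields; `rewrap` and `Tagged` are hypothetical names used only for this demonstration:

```python
def rewrap(f, cls):
    # Same idea as mydecorator, minus the extra fields: numeric results become cls
    def wrapper(*args, **kwargs):
        result = f(*args, **kwargs)
        if isinstance(result, (int, float)) and not isinstance(result, bool) and type(result) is not cls:
            return cls(result)
        return result
    return wrapper


class Tagged(float):
    """Hypothetical float subclass used only for this demonstration."""


for name in ("__add__", "__hash__"):
    setattr(Tagged, name, rewrap(getattr(float, name), Tagged))

t = Tagged(1.5)
print(type(t + 1).__name__)  # Tagged: arithmetic is rewrapped as intended

hash_error = None
try:
    hash(t)  # __hash__ now returns a Tagged (a float), not an int
except TypeError as exc:
    hash_error = exc
print(hash_error)
```

Passing __hash__ and __int__ as extra exceptions to for_all_methods_in_float leaves float's own versions in place for those two.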
Answered by Rick; edited by Community

The problem is that when you do:

x - minimum

then, in terms of types, you are doing one of:

datum - datum (min(data) hands back one of your Datum objects), datum - float, or datum - integer

Your class doesn't define any of these operations itself, so Python looks at the parent classes of the arguments. Since datum is a type of float, it simply uses float's subtraction, and the calculation ends up being

float - float

which will obviously result in a 'float': Python has no way of knowing how to construct your datum object unless you tell it.

To solve this you either need to implement the mathematical operators so that Python knows how to do datum - float, or come up with a different design.

Assuming that dtype, source, index1 & index2 need to stay the same after a calculation, then as an example your class needs:

def __sub__(self, other):
    # float(self) is the plain numeric value; subtracting from it (rather
    # than from self) avoids calling Datum.__sub__ again
    return Datum(float(self) - other, self.dtype, self.source, self.index1, self.index2)

This should work (not tested).

and this will now allow you to do this:

d = Datum(23.0, dtype="float", source="me", index1=1)
e = d - 16
print(float(e), e.dtype, e.source, e.index1, e.index2)

which should result in:

7.0 float me 1 None
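Since the question's update mentions many operations beyond subtraction, the same pattern can be generated for a list of operators instead of being written out one method at a time. This is a sketch under stated assumptions, not something the answer itself provides: `_rewrapping` is a hypothetical helper, the operator list is only illustrative, and the result always takes its metadata from the Datum operand (for reflected operators such as __rsub__, that is the right-hand side).

```python
class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new


def _rewrapping(name):
    """Hypothetical helper: call float's version of `name`, then rewrap as Datum."""
    float_method = getattr(float, name)

    def method(self, other):
        result = float_method(self, other)
        if result is NotImplemented:  # let Python try the other operand
            return result
        return Datum(result, self.dtype, self.source, self.index1, self.index2)

    method.__name__ = name
    return method


for _name in ("__add__", "__radd__", "__sub__", "__rsub__",
              "__mul__", "__rmul__", "__truediv__", "__rtruediv__"):
    setattr(Datum, _name, _rewrapping(_name))

d = Datum(23.0, dtype="float", source="me", index1=1)
e = d - 16
f = 100 - d
print(float(e), e.dtype, e.source, e.index1, e.index2)  # 7.0 float me 1 None
```

With __rsub__ in the list, 100 - d also comes back as a Datum: int's own subtraction returns NotImplemented for a float operand, so Python then tries the reflected method on the Datum.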
Answered by Tony Suffolk 66