
Context:

I use Pandas on a daily basis (processing measurement data) and I'd like to learn more about Python.

To do so, I'm working on a (wrapper) class, MyDataFrame, that combines Pandas DataFrame functionality with that of Pint, a Python package to define, operate on and manipulate physical quantities.

I've already managed to get some basic functionality working via __str__, __getitem__/__setitem__ and a __truediv__ for MyDataFrame's underlying MySeries (a wrapper of Pandas Series):

API Example:

import pandas as pd
import pint
_u = pint.UnitRegistry()

_u("meter")
>>> 1 meter

type(_u("meter"))
>>> pint.unit.build_quantity_class.<locals>.Quantity

data = {"time": [0,1,2,3], "distance": [4,5,6,7]}

df = pd.DataFrame(data,columns=["time","distance"])

units = {"distance":_u("meter"), "time":_u("second")}

mdf = MyDataFrame(df, units)

mdf["speed"] = mdf["distance"]/mdf["time"]

mdf["speed"].unit == _u("meter per second")
>>> True

So far I've kept the implementation very minimal, e.g.:

class MyDataFrame:
    """df: pandas DataFrame, units: dict of str - Pint quantity key-value pairs."""
    def __init__(self,df,units):

        error_handling(df,units)  # own validation helper, not shown here

        self.df = df
        self.units = units

    def __getitem__(self,key):
        if key in self.units:
            return MySeries(self.df[key],self.units[key]) 

class MySeries:
    """series: pandas Series, unit: a Pint quantity value."""
    def __init__(self,series,unit):
        self.series = series
        self.unit = unit

    def __truediv__(self,other):
        return MySeries(self.series/other.series,self.unit/other.unit)

Question:

But now I'd like to extend this basic concept such that we can do e.g.

mdf["speed"] * 60*_u("second")

in other words, make MySeries.__mul__() polymorphic: not only multiply MySeries with MySeries, but also MySeries with Pint quantities (or even vice versa). What would be a good approach?

My first idea was for __mul__(self,other) to check the type of self or other. However, reading more about polymorphism in Python (here) left me wondering how others would implement such polymorphic binary operations.

Let me know if I should give some clarifications.

PS: As an aside, I notice that in trying to mimic Pandas syntax I'm writing wrappers such as

def __getitem__(self,key):
    return self.series[key]

def notnull(self):
    return self.series.notnull()

Any advice on how to redirect all the usual Pandas method calls to the Pandas part of the MyDataFrame / MySeries classes?
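One common Python idiom for this kind of forwarding is `__getattr__`. Python calls it only when normal attribute lookup fails, so it can pass every name the wrapper does not define on to the wrapped Series. Below is a minimal sketch; the string unit is just a placeholder. There are two caveats. First, special methods such as `__getitem__` or `__mul__` are looked up on the class, so `__getattr__` does not cover them and they still have to be written out. Second, forwarded methods return plain pandas objects, so the unit is dropped.

```python
import pandas as pd


class MySeries:
    """Minimal wrapper: a pandas Series plus a unit (a placeholder here)."""

    def __init__(self, series, unit):
        self.series = series
        self.unit = unit

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for names MySeries does not
        # define itself. Dunder names are not forwarded (copy/pickle probe them),
        # and the Series is read from __dict__ because it may not exist yet
        # while an instance is being copied; self.series would then recurse.
        series = self.__dict__.get("series")
        if series is None or name.startswith("__"):
            raise AttributeError(name)
        return getattr(series, name)

    def __getitem__(self, key):
        # Special methods bypass __getattr__, so indexing is forwarded explicitly.
        return self.series[key]
```

With this, `s.notnull()`, `s.sum()` and similar calls reach the Series without a hand-written wrapper for each one.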

Btw, I get the sense that it is time for me to delve into Python's docs...

balletpiraat

1 Answer


Unfortunately, there is no other way. Polymorphism is already in use: __mul__ is implemented separately for every Pandas type, so the * operator behaves differently depending on the type of the first (left) argument. For the second argument, however, you have to check the type yourself. In statically typed languages this would be done by overloading the function on the type of the second argument, but in Python you have to use isinstance. Even the Python standard library uses this approach, if you look at its source.

blue_note
  • Thanks for the explanation. Handling via `isinstance` would be fine for me if it weren't for some difficulty handling the Pint main class types: e.g. `type(_u("meter"))` returns `pint.unit.build_unit_class.<locals>.Unit` where I would expect it to return something like `Unit`. Any clue? – balletpiraat Jan 26 '17 at 12:36
  • Don't use `type`, it ignores subclasses. Use `isinstance(_u("meter"), Unit)` instead – blue_note Jan 26 '17 at 12:42
  • That already clarifies a lot. However, the above raises `name 'Unit' is not defined`, which makes me think that `import pint` is not enough to make Python aware of Pint's `Unit` class. BTW, I'm working in a Jupyter notebook in case that matters. Would the solution be something akin to `from pint import Unit`? – balletpiraat Jan 26 '17 at 12:49
  • No, it is not enough, `import pint` imports the package only. You should either `from pint.a.b.c import Unit` (or something like that), or refer to the class as `pint.a.b.c.Unit`. (a, b, c are the subpackages of pint, I don't know what they are, see pint structure). This has to do with how Python imports (read the corresponding chapter of the documentation), not with jupyter or the library. – blue_note Jan 26 '17 at 12:54
  • It seems like my issue has to do with the actual implementation in Pint of the classes, or not? `_u = pint.UnitRegistry()` `Quantity = pint.unit.build_quantity_class(_u)` `isinstance(_u("m"),Quantity)` `>>>False` – balletpiraat Jan 26 '17 at 12:56
  • Don't know, that's Pint-specific. However, it seems unlikely that there's such a trivial error in the library; you are probably calling the wrong function – blue_note Jan 26 '17 at 13:07