Context:
I use Pandas on a daily basis (processing measurement data) and I'd like to learn more about Python.
To do so, I'm working on a (wrapper) class, MyDataFrame, that combines Pandas DataFrame functionality with that of Pint, a Python package to define, operate on and manipulate physical quantities.
I've already managed to get some basic functionality via __str__, __getitem__/__setitem__ and a __truediv__ for MyDataFrame's underlying MySeries (a wrapper of Pandas Series):
API Example:
import pandas as pd
import pint
_u = pint.UnitRegistry()
_u("meter")
>>> 1 meter
type(_u("meter"))
>>> pint.unit.build_quantity_class.<locals>.Quantity
data = {"time": [0,1,2,3], "distance": [4,5,6,7]}  # one list per column
df = pd.DataFrame(data)
units = {"distance":_u("meter"), "time":_u("second")}
mdf = MyDataFrame(df, units)
mdf["speed"] = mdf["distance"]/mdf["time"]
mdf["speed"].unit == _u("meter per second")
>>> True
So far I've kept the implementation very minimal, e.g.:
class MyDataFrame:
    """df: pandas DataFrame, units: dict of str - Pint quantity key-value pairs."""

    def __init__(self, df, units):
        error_handling(df, units)  # my own validation helper (not shown)
        self.df = df
        self.units = units

    def __getitem__(self, key):
        # Look the key up on the instance (self.units), not a global `units`.
        if key in self.units:
            return MySeries(self.df[key], self.units[key])
        raise KeyError(key)  # fail loudly instead of silently returning None


class MySeries:
    """series: pandas Series, unit: a Pint quantity value."""

    def __init__(self, series, unit):
        self.series = series
        self.unit = unit

    def __truediv__(self, other):
        # Divide the data and the units separately.
        return MySeries(self.series / other.series, self.unit / other.unit)
Question:
But now I'd like to extend this basic concept such that we can do e.g.
mdf["speed"] * 60*_u("second")
in other words, make MySeries.__mul__() polymorphic: not only multiply MySeries with MySeries, but also MySeries with Pint Quantities (or even vice versa). What could be a good approach?
My first idea was for __mul__(self, other) to check the type of self or other. However, reading more about polymorphism in Python (here) left me wondering how others would implement such polymorphic binary operations.
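To make that first idea concrete, here is a minimal sketch of what the type-checking version might look like. It is only one possible approach, under stated assumptions: a Pint Quantity is recognised by duck typing on its magnitude and units attributes rather than by importing a Pint class, and returning NotImplemented for anything else is my own choice, not something taken from Pandas or Pint.

```python
import numbers


class MySeries:
    """series: pandas Series, unit: a Pint quantity value."""

    def __init__(self, series, unit):
        self.series = series
        self.unit = unit

    def __mul__(self, other):
        if isinstance(other, MySeries):
            # MySeries * MySeries: multiply data and units separately.
            return MySeries(self.series * other.series, self.unit * other.unit)
        if isinstance(other, numbers.Number):
            # MySeries * plain number: scale the data, keep the unit.
            return MySeries(self.series * other, self.unit)
        if hasattr(other, "magnitude") and hasattr(other, "units"):
            # Assumption: anything with these two attributes is a Pint Quantity.
            return MySeries(self.series * other.magnitude, self.unit * other.units)
        # Unknown operand: let Python try other.__rmul__, else raise TypeError.
        return NotImplemented

    def __rmul__(self, other):
        # Reflected form (other * MySeries); multiplication is treated
        # as commutative here, so it reuses __mul__.
        return self.__mul__(other)
```

With this, mdf["speed"] * 60*_u("second") would go through the third branch. Whether the reflected form is ever reached with a Quantity on the left depends on how Pint's own __mul__ treats operands it does not know, which I have not verified.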
Let me know if I should give some clarifications.
PS: As an aside, I notice that in trying to mimic Pandas syntax I'm writing wrappers such as
def __getitem__(self, key):
    return self.series[key]

def notnull(self):
    return self.series.notnull()
Any advice on redirecting all the usual Pandas method calls to the Pandas part of the MyDataFrame / MySeries class?
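One candidate I am aware of is __getattr__, which Python calls only when normal attribute lookup fails. The sketch below is a rough illustration of that idea, not a finished design: it forwards unknown attributes to the wrapped series, and the guard for "series" and "unit" is my own addition to avoid infinite recursion if those attributes are not set yet.

```python
class MySeries:
    """series: pandas Series, unit: a Pint quantity value."""

    def __init__(self, series, unit):
        self.series = series
        self.unit = unit

    def __getattr__(self, name):
        # Only reached when `name` is not found on MySeries itself.
        if name in ("series", "unit"):
            # Not set yet (e.g. during copying): do not recurse.
            raise AttributeError(name)
        # Forward everything else, e.g. notnull(), to the wrapped object.
        return getattr(self.series, name)
```

Two caveats apply. The forwarded methods return plain Pandas objects, so the unit is lost unless the result is wrapped again. Special methods such as __getitem__ are looked up on the class when used implicitly (as in obj[key]), so they are not covered by __getattr__ and still need explicit wrappers.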
Btw, I have a hunch that it is time for me to delve into Python's docs...