As mentioned a while back by @piRSquared, subclassing a pandas DataFrame in the way suggested in the docs, or by geopandas' GeoDataFrame, opens up the possibility of unwanted mutation of the original object:
class SubFrame(pd.DataFrame):
def __init__(self, *args, **kwargs):
attr = kwargs.pop('attr', None)
super(SubFrame, self).__init__(*args, **kwargs)
self.attr = attr
@property
def _constructor(self):
return SubFrame
def somefunc(self):
"""Add some extended functionality."""
pass
df = pd.DataFrame([[1, 2], [3, 4]])
sf = SubFrame(df, attr=1)
sf[:] = np.nan # Modifies `df`
print(df)
# 0 1
# 0 NaN NaN
# 1 NaN NaN
The error-prone "fix" is to pass a copy at instantiation:
sf = SubFrame(df.copy(), attr=1)
But this is easily subject to user error. My question is: can I create a copy of self
(the passed DataFrame) within class SubFrame
itself?
How would I go about doing this?
If the answer is "No," I appreciate that too, so that I can scrap this endeavor before wasting time on it.
A polite request
The pandas docs suggest two alternatives:
- Extensible method chains with
pipe
- Composition
I've already thoroughly considered both of these, so I'd appreciate if answers can refrain from an opinionated discussion with generic reasons about why these 2 alternatives are better/safer.