Scenario: Assume a pd.DataFrame, loaded from an external source, where each row is one line of sensor output. The index is a DatetimeIndex, and for some rows df.index.duplicated() is True. In other words, there are rows with the same timestamp that come from different sensors.
When applying some logic, like df.loc[df.A>0, 'my_col'] = 1, I ran into ValueError: cannot reindex from a duplicate axis. This can be solved by simply removing the duplicated rows, which keeps only the first row for each timestamp:
df[~df.index.duplicated()]
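To make the scenario concrete, here is a minimal sketch of the de-duplication step. The sample timestamps, values, and the names dup_mask and deduped are made up for illustration and are not from my real data.

```python
import pandas as pd

# Hypothetical sample: two sensors report at the same timestamp (00:00).
idx = pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:00", "2021-01-01 00:01"])
df = pd.DataFrame({"A": [1.0, 3.0, 5.0], "B": [10.0, 20.0, 30.0]}, index=idx)

# Boolean mask: True for every repeat of an index value that was already seen.
dup_mask = df.index.duplicated()

# Keep only the first row per timestamp; the later repeats are discarded.
deduped = df[~dup_mask]
```

The values of the discarded rows are simply lost here, which is what motivates the question.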
But I wonder whether it is possible to apply a column-based function during the index de-duplication itself, e.g. calculating the mean/max/min of columns A/B/C over the rows that share a timestamp.
Is this possible? It's something like a groupby.aggregate on the df.index.duplicated() rows.
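A sketch of the kind of result I have in mind, assuming that grouping on the index itself with groupby(level=0) is an acceptable approach. The sample data and the choice of mean for A, max for B, and min for C are illustrative assumptions only.

```python
import pandas as pd

# Hypothetical sample: the first two rows share the same timestamp.
idx = pd.to_datetime(["2021-01-01 00:00", "2021-01-01 00:00", "2021-01-01 00:01"])
df = pd.DataFrame(
    {"A": [1.0, 3.0, 5.0], "B": [10.0, 20.0, 30.0], "C": [7.0, 2.0, 4.0]},
    index=idx,
)

# Group rows by their index value (level=0) and aggregate each column
# with its own function; timestamps that occur once pass through unchanged.
agg = df.groupby(level=0).agg({"A": "mean", "B": "max", "C": "min"})
```

With this, every timestamp appears exactly once in agg, and the duplicated rows are combined instead of dropped.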