Pandas `groupby.aggregate` on `df.index.duplicated()`

Question

Scenario. Assume a pd.DataFrame, loaded from an external source, where each row is a line from a sensor. The index is a DatetimeIndex with some rows having df.index.duplicated()==True. This means there are lines with the same timestamp from different sensors.

When applying some logic such as df.loc[df.A>0, 'my_col'] = 1, I ran into ValueError: cannot reindex from a duplicate axis. This can be solved by simply removing the duplicated rows using

df[~df.index.duplicated()]
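
To make the effect of that line concrete, here is a minimal sketch on made-up data (the timestamps and values are assumptions for illustration, only the column names A and B come from the question). It shows that, with the default keep="first", the second reading at a repeated timestamp is simply discarded.

```python
import pandas as pd

# Hypothetical sample data: two sensors report at the same timestamp.
idx = pd.to_datetime(["2020-06-02 22:00", "2020-06-02 22:00", "2020-06-02 22:01"])
df = pd.DataFrame({"A": [1.0, 3.0, 5.0], "B": [10.0, 20.0, 30.0]}, index=idx)

# duplicated() flags every repeat after the first occurrence (keep="first" is the default).
mask = df.index.duplicated()

# Keep only the first row per timestamp; the later reading at 22:00 is lost.
deduped = df[~mask]
print(deduped)
```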

But I wonder whether it is possible to apply a column-based function during the index de-duplication process, e.g. calculating the mean/max/min of column A/B/C for the duplicated rows.

Is this possible? It's something like a groupby.aggregate on the df.index.duplicated() rows.

have you tried something like `df.groupby(df.index).mean()`? — Ben.T, Jun 02 '20 at 22:28
Thank you both for your reply. That would in fact apply the `mean()` function on **all** columns, not only specific ones. E.g: If I would like to hold the `max` value on column `A` and the `mean` value for column `B`, that would not work. — gies0r, Jun 02 '20 at 22:42

score 0 · Answer 1 · answered Jun 02 '20 at 23:03

Check with describe, which computes count, mean, std, min, quartiles and max for each column per index value:

df.groupby(level=0).describe()
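
For the per-column case raised in the comments (max of A, mean of B), one possible approach is to pass a column-to-function mapping to agg on the same groupby. This is a minimal sketch, not necessarily what the answerer had in mind; the sample timestamps and values are made up, and only the column names A and B come from the question.

```python
import pandas as pd

# Hypothetical sample data: two rows share the 22:00 timestamp.
idx = pd.to_datetime(["2020-06-02 22:00", "2020-06-02 22:00", "2020-06-02 22:01"])
df = pd.DataFrame({"A": [1.0, 3.0, 5.0], "B": [10.0, 20.0, 30.0]}, index=idx)

# Group rows by index value and apply a different function to each column:
# the maximum of A and the mean of B within each timestamp.
result = df.groupby(level=0).agg({"A": "max", "B": "mean"})
print(result)
```

The result has one row per distinct timestamp, so the index is unique afterwards. Columns not listed in the mapping are dropped from the output.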

BENY
