Pandas Group By: Not able to retain the original dataframe post groupby

Question

This problem may be a trivial one. I have a dataframe df as follows:

DealerID    ZMONTH    ZREGION    Cust_Cat     ZBAS_QTY   ZBAS_VAL
1001        201905    ABC        M            200        750
1001        201906    ABC        N            300        480
1001        201907    NOP        P            800        1156
1002        201905    PQR        M            350        525
1002        201906    PST        M            480        690
1002        201907    SNP        P            200        780

I want to apply df.groupby() so that, for each DealerID, I get the row with the maximum ZBAS_VAL. In other words, the resulting dataframe should look like this:

DealerID    ZMONTH    ZREGION    Cust_Cat     ZBAS_QTY   ZBAS_VAL
1001        201907    NOP        P            800        1156
1002        201907    SNP        P            200        780

My approach so far:

df = df.groupby(['DealerID'])['ZBAS_VAL'].max().reset_index()

However, this approach returns only the DealerID and ZBAS_VAL columns. What I want is to keep all the remaining columns as well.

Any clue?
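
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample table above and runs the attempt. The dtypes are assumed (integers for the numeric columns, strings for the rest), and the attempt is stored in a separate variable named `attempt` for illustration, so that df is not overwritten.

```python
import pandas as pd

# Sample data copied from the table in the question (dtypes are assumed).
df = pd.DataFrame({
    "DealerID": [1001, 1001, 1001, 1002, 1002, 1002],
    "ZMONTH": [201905, 201906, 201907, 201905, 201906, 201907],
    "ZREGION": ["ABC", "ABC", "NOP", "PQR", "PST", "SNP"],
    "Cust_Cat": ["M", "N", "P", "M", "M", "P"],
    "ZBAS_QTY": [200, 300, 800, 350, 480, 200],
    "ZBAS_VAL": [750, 480, 1156, 525, 690, 780],
})

# Selecting ['ZBAS_VAL'] before aggregating means only that column is
# aggregated; reset_index() then adds the group key back as a column.
attempt = df.groupby(["DealerID"])["ZBAS_VAL"].max().reset_index()

print(attempt.columns.tolist())  # ['DealerID', 'ZBAS_VAL']
```

The per-dealer maxima are correct, but ZMONTH, ZREGION, Cust_Cat and ZBAS_QTY are gone because they were never part of the aggregation.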

Use `df = df.loc[df.groupby(['DealerID'])['ZBAS_VAL'].idxmax()]` — jezrael, Dec 03 '19 at 10:13
Thanks. I was wondering what if instead of `max` we need to `sum` the column `ZPUR_RECE` — pythondumb, Dec 03 '19 at 10:15
Plus I am getting `AttributeError: 'SeriesGroupBy' object has no attribute 'idmax'` — pythondumb, Dec 03 '19 at 10:17
Then please use another answer, one based on sort_values and drop_duplicates — jezrael, Dec 03 '19 at 10:20
@jezrael: After `groupby` I am getting some unexpected `NaN` values. These are not supposed to be present. Any clue? Besides, instead of `idmax()` I am using `max` — pythondumb, Dec 03 '19 at 11:20
I don't understand why you use `max` instead of `idxmax`; using `max` here is the reason for the NaNs — jezrael, Dec 03 '19 at 11:22
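
The two suggestions from the comments can be sketched roughly as follows, using the sample data from the question (dtypes assumed). Note that the method is spelled `idxmax`; the `AttributeError` quoted above mentions `idmax`, which is not a pandas method. The names `by_idxmax`, `by_sort` and `totals` are only illustrative. The follow-up about summing refers to a column `ZPUR_RECE` that is not in the sample, so `ZBAS_QTY` stands in for it here as an assumption.

```python
import pandas as pd

# Sample data copied from the table in the question (dtypes are assumed).
df = pd.DataFrame({
    "DealerID": [1001, 1001, 1001, 1002, 1002, 1002],
    "ZMONTH": [201905, 201906, 201907, 201905, 201906, 201907],
    "ZREGION": ["ABC", "ABC", "NOP", "PQR", "PST", "SNP"],
    "Cust_Cat": ["M", "N", "P", "M", "M", "P"],
    "ZBAS_QTY": [200, 300, 800, 350, 480, 200],
    "ZBAS_VAL": [750, 480, 1156, 525, 690, 780],
})

# Option 1: idxmax() returns, per dealer, the index label of the row
# holding the largest ZBAS_VAL; .loc then pulls those complete rows.
by_idxmax = df.loc[df.groupby("DealerID")["ZBAS_VAL"].idxmax()]

# Option 2: sort by ZBAS_VAL and keep the last (largest) row per dealer.
by_sort = (
    df.sort_values("ZBAS_VAL")
      .drop_duplicates("DealerID", keep="last")
      .sort_values("DealerID")
)

# Follow-up on summing: transform("sum") returns one value per original
# row, so the per-dealer total lines up with df and keeps every column.
# ZBAS_QTY is a stand-in for ZPUR_RECE, which is not in the sample data.
totals = df.groupby("DealerID")["ZBAS_QTY"].transform("sum")

print(by_idxmax)
print(by_sort)
print(totals.tolist())
```

On this sample both options return the 201907 rows for dealers 1001 and 1002, which matches the desired output in the question. If a dealer has several rows tied for the maximum, `idxmax` keeps the first one, while the sort-based variant keeps whichever tied row ends up last after sorting, so the two can differ on ties.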
