0

I have dataframe called df_engage its columns are ['district_id', 'time', 'pct_access', 'lp_id']

district_id time pct_access lp_id
1000 2020-01-01 3.60 93690
1000 2020-01-01 0.03 17941
1000 2020-01-01 0.03 65358
1000 2020-01-01 0.57 98265
1000 2020-01-02 41.76 59257

I have extracted the max pct_access in each day

df_engage_max = df_engage.groupby(['district_id', 'time'], as_index=False)['pct_access'].max()

the result is:

district_id time pct_access
1000 2020-01-01 3.60
1000 2020-01-02 41.76
1000 2020-01-03 49.76

NOW, I need to append column lp_id from df_engage to df_engage_max based on df_engage_max[['district_id', 'time', 'pct_access']] that means, the result should be:

district_id time pct_access lp_id
1000 2020-01-01 3.6 93690
1000 2020-01-02 41.76 59257

any help please :) ?

j__carlson
  • 1,346
  • 3
  • 12
  • 20
  • Does this answer your question? [pandas add column to groupby dataframe](https://stackoverflow.com/questions/37189878/pandas-add-column-to-groupby-dataframe) – Mih Zam Aug 23 '21 at 11:28

2 Answers2

0

Join the 2 tables up.

df_engage_max.merge(df_engage, how='inner', on=['district_id', 'time', 'pct_access'])

There'll be a few extra columns you will need to drop but that's the gist.

Quixotic22
  • 2,894
  • 1
  • 6
  • 14
  • thanks alot, it works fine, I have tried this solution but my mistake is I wrote df_engage_max = df_engage_max.merge(df_engage, how='inner', on=['district_id', 'time', 'pct_access']) – Ahmed Emad Aug 23 '21 at 11:57
0

Use idxmax instead of max:

# df is df_engage
>>> df.loc[df.groupby(['district_id', 'time'])['pct_access'].idxmax()]

   district_id       time  pct_access  lp_id
0         1000 2020-01-01        3.60  93690
4         1000 2020-01-02       41.76  59257

For more explanation, refer to the link posted by @jezrael:

Get the row(s) which have the max value in groups using groupby

Corralien
  • 109,409
  • 8
  • 28
  • 52