Get Top 3 max values

Question

I used to have a list and only needed to extract the max values in column 33 every day using below code and then export the data.

df_= pd.read_excel (r'file_location.xlsx')
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')


df_new = (df.groupby(pd.Grouper(key="Date",freq="D"))
            .agg({df.columns[33]: np.max})
            .reset_index())

Now I have a new task to extract the top 3 valus in the same column everyday. I tried below code but doesn't work.

Any idea?



df_= pd.read_excel (r'file_location.xlsx')
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')


df_new = (df.groupby(pd.Grouper(key="Date",freq="D"))
            .agg({df.columns[33]: np.head(3)})
            .reset_index())

@mozway can you found dupe for column by position `[df.columns[33]]` ? — jezrael, Nov 30 '22 at 09:35
@jezrael honestly column slicing is covered in so many `groupby` question, this doesn't seem to be a sufficient addition, especially since OP already was slicing the columns correctly — mozway, Nov 30 '22 at 09:39
Ok, I give you some time, please found solution with slicing by position column name with `head` or `nlargest`, now no match — jezrael, Nov 30 '22 at 09:42
@mozway - h`onestly column slicing is covered in so many groupby question` In my opinion not, it is not used in many groupby question. If OP has only dupes in my opinion not create answer like bellow, in no dupe is used this altrantive. So reopened. — jezrael, Nov 30 '22 at 10:51
@jezrael I believe it's just an abuse of the system you are doing, slicing what not an important part of the question. The use of `head` vs `agg` was and I had linked the appropriate duplicates. — mozway, Nov 30 '22 at 10:58
@mozway - Question is what is not important part and what not? I think is important part slicing, not used very often in many `groupby` answers, so it is reason for reopen. — jezrael, Nov 30 '22 at 11:01
@jezrael If you feel specific duplicates do not apply because this question asks about *multiple* things and each duplicates covers only one thing, please *close as lacking focus*. If you feel the difference to the duplicate is so minor that even your own answer does not explicitly point it out, please leave it closed. — MisterMiyagi, Dec 13 '22 at 10:22
Does this answer your question? [Pandas get topmost n records within each group](https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group) — MisterMiyagi, Dec 13 '22 at 10:23

jezrael · Accepted Answer · 2022-11-30T09:44:50.643

You need specify column after groupby and call GroupBy.head without agg:

df_e_new = df.groupby(pd.Grouper(key="Date",freq="D"))[df.columns[33]].head(3)

Or use SeriesGroupBy.nlargest for top3 sorted values:

df_e_new = df.groupby(pd.Grouper(key="Date",freq="D"))[df.columns[33]].nlargest(3)

For sum top3 values use lambda function:

df_e_new = (df.groupby(pd.Grouper(key="Date",freq="D"))[df.columns[33]]
              .agg(lambda x: x.nlargest(3).sum())
              .reset_index(name='top3sum'))

Get Top 3 max values

1 Answers1