-1

I used to have a list and only needed to extract the max values in column 33 every day using below code and then export the data.

df_= pd.read_excel (r'file_location.xlsx')
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')


df_new = (df.groupby(pd.Grouper(key="Date",freq="D"))
            .agg({df.columns[33]: np.max})
            .reset_index())

Now I have a new task to extract the top 3 valus in the same column everyday. I tried below code but doesn't work.

Any idea?



df_= pd.read_excel (r'file_location.xlsx')
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')


df_new = (df.groupby(pd.Grouper(key="Date",freq="D"))
            .agg({df.columns[33]: np.head(3)})
            .reset_index())



PJL
  • 21
  • 3
  • @mozway can you found dupe for column by position `[df.columns[33]]` ? – jezrael Nov 30 '22 at 09:35
  • 1
    @jezrael honestly column slicing is covered in so many `groupby` question, this doesn't seem to be a sufficient addition, especially since OP already was slicing the columns correctly – mozway Nov 30 '22 at 09:39
  • Ok, I give you some time, please found solution with slicing by position column name with `head` or `nlargest`, now no match – jezrael Nov 30 '22 at 09:42
  • @mozway - h`onestly column slicing is covered in so many groupby question` In my opinion not, it is not used in many groupby question. If OP has only dupes in my opinion not create answer like bellow, in no dupe is used this altrantive. So reopened. – jezrael Nov 30 '22 at 10:51
  • 2
    @jezrael I believe it's just an abuse of the system you are doing, slicing what not an important part of the question. The use of `head` vs `agg` was and I had linked the appropriate duplicates. – mozway Nov 30 '22 at 10:58
  • @mozway - Question is what is not important part and what not? I think is important part slicing, not used very often in many `groupby` answers, so it is reason for reopen. – jezrael Nov 30 '22 at 11:01
  • 3
    @jezrael If you feel specific duplicates do not apply because this question asks about *multiple* things and each duplicates covers only one thing, please *close as lacking focus*. If you feel the difference to the duplicate is so minor that even your own answer does not explicitly point it out, please leave it closed. – MisterMiyagi Dec 13 '22 at 10:22
  • Does this answer your question? [Pandas get topmost n records within each group](https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group) – MisterMiyagi Dec 13 '22 at 10:23

1 Answers1

-1

You need specify column after groupby and call GroupBy.head without agg:

df_e_new = df.groupby(pd.Grouper(key="Date",freq="D"))[df.columns[33]].head(3)
      

Or use SeriesGroupBy.nlargest for top3 sorted values:

df_e_new = df.groupby(pd.Grouper(key="Date",freq="D"))[df.columns[33]].nlargest(3)

For sum top3 values use lambda function:

df_e_new = (df.groupby(pd.Grouper(key="Date",freq="D"))[df.columns[33]]
              .agg(lambda x: x.nlargest(3).sum())
              .reset_index(name='top3sum'))
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252