0

I am new to pandas, and I am trying to get Outcome Dataframe from df. Here I am trying to get the largest Sale in each Month and which week the sale is made.

df = pd.DataFrame({"Month": ["Jan", "Jan", "Jan", "Jan", "Feb", "Feb", "Feb", "Mar"],"Week": [1, 2, 3, 4, 1, 2, 3, 2],"Sales": [1000, 50, 200, 300, 250, 150, 100, 300]})

  Month  Week  Sales
0   Jan     1   1000
1   Jan     2     50
2   Jan     3    200
3   Jan     4    300
4   Feb     1    250
5   Feb     2    150
6   Feb     3    100
7   Mar     2    300

Output:

Outcome = pd.DataFrame({"Month": ["Jan", "Feb", "Mar"],"Week": [1, 1, 2],"Sales": [1000, 250, 300]}) 

  Month  Week  Sales
0   Jan     1   1000
1   Feb     1    250
2   Mar     2    300
Emi OB
  • 2,814
  • 3
  • 13
  • 29

1 Answers1

0

You need to use df.groupby and idxmax() to get the indexes of the maximum sales values, then put that into df.loc to return the rows where the maximum sales of each month occur. I've added a sort_index() and reset_index() to get it in the same order/index as your example.

df.loc[df.groupby('Month')['Sales'].idxmax()].sort_index().reset_index(drop=True)

Output:

  Month  Week  Sales
0   Jan     1   1000
4   Feb     1    250
7   Mar     2    300
Emi OB
  • 2,814
  • 3
  • 13
  • 29