0

I have a data frame as shown below

Sector    Plot    Year       Amount   Month
SE1       1       2017       10       Sep
SE1       1       2018       10       Oct
SE1       1       2019       10       Jun
SE1       1       2020       90       Feb
SE1       2       2018       50       Jan
SE1       2       2017       100      May
SE1       2       2018       30       Oct
SE2       2       2018       50       Mar
SE2       2       2019       100      Jan

From the above I would like to prepare below data frame

Sector    Plot      Number_of_Times    Mean_Amount    Recent_Amount   Recent_year  Recent_Month    
SE1       1         4                  30             50              2020         Feb   
SE1       2         3                  60             30              2018         Oct
SE2       2         2                  75             100             2019         Jan
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Danish
  • 2,719
  • 17
  • 32
  • please consider adding a brief explanation about the expected output/logic..etc. – anky Feb 11 '20 at 12:58
  • Does this answer your question? [Apply multiple functions to multiple groupby columns](https://stackoverflow.com/questions/14529838/apply-multiple-functions-to-multiple-groupby-columns) – Brown Bear Feb 11 '20 at 13:00
  • provide extra information on what basis you want to produce that output. – The Guy Feb 11 '20 at 13:01

1 Answers1

1

So if all rows are sorted in input data use GroupBy.agg with named aggregations:

df1 = (df.groupby(['Sector','Plot']).agg(Number_of_Times=('Year','size'),
                                         Mean_Amount=('Amount','mean'),
                                         Recent_Amount=('Amount','last'),
                                         Recent_year=('Year','last'),
                                         Recent_Month=('Month','last')).reset_index())
print (df1)
  Sector  Plot  Number_of_Times  Mean_Amount  Recent_Amount  Recent_year  \
0    SE1     1                4           30             90         2020   
1    SE1     2                3           60             30         2018   
2    SE2     2                2           75            100         2019   

  Recent_Month  
0          Feb  
1          Oct  
2          Jan  

If necessary sorting convert Month to datetimes, add DataFrame.sort_values, apply solution and last convert months back to strings:

df['Month'] = pd.to_datetime(df['Month'], format='%b')

df1 = (df.sort_values(['Sector','Plot','Year','Month'])
         .groupby(['Sector','Plot']).agg(Number_of_Times=('Year','size'),
                                         Mean_Amount=('Amount','mean'),
                                         Recent_Amount=('Amount','last'),
                                         Recent_year=('Year','last'),
                                         Recent_Month=('Month','last')).reset_index())
df1['Recent_Month'] = df1['Recent_Month'].dt.strftime('%b')
print (df1)
  Sector  Plot  Number_of_Times  Mean_Amount  Recent_Amount  Recent_year  \
0    SE1     1                4           30             90         2020   
1    SE1     2                3           60             30         2018   
2    SE2     2                2           75            100         2019   

  Recent_Month  
0          Feb  
1          Oct  
2          Jan  

Another idea, buggy in pandas 0.25.1:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
df['Month']  = pd.Categorical(df['Month'] , ordered=True, categories=months)

df1 = (df.sort_values(['Sector','Plot','Year','Month'])
         .groupby(['Sector','Plot']).agg(Number_of_Times=('Year','size'),
                                         Mean_Amount=('Amount','mean'),
                                         Recent_Amount=('Amount','last'),
                                         Recent_year=('Year','last'),
                                         Recent_Month=('Month','last')).reset_index())

print (df1)

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long long'

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252