I have data that looks like the sample below, and I'm trying to calculate the CRMSE (centered root mean squared error) by plant_name and year. Maybe I need an agg function or a lambda function to do this for each group defined by the groupby keys (plant_name, year). The data in the dataframe df3m1:
  plant_name  year  month  obsvals  modelvals
0  ARIZONA I  2021      1     8.90       8.30
1  ARIZONA I  2021      2     7.98       7.41
2  CAETITE I  2021      1     9.10       7.78
3  CAETITE I  2021      2     6.05       6.02
The equation that I need to apply for each plant_name and year looks like this when written for the whole dataframe, without any grouping:
crmse = (( (df3m1.obsvals - df3m1.obsvals.mean()) -
           (df3m1.modelvals - df3m1.modelvals.mean()) ) ** 2).mean() ** .5
Combining a groupby with a calculation like this is still a bit advanced for me. Thank you. The final dataframe would look like:
  plant_name  year  crmse
0  ARIZONA I  2021      ?
1  CAETITE I  2021      ?
I have tried things like this with groupby:
crmse = df3m1.groupby(['plant_name','year'])(( (df3m1.obsvals - df3m1.obsvals.mean()) -
        (df3m1.modelvals - df3m1.modelvals.mean()) ) ** 2).mean() ** .5
but get errors like this:
TypeError: 'DataFrameGroupBy' object is not callable
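For context, the TypeError appears because df3m1.groupby([...]) returns a DataFrameGroupBy object, and putting parentheses directly after it tries to call that object like a function. One possible way around this, sketched below with the sample rows from above, is to put the calculation in a small function and pass it to groupby(...).apply so that the means are taken within each group. The helper name crmse_of_group is hypothetical and is not part of pandas. This is a minimal sketch that assumes there are no missing values in obsvals or modelvals.

```python
import pandas as pd

# Sample rows copied from the question
df3m1 = pd.DataFrame({
    "plant_name": ["ARIZONA I", "ARIZONA I", "CAETITE I", "CAETITE I"],
    "year": [2021, 2021, 2021, 2021],
    "month": [1, 2, 1, 2],
    "obsvals": [8.90, 7.98, 9.10, 6.05],
    "modelvals": [8.30, 7.41, 7.78, 6.02],
})


def crmse_of_group(group):
    """Hypothetical helper: centered RMSE for the rows of one group."""
    # Center each series on its own mean within the group
    obs_anom = group["obsvals"] - group["obsvals"].mean()
    mod_anom = group["modelvals"] - group["modelvals"].mean()
    # Root of the mean squared difference between the centered series
    return ((obs_anom - mod_anom) ** 2).mean() ** 0.5


crmse = (
    df3m1.groupby(["plant_name", "year"])[["obsvals", "modelvals"]]
    .apply(crmse_of_group)        # one scalar per (plant_name, year) group
    .reset_index(name="crmse")    # turn the group keys back into columns
)
print(crmse)
```

With the four sample rows this gives a crmse of about 0.015 for ARIZONA I and about 0.645 for CAETITE I. Selecting the two value columns before calling apply keeps the grouping columns out of the function's input.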