I have a grouped dataframe consisting of a multilevel index of items (title ord_base7), a snapshot date of when sales forecasts were made, and the different models that made those forecasts along with each model's error (MAPE).
I'd like to get simplify the data frame to the model with the smallest MAPE for each ord_base7/snapshot date.
Below is a sample of the data for one item.
import pandas as pd
df_dict = {'ord_base7': ['100179K', '100179K', '100179K', '100179K', '100179K', '100179K',
'100179K', '100179K', '100179K', '100179K', '100179K', '100179K', '100179K', '100179K',
'100179K', '100179K', '100179K', '100179K','100179K', '100179K', '100179K', '100179K',
'100179K', '100179K', '100179K', '100179K', '100179K'],
'snapshot': ['2020-12-31 00:00:00', '2020-12-31 00:00:00',
'2020-12-31 00:00:00', '2020-12-31 00:00:00',
'2020-12-31 00:00:00', '2020-12-31 00:00:00',
'2020-12-31 00:00:00', '2021-01-31 00:00:00',
'2021-01-31 00:00:00', '2021-01-31 00:00:00',
'2021-01-31 00:00:00', '2021-01-31 00:00:00',
'2021-01-31 00:00:00', '2021-01-31 00:00:00',
'2021-02-28 00:00:00', '2021-02-28 00:00:00',
'2021-02-28 00:00:00', '2021-02-28 00:00:00',
'2021-02-28 00:00:00', '2021-02-28 00:00:00',
'2021-02-28 00:00:00', '2021-03-31 00:00:00',
'2021-03-31 00:00:00', '2021-03-31 00:00:00',
'2021-03-31 00:00:00', '2021-03-31 00:00:00',
'2021-03-31 00:00:00'],
'model': ['AutoArima', 'SARIMA', 'STLF', 'TESA', 'TESM', 'lstm_neural_net', 'prophet',
'AutoArima', 'SARIMA', 'STLF', 'TESA','TESM', 'lstm_neural_net', 'prophet',
'AutoArima', 'SARIMA', 'STLF', 'TESA', 'TESM', 'lstm_neural_net', 'prophet',
'AutoArima', 'SARIMA', 'STLF', 'TESA', 'TESM', 'lstm_neural_net'],
'MAPE': [71.50999999999999, 4439.13, 1.34, 96.31, 100.13000000000001, 70.17999999999999,
53.25, 18.32, 20227.11, 12.15, 92.25999999999999, 93.14, 12.2, 11.3, 8.99, 20664.37,
18.68, 15.8, 20.39, 1.69, 25.679999999999996, 3.0700000000000003, 8.68, 24.6, 5.43,
2.93, 1.87]}
df = pd.DataFrame.from_dict(df_dict)
df.set_index(['ord_base7', 'snapshot', 'model'],inplace = True)
I tried to just pull the min(MAPE) out but that's over the whole dataset. I need to pull it for each snapshot in time for each item so I know what model performed the best at that time.
How can I do this?