
I am new to data science and trying to understand how to evaluate the difference between a forecast and the actuals.

Let's say I have actuals:

    27.580
    25.950
     0.000 (Sum = 53.53)

And my predicted values using XGBoost are:

    29.9
    25.4
    15.0 (Sum = 70.3)

Is it better to just evaluate based on the sums, i.e. take the sum of the predicted values minus the sum of the actuals: difference = 70.3 - 53.53?

Or is it better to evaluate the difference using forecasting error metrics like MSE, MAE, RMSE, or MAPE?

Since I read that MAPE is the most widely accepted, how can it be implemented in cases where the denominator is 0, as in my actuals above?

Is there a better way to evaluate deviation from the actuals, or are these the only legitimate methods? My objective is to build several predictive models involving different variables, each of which will give different predicted values, and then choose the one with the least deviation from the actuals.
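For concreteness, here is a minimal sketch (assuming NumPy and scikit-learn are available) that computes both the difference of sums and the point-wise MAE/RMSE on the example values above; MAPE is left out because the zero actual makes it undefined:

    import numpy as np
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    actuals   = np.array([27.580, 25.950, 0.000])
    predicted = np.array([29.9, 25.4, 15.0])

    # Difference of the sums: 70.3 - 53.53
    sum_diff = predicted.sum() - actuals.sum()               # ~16.77

    # Point-wise error metrics
    mae  = mean_absolute_error(actuals, predicted)           # ~5.96
    rmse = np.sqrt(mean_squared_error(actuals, predicted))   # ~8.77

    print(sum_diff, mae, rmse)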

2 Answers


Whether you should evaluate each point individually or only the sum depends on your data and your use case.

For example, if each point represents a time bucket and the accuracy of each bucket matters (e.g. for a production plan), then you should evaluate each bucket separately.

If you only need to measure the accuracy of the sum, then you might as well forecast the sum directly.

Regarding your question on MAPE, there is no way around the issue you mention: your actuals need to be non-zero for MAPE to be meaningful. If you are only assessing a single time series, you can use MAE instead, which avoids the accuracy becoming infinite or undefined. That said, there are many ways to measure accuracy, and in my experience which one is preferable depends very much on your use case and your data set. See Hyndman's article on accuracy measures for intermittent demand for some good points on this.
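To illustrate that point, here is a minimal NumPy sketch showing how a single zero actual makes MAPE infinite while MAE stays well-defined:

    import numpy as np

    actuals   = np.array([27.580, 25.950, 0.000])
    predicted = np.array([29.9, 25.4, 15.0])

    with np.errstate(divide="ignore"):                 # silence the divide-by-zero warning
        ape = np.abs((actuals - predicted) / actuals)  # third element is inf

    print(ape.mean())                                  # inf -> MAPE is unusable here
    print(np.abs(actuals - predicted).mean())          # ~5.96 -> MAE is still well-defined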


I use MdAPE (Median Absolute Percentage Error) whenever MAPE cannot be calculated because of zeros in the actuals.
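A minimal NumPy sketch of MdAPE (the helper name below is just illustrative, not a library function): a zero actual produces an infinite absolute percentage error, but the median is still finite as long as fewer than half of the points are affected:

    import numpy as np

    def mdape(actuals, predicted):
        """Median Absolute Percentage Error (illustrative helper)."""
        actuals = np.asarray(actuals, dtype=float)
        predicted = np.asarray(predicted, dtype=float)
        with np.errstate(divide="ignore", invalid="ignore"):
            ape = np.abs((actuals - predicted) / actuals)  # zero actuals give inf (or nan for 0/0)
        return np.median(ape)

    print(mdape([27.580, 25.950, 0.000], [29.9, 25.4, 15.0]))  # ~0.084, i.e. about 8.4%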
