I have a dataset of several thousand time series. The data consists of monthly sales of different products (between 2016 and 2020); see the two examples below. Many of the time series (products) have outliers, which are due to additional demand from one-time projects/promotions. Unfortunately, I do not have any data whatsoever about when or for which products this was the case. In the second example, this would be the case for the two peaks of June 2016 and July 2018 (I apologise for the unreadable x-axis).
My goal is to eventually provide forecasts for each of the products. I expect to achieve better results if I first correct such peaks before applying forecasting models. Due to the sheer volume of products, it is not feasible for me to manually analyze/process each product. I'm looking for an automated procedure that could identify and correct these outliers, preferably in Python.
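To make the goal concrete, here is a minimal sketch of the kind of automated correction I have in mind: a Hampel-style filter that compares each month with the median of its neighbours and replaces values that deviate too far. The function name `hampel_correct`, the window size and the threshold are my own illustrative choices, not tuned values, and the toy series is made up. It ignores trend and seasonality, so a recurring seasonal peak could be flagged by mistake, which is part of what I am unsure about.

```python
from statistics import median


def hampel_correct(values, half_window=6, n_sigmas=3.0):
    """Flag points far from the local median and replace them with that median.

    half_window and n_sigmas are illustrative defaults (assumptions), not tuned.
    Returns the corrected series and the indices of the flagged points.
    """
    k = 1.4826  # scales the MAD to a standard deviation for normal data
    corrected = list(values)
    flagged = []
    for i, x in enumerate(values):
        # Centered window, truncated at the ends of the series.
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        # A zero MAD (flat window) is skipped to avoid flagging everything.
        if mad > 0 and abs(x - med) > n_sigmas * k * mad:
            flagged.append(i)
            corrected[i] = med
    return corrected, flagged


# Made-up toy series with one obvious spike at index 4.
toy = [10, 12, 11, 13, 100, 12, 11, 10, 13, 12]
corrected, flagged = hampel_correct(toy, half_window=3)
print(flagged)    # indices of the flagged points
print(corrected)  # series with flagged points replaced by the local median
```

Since the function only needs a list of numbers, it could be applied in a loop over all products, but I do not know whether this is a sound approach or how to pick the parameters.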
Frankly, I'm a bit overwhelmed by the topic. I would greatly appreciate a list of steps/models/statistical tests/... that I should execute in sequence to solve this problem.
Some additional information that might help:
- products may or may not be seasonal or have a trend, but I do not know which.
- I plan to train forecasting models on 2016-2018 data, and create forecasts for all of 2019 for each product after having applied the outlier corrections (to calculate forecast accuracy)
- I have information about the product hierarchy (about 100 product groups), but I'm not sure if/how I should use this for the outlier detection. Note: I'm interested in forecasts at product level rather than aggregate level
- I saw the term 'stationarity'; I do not know whether or how I should take this into account
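The evaluation setup from the list above (train on 2016-2018, forecast all of 2019, then measure accuracy) can be sketched roughly as follows. The helper names, the seasonal naive forecast used as a placeholder model, and the mean absolute error as accuracy measure are all assumptions on my part. I also assume that outlier corrections would only be applied to the training part, so that the 2019 actuals stay untouched. The data is the first example series from this post.

```python
def split_train_test(values, n_train=36, horizon=12):
    """Split a monthly series into 2016-2018 (36 months) and 2019 (12 months)."""
    return values[:n_train], values[n_train:n_train + horizon]


def seasonal_naive_forecast(train, horizon=12, season_length=12):
    """Placeholder model (assumption): repeat the last observed year."""
    last_season = train[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


def mean_absolute_error(actual, forecast):
    """Accuracy measure chosen for illustration only."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# First example series from this post (1-2016 to 12-2019).
ex1 = [4.0, 11.0, 8.0, 4.0, 4.0, 9.0, 8.0, 5.0, 7.0, 10.0, 11.0, 3.0,
       7.0, 5.0, 9.0, 3.0, 6.0, 5.0, 9.0, 1.0, 10.0, 9.0, 5.0, 2.0,
       9.0, 1.0, 3.0, 8.0, 4.0, 4.0, 5.0, 5.0, 5.0, 8.0, 7.0, 5.0,
       2.0, 8.0, 8.0, 4.0, 6.0, 8.0, 5.0, 4.0, 4.0, 7.0, 6.0, 4.0]

train, test = split_train_test(ex1)
# Outlier correction would be applied to `train` here, before fitting a model.
forecast = seasonal_naive_forecast(train)
mae = mean_absolute_error(test, forecast)
print(round(mae, 2))
```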
Thanks a lot for any help/insight on this matter.
Actual values (first value 1-2016, last value 12-2019):
ex1: [4.0, 11.0, 8.0, 4.0, 4.0, 9.0, 8.0, 5.0, 7.0, 10.0, 11.0, 3.0, 7.0, 5.0, 9.0, 3.0, 6.0, 5.0, 9.0, 1.0, 10.0, 9.0, 5.0, 2.0, 9.0, 1.0, 3.0, 8.0, 4.0, 4.0, 5.0, 5.0, 5.0, 8.0, 7.0, 5.0, 2.0, 8.0, 8.0, 4.0, 6.0, 8.0, 5.0, 4.0, 4.0, 7.0, 6.0, 4.0]
ex2: [8000.0, 8200.0, 16400.0, 13900.0, 13000.0, 15400.0, 44900.0, 5200.0, 12800.0, 17300.0, 9900.0, 12800.0, 13500.0, 17300.0, 11100.0, 15100.0, 15900.0, 20100.0, 14800.0, 6200.0, 8600.0, 12400.0, 15800.0, 14100.0, 18100.0, 26100.0, 19400.0, 14800.0, 15400.0, 48000.0, 13400.0, 11200.0, 14500.0, 12200.0, 16900.0, 4300.0, 8000.0, 11500.0, 11200.0, 17900.0, 7200.0, 19200.0, 18500.0, 6200.0, 6000.0, 11700.0, 14000.0, 7900.0, 13800.0]