I have database metrics grouped by day, and I need to forecast the data for the next 3 months. The data have seasonality (I believe it is by day of the week).

I want to use the Holt-Winters method in R. I need to create a time series object, which asks for a frequency (which I think is 7). But how can I be sure? Is there a function to identify the best frequency?

I'm using:

FID_TS <- ts(FID_DataSet$Value, frequency=7)

FID_TS_Observed <- HoltWinters(FID_TS)

If I decompose this data with decompose(FID_TS), I have:

[image: plot of decompose(FID_TS)]

And this is my first forecast FID_TS_Observed:

[image: plot of the first forecast]

When I look at the history of the last year, the values start low in the first 3 months, increase from month 3 to month 11, and then decrease again.

Maybe my daily data have a weekly seasonality (frequency=7) and a monthly seasonality (frequency=7x30=210)? Do I need the last 365 days?

Is there any way to set the frequency both by day of the week and by month? Another thing: does it make any difference whether I use the whole last year or just a part of it in the Holt-Winters method?

Thanks in advance :)

Sotos
Evan Bessa
  • Have a look at `library(forecast)` which has the `msts` function that can take multiple frequencies. – Sotos Mar 08 '18 at 13:20
  • Thanks @Sotos, maybe something like this (weekly 7, monthly 30)? `FID_TS <- msts(FID_DataSet$Value, seasonal.periods=7, frequency=30)` It doesn't work :( – Evan Bessa Mar 08 '18 at 13:24
  • Have a look at [this link](https://stats.stackexchange.com/questions/74418/frequency-of-time-series-in-r) – Sotos Mar 08 '18 at 13:26
  • Thanks @Sotos, I tried this: 7 (number of intervals per week) and 5 (number of intervals per month, assuming a 30-day month). `FID_TS <- msts(FID_DataSet$Value, seasonal.periods = c(7,7*5))` works and improved my results :) But this didn't work with 12 (number of intervals per year, assuming 12 months per year): `FID_TS <- msts(FID_DataSet$Value, seasonal.periods = c(7,7*5, 7*5*12))`. My data only have 420 rows. – Evan Bessa Mar 08 '18 at 14:22
  • check this one: https://anomaly.io/detect-seasonality-using-fourier-transform-r/ – milos.ai Mar 08 '18 at 15:00
  • Thanks @grubjesic, I will try this too. Just to know, if I have some peaks in my time series, how can I smooth them? They are abnormal for my seasonal time series :) Thanks in advance – Evan Bessa Mar 09 '18 at 09:50

2 Answers

Usually, the frequency (or seasonality; you seem to be using the words interchangeably in your post) is determined by domain knowledge. For example, if I work in the restaurant business and am analyzing an hourly data set of customers, I know that I will have a 24-hour frequency, with spikes at lunch time and dinner time, and another 168-hour frequency (24 * 7), because there will be a weekly pattern to my customers.

If for some reason you don't have domain knowledge, you can use the ACF and the PACF, as well as Fourier analysis, to find the best frequencies for your data.
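As a rough illustration of the ACF and Fourier route, here is a minimal sketch in base R. The series is synthetic (a weekly cycle plus noise), not the asker's `FID_DataSet`, and the lag range and noise level are arbitrary assumptions.

```r
# Synthetic daily series with a weekly cycle (illustrative data only).
set.seed(42)
n <- 420
day <- seq_len(n)
x <- 10 + 3 * sin(2 * pi * day / 7) + rnorm(n, sd = 0.5)

# Autocorrelation: a seasonal series shows peaks at multiples of the period.
ac <- acf(x, lag.max = 28, plot = FALSE)
lags <- 2:28
best_lag <- lags[which.max(ac$acf[lags + 1])]  # ac$acf[1] is lag 0

# Periodogram: the dominant frequency (cycles per observation) is 1 / period.
pg <- spec.pgram(x, taper = 0, detrend = TRUE, plot = FALSE)
best_period <- 1 / pg$freq[which.max(pg$spec)]

cat("Strongest ACF lag:", best_lag, "\n")
cat("Dominant periodogram period:", round(best_period, 2), "\n")
```

On real data the peaks are usually less clean, so it is worth looking at the full `acf()` and periodogram plots rather than only the single maximum.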

> Have any way to put the frequency by day of the week and by month?

With Holt-Winters, no. HW takes only one seasonal component. For multiple seasonal components, you should try TBATS. As Xiaoxi Wu pointed out, FB Prophet can model multiple seasonalities, and Google's BSTS package can as well.
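A minimal sketch of the TBATS route, assuming the `forecast` package is installed. The data are synthetic, and the two periods (7 and 365.25) are assumptions chosen for illustration; they are not taken from the asker's data.

```r
library(forecast)

# Synthetic daily data with weekly and yearly patterns (illustrative only).
set.seed(1)
n <- 800
day <- seq_len(n)
y <- 100 + 10 * sin(2 * pi * day / 7) +
  20 * sin(2 * pi * day / 365.25) + rnorm(n, sd = 3)

# msts() records several seasonal periods on a single series.
y_msts <- msts(y, seasonal.periods = c(7, 365.25))

# tbats() can use all of the recorded periods; fitting may take a while.
fit <- tbats(y_msts)

# Forecast the next 90 days (roughly 3 months).
fc <- forecast(fit, h = 90)
plot(fc)
```

Note that each period passed to `msts()` should be covered at least twice by the data, which is why the synthetic series here spans more than two years.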

> Another thing, does it make any difference I take the whole last year or just a part of it to use in the Holt-Winters method?

Yes, it does. If you want to model a seasonality, you need at least two full seasonal periods of data (preferably more); otherwise your model has no way of knowing whether a spike is a seasonal variation or just a one-time impulse. For example, to model a weekly seasonality you need at least 14 days of training data (plus whatever you will use for testing), and for a yearly seasonality you need at least 730 days of data.
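The two-period requirement can be seen directly with base R's `HoltWinters()`. This is a small sketch on synthetic data; the day-of-week effect and the series lengths are made up for illustration.

```r
set.seed(7)
weekly_effect <- c(5, 3, 2, 2, 3, 6, 8)  # hypothetical day-of-week effect

make_series <- function(n_days) {
  ts(50 + rep(weekly_effect, length.out = n_days) + rnorm(n_days),
     frequency = 7)
}

# Fewer than two full weeks: the seasonal start values cannot be estimated.
short_fit <- tryCatch(HoltWinters(make_series(10)), error = function(e) e)

# Twelve full weeks: enough cycles to estimate the weekly pattern.
long_fit <- tryCatch(HoltWinters(make_series(84)), error = function(e) e)

describe <- function(fit) {
  if (inherits(fit, "error")) conditionMessage(fit) else "fit OK"
}
cat("10 days:", describe(short_fit), "\n")
cat("84 days:", describe(long_fit), "\n")
```

The same logic explains why a period of 420 cannot be fitted on a series that has only 420 rows.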

Alex Kinman

It looks like you have daily data and you would like to forecast the next three months. The question here is whether you need daily, weekly, or just monthly forecasts. I guess you will probably need daily or weekly forecasts. If you need a weekly forecast, it might be easier to group the data by week first and then run the forecast.
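Grouping by week before forecasting can be sketched in base R as follows. The data frame and its `date` and `Value` column names are assumptions for illustration, and summing is only one possible aggregation (a mean may suit some metrics better).

```r
# Hypothetical daily data: four full weeks starting on a Monday.
daily <- data.frame(
  date  = seq(as.Date("2017-01-02"), by = "day", length.out = 28),
  Value = rep(c(5, 3, 2, 2, 3, 6, 8), times = 4)
)

# Label each row with the Monday that starts its week, then sum per week.
daily$week <- cut(daily$date, breaks = "week")
weekly <- aggregate(Value ~ week, data = daily, FUN = sum)
print(weekly)
```

The weekly totals can then be passed to `ts()` with a frequency that matches any remaining seasonality, for example 52 for a yearly pattern, provided there is enough history.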

A very good tool for daily data is Facebook's new Prophet package. It works with a data frame instead of a ts object, which makes it much easier to handle. You can quickly get the daily (if you have hourly data or so), weekly, and yearly seasonality from built-in functions such as plot_components (prophet_plot_components in R). Here is a quick start tutorial by Facebook. They have APIs for both Python and R.

Here is some quick code to plot the weekly and yearly seasonality (if there is any) with Prophet.

library(prophet)
library(dplyr)
df <- FID_DataSet %>% rename(ds = date, y = Value)
m <- prophet(df)
future <- make_future_dataframe(m, periods = 365)
forecast <- predict(m, future)
plot(m, forecast) # plot out the forecast
prophet_plot_components(m, forecast) # plot out the components: trend, weekly and yearly seasonality if there is any.
Xiaoxi Wu
  • Hi @xiaoxi-wu, thanks for your help. I want to forecast by day, but some of the data vary by day of the week. For example, for some types every Monday is bigger than the other days, but this is not a rule for all of them. I want to know if there is a way to see whether the data have seasonality or not, and what the cycle is. For data without seasonality, what is the best approach? Thanks in advance. – Evan Bessa Oct 23 '18 at 07:12
  • @XiaoxiWu I did the forecast using the prophet library in R, but I have a problem with the trend. How can I ignore the trend or set it myself? I want to reduce the influence of the trend. Otherwise my results go to negative numbers, and I can't have that. Thanks! – Evan Bessa Oct 23 '18 at 07:42
  • @EvanBessa You can try a logistic trend instead of a linear one, like prophet(df, growth = 'logistic'), and then set the cap and floor to make sure it does not go negative. – Xiaoxi Wu Oct 26 '18 at 17:36
  • @ArtjomB. I attached some sample code showing how to plot out the seasonality components. – Xiaoxi Wu Oct 26 '18 at 18:34
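
The logistic-growth suggestion from the comments can be sketched as below, assuming the `prophet` package is installed. The data are synthetic and the cap of 30 and floor of 0 are arbitrary assumptions. The floor bounds the trend component; seasonal terms can still push `yhat` slightly below it.

```r
library(prophet)

# Hypothetical declining daily series in Prophet's layout: ds (date), y (value).
set.seed(3)
df <- data.frame(
  ds = seq(as.Date("2017-01-01"), by = "day", length.out = 420),
  y  = pmax(0, 20 - 0.03 * seq_len(420) + rnorm(420))
)

# Logistic growth needs a cap column; floor is optional.
df$cap   <- 30  # assumed upper bound
df$floor <- 0   # assumed lower bound

m <- prophet(df, growth = "logistic")

# The future data frame needs the same cap and floor columns.
future <- make_future_dataframe(m, periods = 90)
future$cap   <- 30
future$floor <- 0

forecast <- predict(m, future)
plot(m, forecast)
```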