I am trying to apply Holt-Winters using ets() from the forecast package. I read the data from a DB because the start timestamp is likely to differ between users, although the interval is always 15 minutes.

I am having problems plotting and interpreting the forecast results. The x-axis appears to display the index values of the time series rather than the timestamps, and I cannot identify the cause. Sample data is below:

> raw_data
    date_time_start total_transmitted_mbps
    25/04/2017 00:00    8091.22258
    25/04/2017 00:15    8669.16705
    25/04/2017 00:30    6742.03133
    25/04/2017 00:45    7637.89432
    25/04/2017 01:00    7190.45344
    25/04/2017 01:15    9798.56278
    25/04/2017 01:30    7136.48579
    25/04/2017 01:45    6255.34125
    25/04/2017 02:00    6315.19628
    25/04/2017 02:15    6306.36521
    25/04/2017 02:30    9749.50128
    25/04/2017 02:45    8247.23815
    25/04/2017 03:00    9629.79122
    25/04/2017 03:15    9316.77885
    25/04/2017 03:30    9877.06118
    25/04/2017 03:45    8909.5684
    25/04/2017 04:00    7853.76492
    25/04/2017 04:15    8877.18781
    25/04/2017 04:30    6856.83524
    25/04/2017 04:45    9037.1283

Formatting the time series to retain the input time format:

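library(xts)  # provides xts()

# Append seconds, then parse the day/month/year timestamps as POSIXct
raw_data$date_time_start <- 
  as.POSIXct(strptime(paste(as.character(raw_data$date_time_start), ":00", sep = ""),
                      format = "%d/%m/%Y %H:%M:%S"))
# Column name as shown in the sample data above
eventdata <- xts(raw_data$total_transmitted_mbps,
                 order.by = raw_data$date_time_start)
plot(eventdata) # plot is OK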

The plot of this input is OK.
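
As an aside, the format string can match the input directly, so pasting ":00" onto each value is optional. A small sketch on three of the sample timestamps, assuming UTC as the time zone; raw_stamps, parsed and step_minutes are illustrative names, not part of the original code.

```r
# Three timestamps copied from the sample data.
raw_stamps <- c("25/04/2017 00:00", "25/04/2017 00:15", "25/04/2017 00:30")

# Parse day/month/year hour:minute directly (tz = "UTC" is an assumption).
parsed <- as.POSIXct(raw_stamps, format = "%d/%m/%Y %H:%M", tz = "UTC")

# Confirm the spacing between consecutive observations, in minutes.
step_minutes <- as.numeric(diff(parsed), units = "mins")
print(step_minutes)
```

Checking the spacing this way is a quick test that the series really is regular before it is passed to a forecasting function.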

I am using the ets as follows:

    library(forecast)  # provides ets() and forecast()
    fit2 <- ets(eventdata, model="ZZZ", damped=TRUE, alpha=NULL, beta=NULL, gamma=NULL)
    fcast90 <- forecast(fit2, h=100)
    plot(fcast90) # x-axis of plot is incorrect

When I print fcast90$x I do get output, but the timestamps for the next 100 forecast periods are not included:

    > fcast90$x
    Time Series:
    Start = 1 
    End = 11521 
    Frequency = 0.0166666666666667 
      [1]  8091.223  8669.167  6742.031  7637.894  7190.453  9798.563  7136.486  6255.341  6315.196
     [10]  6306.365  9749.501  8247.238  9629.791  9316.779  9877.061  8909.568  7853.765  8877.188

How do I forecast and view the next 100 days?

Update: Based on the posts by @A5C1D2H2I1M1N2O1R2T1 and @joran, I tried two things:

  1. Generate the sequence of dates (format: YYYY-MM-DD)

  2. Set axes = FALSE in the plot and label the axes myself.

I am unable to get #2 working.

With #1: in my data, the start date differs between users. To try the suggestion by @A5C1D2H2I1M1N2O1R2T1, I assumed that the start date is fixed. I read in the first and last dates for that user to obtain the frequency.

aa <- raw_data[1,] # to obtain the start date
bb <- raw_data[nrow(raw_data),] # to obtain the last date using the nrow

Since the start and end times may differ for each user, I calculate the number of days in the time series (time_diff). The number of forecast points should then be fcast_days + time_diff, as in fcast90 <- forecast(fit2, fcast_days+time_diff).

fcast_days = 100 
startDate = as.POSIXct(strptime(paste(as.character(aa$date_time_start),":00",sep = ""),  format="%d/%m/%Y %H:%M:%S"))
endDate = as.POSIXct(strptime(paste(as.character(bb$date_time_start),":00",sep = ""), format="%d/%m/%Y %H:%M:%S")) 
time_diff = as.numeric(round(endDate - startDate)) # output=16

Generating a sequence for the plot labels:

a = seq(as.Date(startDate), by="days", length=time_diff+fcast_days) #length = 116

But I hit a problem when I use seq, because the lowest granularity for seq here is days, while my time series is at 15-minute intervals. So I am forced to read in the timestamps rather than generate them. For this reason, I used raw_data$date_time_start <- as.POSIXct(strptime(paste(as.character(raw_data$date_time_start),":00",sep = ""),format="%d/%m/%Y %H:%M:%S")). Please tell me if this is wrong.
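
For reference, base R's seq() also works on date-times: when it is given POSIXct values, by can be a string such as "15 min". A minimal sketch, assuming UTC and the first timestamp of the sample data; start_time and stamps are illustrative names.

```r
# First timestamp of the sample data (tz = "UTC" is an assumption).
start_time <- as.POSIXct("2017-04-25 00:00:00", tz = "UTC")

# Eight consecutive timestamps spaced 15 minutes apart.
stamps <- seq(start_time, by = "15 min", length.out = 8)

# Print them in the same day/month/year layout as the input data.
print(format(stamps, "%d/%m/%Y %H:%M", tz = "UTC"))
```

A sequence built this way has one entry per observation, so its length can match the number of data points rather than the number of days.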

With #2, I set axes = FALSE to print only the date. Re-using the code from the link:

library(lubridate)  # provides decimal_date()
fcast90 <- forecast(fit2, fcast_days+time_diff)
plot(fcast90, axes = FALSE)
axis(1, at = a, labels = format(a, "%d %b %Y"), cex.axis=0.6)
abline(v = decimal_date(a), col='grey', lwd=0.5)
axis(2, cex.axis=0.6)

I think the problem in the plot is due to the mismatch between the number of days in the seq and the number of data points in fcast90$x.

> length(fcast90$x) # represents data captured at 15 min interval
[1] 1536
> length(a) # represents number of days
[1] 116

For the time series I have, are my steps correct?

Krantz
mamat.mj
  • I think the same is discussed here: https://stackoverflow.com/questions/10302261/forecasting-time-series-data – Adam Oct 13 '17 at 08:06
  • Thank you very much for pointing to the prior post. It did not show up in my search earlier. I have updated my post based on the link you shared. I am still unable to successfully plot the output – mamat.mj Oct 14 '17 at 16:36

1 Answer

Check the forecast documentation.

fcast90$mean, fcast90$lower or fcast90$upper should give you what you are looking for.
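
A minimal sketch of pairing the point forecasts with timestamps, using base R only. The values in point_forecast are a hypothetical stand-in for as.numeric(fcast90$mean); the UTC time zone and the names last_obs, future_times and forecast_table are assumptions for illustration.

```r
# Last observed timestamp in the sample data (tz = "UTC" is an assumption).
last_obs <- as.POSIXct("2017-04-25 04:45:00", tz = "UTC")

# h counts steps of the series, not days: 100 steps of 15 min is 25 hours.
h <- 100
steps_per_day <- 24 * 60 / 15        # 96 steps in one day
h_100_days <- 100 * steps_per_day    # horizon needed to cover 100 days

# Timestamps of the h steps that follow the last observation.
future_times <- seq(last_obs + 15 * 60, by = "15 min", length.out = h)

# Hypothetical stand-in; with a real fit use as.numeric(fcast90$mean).
point_forecast <- rep(8000, h)

# One row per forecast step, labelled with its timestamp.
forecast_table <- data.frame(date_time_start = future_times,
                             forecast_mbps   = point_forecast)
print(head(forecast_table, 3))
```

fcast90$lower and fcast90$upper hold one column per confidence level, so each column could be added to the table in the same way. The same future_times vector could also supply the labels for a custom x-axis.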

AshOfFire
  • Indeed `fcast90$x` shows output of my `forecast(fit2, fcast_days+time_diff)`. But the number of data points does not match the number of days that I generated to alter the plot axes. – mamat.mj Oct 14 '17 at 16:39