I am trying to apply Holt-Winters using ets() from the forecast package. I read the data from a DB because the start timestamp is likely to differ between users (the interval is always 15 min).
I am having problems plotting and interpreting the forecast results. The x-axis is probably displaying the index values of the time series, and I am not able to identify the problem. Sample data is below:
> raw_data
date_time_start total_transmitted_mbps
25/04/2017 00:00 8091.22258
25/04/2017 00:15 8669.16705
25/04/2017 00:30 6742.03133
25/04/2017 00:45 7637.89432
25/04/2017 01:00 7190.45344
25/04/2017 01:15 9798.56278
25/04/2017 01:30 7136.48579
25/04/2017 01:45 6255.34125
25/04/2017 02:00 6315.19628
25/04/2017 02:15 6306.36521
25/04/2017 02:30 9749.50128
25/04/2017 02:45 8247.23815
25/04/2017 03:00 9629.79122
25/04/2017 03:15 9316.77885
25/04/2017 03:30 9877.06118
25/04/2017 03:45 8909.5684
25/04/2017 04:00 7853.76492
25/04/2017 04:15 8877.18781
25/04/2017 04:30 6856.83524
25/04/2017 04:45 9037.1283
Formatting the time series to retain the input time format:
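library(xts)
# Append seconds and parse the day/month/year timestamps as POSIXct
raw_data$date_time_start <- as.POSIXct(
  strptime(paste(as.character(raw_data$date_time_start), ":00", sep = ""),
           format = "%d/%m/%Y %H:%M:%S"))
# Column name as shown in the sample above
eventdata <- xts(raw_data$total_transmitted_mbps,
                 order.by = raw_data$date_time_start)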
plot(eventdata) # plot is OK
The plot of this input is OK.
I am using ets as follows:
fit2<-ets(eventdata, model="ZZZ", damped=TRUE, alpha=NULL, beta=NULL, gamma=NULL)
fcast90 <- forecast(fit2, h=100)
plot(fcast90) # x-axis of plot is incorrect
When I print fcast90$x I do get output, but the timestamps for the next 100 forecast periods are not included in it:
> fcast90$x
Time Series:
Start = 1
End = 11521
Frequency = 0.0166666666666667
[1] 8091.223 8669.167 6742.031 7637.894 7190.453 9798.563 7136.486 6255.341 6315.196
[10] 6306.365 9749.501 8247.238 9629.791 9316.779 9877.061 8909.568 7853.765 8877.188
How do I forecast and view the next 100 days?
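For reference, a minimal sketch of the horizon arithmetic, assuming the h argument of forecast() counts periods of the series (15-minute steps here) rather than days. The variable names interval_min, steps_per_day, h_steps and future_times are made up for illustration, and the UTC time zone is an assumption:

```r
# Assumption: one observation every 15 minutes, so 96 observations per day.
interval_min  <- 15
steps_per_day <- 24 * 60 / interval_min
fcast_days    <- 100
h_steps       <- fcast_days * steps_per_day   # horizon in 15-minute periods

# Last timestamp of the sample shown above; UTC is assumed to avoid DST gaps.
last_obs <- as.POSIXct("2017-04-25 04:45:00", tz = "UTC")

# Timestamps of the forecast periods: add multiples of 15 minutes (in seconds).
future_times <- last_obs + seq_len(h_steps) * interval_min * 60

print(h_steps)
print(range(future_times))
```

Under these assumptions, 100 days would correspond to h_steps periods rather than h = 100.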
Update: Based on the posts by @A5C1D2H2I1M1N2O1R2T1 and @joran, I tried two things:
1. Generate the sequence of dates (format: YYYY-MM-DD).
2. Set axes = FALSE in the plot and label the axes myself.
I am unable to get #2 working.
With #1: in my data, the start date differs between users. To try the suggestion by @A5C1D2H2I1M1N2O1R2T1, I assumed that the start date is fixed. I read in the first and last dates for that user to obtain the frequency.
aa <- raw_data[1,] # to obtain the start date
bb <- raw_data[nrow(raw_data),] # to obtain the last date using the nrow
Since the start/end time for each user may be different, I am calculating the number of days in the time series (time_diff). The number of forecast data points should then be fcast_days + time_diff, as in fcast90 <- forecast(fit2, fcast_days+time_diff).
fcast_days = 100
startDate = as.POSIXct(strptime(paste(as.character(aa$date_time_start),":00",sep = ""), format="%d/%m/%Y %H:%M:%S"))
endDate = as.POSIXct(strptime(paste(as.character(bb$date_time_start),":00",sep = ""), format="%d/%m/%Y %H:%M:%S"))
time_diff = as.numeric(round(difftime(endDate, startDate, units = "days"))) # output=16
Generating a sequence for the plot labels:
a = seq(as.Date(startDate), by="days", length=time_diff+fcast_days) #length = 116
But I hit a problem when I use seq because, as far as I can tell, the lowest granularity for seq on dates is days, while my time series is at 15-minute intervals. So I am forced to read in the timestamps as opposed to generating them. For this reason, I used raw_data$date_time_start <- as.POSIXct(strptime(paste(as.character(raw_data$date_time_start),":00",sep = ""),format="%d/%m/%Y %H:%M:%S")). Please tell me if this is wrong.
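For comparison, a small sketch of generating the timestamps instead of reading them in. It relies on seq() accepting a by string such as "15 min" when the start value is a POSIXct rather than a Date. The names start_time and stamps are illustrative only, and tz = "UTC" is an assumption:

```r
# Parse the first timestamp of the sample directly; no ":00" padding is needed
# when the format string stops at minutes. The UTC time zone is an assumption.
start_time <- as.POSIXct("25/04/2017 00:00", format = "%d/%m/%Y %H:%M", tz = "UTC")

# seq() on a POSIXct start accepts sub-daily steps such as "15 min".
stamps <- seq(start_time, by = "15 min", length.out = 20)

print(head(stamps, 3))
print(tail(stamps, 1))
```

With 20 steps this reproduces the timestamps of the sample rows above, from 00:00 to 04:45.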
With #2, I set axes = FALSE to print only the date. Re-using the code from the link:
fcast90 <- forecast(fit2, fcast_days+time_diff)
plot(fcast90, axes = FALSE)
axis(1, at = a, labels = format(a, "%d %b %Y"), cex.axis=0.6)
abline(v = decimal_date(a), col='grey', lwd=0.5)
axis(2, cex.axis=0.6)
I think the problem in the plot is due to the mismatch between the number of days in the sequence a and the number of data points in fcast90$x.
> length(fcast90$x) # represents data captured at 15 min interval
[1] 1536
> length(a) # represents number of days
[1] 116
For the time series I have, are my steps correct?