I read in a csv file into a data frame using:
dataxlsx <- read.csv(file.choose(), header = T)
The data frame looks like:
Year Month Period X410 X430 X431
2005 1 1 3467748 4434879 1345638
2005 2 2 3626338 4311150 1167523
. . . . . .
2015 7 127 2374105 1514540 1399804
I am trying to run my function that I created called HWplot to forecast the inputted data and run the forecast, as well as output a plot for the forecast.
I used packages ggplot2, tseries, forecast.
HWplot <- function(dataxlsx, n.ahead=12, CI=.95, error.ribbon='green', line.size=1) {
hw_object<-HoltWinters(dataxlsx)
forecast<-predict(hw_object, n.ahead=24, prediction.interval=T, level=0.95)
for_values<-data.frame(time=round(time(forecast), 3), value_forecast=as.data.frame(forecast)$fit, dev=as.data.frame(forecast)$upr-as.data.frame(forecast)$fit)
fitted_values<-data.frame(time=round(time(hw_object$fitted), 3), value_fitted=as.data.frame(hw_object$fitted)$xhat)
actual_values<-data.frame(time=round(time(hw_object$x), 3), Actual=c(hw_object$x))
graphset<-merge(actual_values, fitted_values, by='time', all=TRUE)
graphset<-merge(graphset, for_values, all=TRUE, by='time')
graphset[is.na(graphset$dev), ]$dev<-0
graphset$Fitted<-c(rep(NA, NROW(graphset)-(NROW(for_values) + NROW(fitted_values))), fitted_values$value_fitted, for_values$value_forecast)
graphset.melt<-melt(graphset[, c('time', 'Actual', 'Fitted')], id='time')
p<-ggplot(graphset.melt, aes(x=time, y=value)) + geom_ribbon(data=graphset, aes(x=time, y=Fitted, ymin=Fitted-dev, ymax=Fitted + dev), alpha=.2, fill=error.ribbon) + geom_line(aes(colour=variable), size=line.size) + geom_vline(x=max(actual_values$time), lty=2) + xlab('Time') + ylab('Value') + theme(legend.position='bottom') + scale_colour_hue('')
return(p)
}
The problem I am facing is that I cannot split this data frame in order to apply the HWplot function to the separate columns of the data (forecast X410, X430, X431, etc). I will be working with data frames that have a different number of X### codes in the columns, so I will need the R Script to incorporate a dynamic number of columns as well.
The end game is to run these forecasts from the different columns of the data frame and output the forecast and graph to an excel workbook with the name of the column as each sheet name.
Side note: The HWplot function works when there is a data frame with only one column of metrics, however is not working with multiple columns of metrics.
Everything I have tried with the apply family of functions isn't working nor are the split functions.
Hope this makes sense - if anyone needs clarification, please let me know.