
We're currently working on predictions for students with over 120 units (enough to graduate). Below is the dataset we're working with, as produced by dput():

structure(list(Term = structure(c(5L, 9L, 1L, 6L, 10L, 2L, 7L, 
11L, 3L, 8L, 12L, 4L), .Label = c("F - 2014", "F - 2015", "F - 2016", 
"F - 2017", "S - 2014", "S - 2015", "S - 2016", "S - 2017", "Sp - 2014", 
"Sp - 2015", "Sp - 2016", "Sp - 2017"), class = "factor"), Bachelors = c(182L, 
1103L, 496L, 177L, 1236L, 511L, 161L, 1264L, 544L, 150L, 1479L, 
607L), Masters = c(33L, 144L, 35L, 22L, 175L, 55L, 57L, 114L, 
66L, 52L, 147L, 50L), Seniors = c(577L, 2485L, 2339L, 604L, 2660L, 
2474L, 545L, 2628L, 2594L, 712L, 2807L, 2546L), Over.120 = structure(c(235L, 
1746L, 1188L, 235L, 1837L, 1192L, 200L, 1883L, 1217L, 255L, 2002L, 
1245L), .Tsp = c(2014, 2017.66666666667, 3), class = "ts")), row.names = c(NA, 
-12L), class = "data.frame")

We want to use ARIMA forecasting on three terms per year (Spring, Summer, Fall) from 2014 through 2017, to see what the trend will look like over the next 6 years (2018 to 2023):
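A minimal sketch of building that series with three terms per year, assuming the terms should run Spring, Summer, Fall within each year. The values are copied from the dput above but reordered, because the data frame's rows run Summer, Spring, Fall, so passing data$Over.120 straight to ts() labels Summer as period 1. It also checks that 6 years of 3 terms is an 18-step horizon ending at period 3 (Fall) of 2023. The names over120, over120_ts and last_step are illustrative.

```r
# Over.120 counts from the dput, re-entered in chronological order
# (Spring, Summer, Fall within each year, 2014 through 2017)
over120 <- c(1746, 235, 1188,   # 2014: Sp, S, F
             1837, 235, 1192,   # 2015
             1883, 200, 1217,   # 2016
             2002, 255, 1245)   # 2017
over120_ts <- ts(over120, start = c(2014, 1), frequency = 3)
print(over120_ts)

# Six years (2018-2023) of three terms each
h <- 6 * frequency(over120_ts)
# Time index of the last forecast step: 2023 + 2/3, i.e. period 3 of 2023
last_step <- tsp(over120_ts)[2] + h / frequency(over120_ts)
print(c(h = h, last_step = last_step))
```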

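library(forecast)

data <- read.csv("Graduation3.csv")
str(data)

# Over.120 as a time series with three terms per year, 2014 through 2017;
# periods 1, 2, 3 are assigned in the row order of the CSV
data$Over.120 <- ts(data$Over.120, start = c(2014, 1), end = c(2017, 3), frequency = 3)
summary(data)

# Lag-1 difference (immediately overwritten by the next line)
dOver120 <- diff(data$Over.120)
# Lag-3 (year-over-year) difference; this is the series that gets modelled
dOver120 <- diff(data$Over.120, 3)

plot(dOver120)

# AR(3) fitted to the differenced series
fit_diff_ar <- arima(dOver120, order = c(3, 0, 0))
summary(fit_diff_ar)

# 18 steps = 6 years x 3 terms (2018-2023)
fit_diff_arf <- forecast(fit_diff_ar, h = 18)
print(fit_diff_arf)
plot(fit_diff_arf, include = 12)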

[Image: plot of the ARIMA forecast. Side note: I don't have enough reputation to post an image directly.]

We expected the point forecast (the conditional expectation line) in the forecast plot to follow the same zig-zag pattern as in previous years; instead, as the years progress the line flattens out around the mean. We're stuck on this and aren't sure whether it's something in our code or simply how the forecast is supposed to behave.
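One thing worth checking, offered as a hedged note rather than a diagnosis: fit_diff_ar is fitted to dOver120, the lag-3 (year-over-year) difference, so forecast(fit_diff_ar, h = 18) forecasts changes from the same term a year earlier, not the counts themselves. A forecast of those changes that settles near a constant corresponds, in levels, to each term equalling the same term a year earlier plus that constant, so the zig-zag would come back once the difference is undone. The sketch below mirrors the question's code on the dput values in the data frame's row order (x, d3 and back are illustrative names) and inverts the difference with stats::diffinv.

```r
# Over.120 from the dput, in the data frame's row order (S, Sp, F each year),
# turned into a ts exactly as the question's code does
x <- ts(c(235, 1746, 1188, 235, 1837, 1192, 200, 1883, 1217, 255, 2002, 1245),
        start = c(2014, 1), frequency = 3)

# Same as diff(data$Over.120, 3): each term minus the same period a year earlier
d3 <- diff(x, lag = 3)
print(d3)   # 9 values; the first three observations are used up

# Undo the lag-3 difference: the first three observations seed the running sums
back <- diffinv(d3, lag = 3, xi = x[1:3])
print(back)
```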

J. Flink

2 Answers


The ARIMA(3,0,0) model has three autoregressive coefficients, so it only looks at the last three values of the series when predicting the next one. In this case, the fitted coefficients presumably have a damping effect that slowly shrinks each prediction toward the mean. As the forecast extends further out, each prediction is built from earlier predictions that were already damped, so the forecasts keep shrinking and the line flattens out.
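To illustrate the damping, here is a minimal sketch of the AR(3) forecast recursion. The coefficients (0.1, 0.1, 0.4), the mean of 50 and the three starting values are made up for illustration and are not the coefficients of fit_diff_ar; with a stationary AR model like this one, the forecasts converge to the mean.

```r
# Hypothetical AR(3) coefficients and mean, chosen only to illustrate damping;
# they are NOT the coefficients of fit_diff_ar
phi <- c(0.1, 0.1, 0.4)
mu  <- 50
history <- c(140, 15, 110)   # last three "observed" values, oldest first (hypothetical)

h <- 18
z <- c(history - mu, numeric(h))   # deviations from the mean, then forecast slots
for (k in seq_len(h)) {
  i <- 3 + k
  # dev_t = phi1 * dev_{t-1} + phi2 * dev_{t-2} + phi3 * dev_{t-3}
  z[i] <- sum(phi * z[i - 1:3])
}
fc <- mu + z[3 + seq_len(h)]
print(round(fc, 2))   # the oscillation shrinks toward mu = 50
```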

If you look at the coefficients from summary(fit_diff_ar), you can calculate each forecast value by hand, which should make the results easier to understand.
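A sketch of that manual calculation, assuming the model is refitted with base R's stats::arima to the Over.120 counts in chronological order (as in the other answer) rather than to the question's differenced series. The hand-rolled recursion is compared against predict(); manual_forecast is an illustrative helper, not a function from any package. The same function applied to dOver120 with the coefficients of fit_diff_ar should reproduce the question's forecasts.

```r
# Over.120 from the dput, reordered chronologically (Sp, S, F for 2014-2017)
over120 <- c(1746, 235, 1188, 1837, 235, 1192,
             1883, 200, 1217, 2002, 255, 1245)

# AR(3) with a mean, fitted with stats::arima
fit <- arima(over120, order = c(3, 0, 0))
print(coef(fit))

# Forecast by hand: work with deviations from the mean and apply
# z_t = phi1 * z_{t-1} + phi2 * z_{t-2} + phi3 * z_{t-3},
# feeding each new forecast back in as if it were an observation
manual_forecast <- function(x, phi, mu, h) {
  p <- length(phi)
  z <- c(tail(x, p) - mu, numeric(h))
  for (k in seq_len(h)) {
    i <- p + k
    z[i] <- sum(phi * z[i - seq_len(p)])
  }
  mu + z[p + seq_len(h)]
}

phi <- unname(coef(fit)[c("ar1", "ar2", "ar3")])
mu  <- unname(coef(fit)["intercept"])
manual  <- manual_forecast(over120, phi, mu, h = 18)
builtin <- as.numeric(predict(fit, n.ahead = 18)$pred)
print(round(cbind(manual, builtin), 2))
```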

Try fit_diff_ar <- auto.arima(dOver120) and see how its coefficients differ from the model you estimated. It may produce forecasts that continue to fluctuate.
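If auto.arima does not choose a seasonal difference for a series this short, one hedged alternative is to specify it by hand. The sketch below swaps in a different, deliberately simple model, ARIMA(0,0,0)(0,1,0)[3] fitted with stats::arima on the chronologically ordered counts: a seasonal difference at lag 3 and nothing else. Its forecast simply repeats the last observed year (a seasonal naive forecast), so it keeps zig-zagging; whether that is a sensible model for this data is a separate question.

```r
# Over.120 in chronological order (Sp, S, F), three terms per year
over120 <- ts(c(1746, 235, 1188, 1837, 235, 1192,
                1883, 200, 1217, 2002, 255, 1245),
              start = c(2014, 1), frequency = 3)

# Seasonal difference at lag 3 only: "each term looks like the same term last year"
fit_seas <- arima(over120, order = c(0, 0, 0),
                  seasonal = list(order = c(0, 1, 0), period = 3))
fc <- predict(fit_seas, n.ahead = 18)$pred
print(round(fc))   # repeats Spring, Summer, Fall 2017 for 2018-2023
```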

AidanGawronski
  • When we tried this, we got this warning: "Warning message: In value[[3L]](cond) : The chosen test encountered an error, so no seasonal differencing is selected. Check the time series data." – J. Flink Jan 03 '19 at 04:07
  • If you make your example reproducible, I might be able to help further: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – AidanGawronski Jan 03 '19 at 04:33
  • Edited the post to add the dataset we're working on, @AidanGawronski – J. Flink Jan 03 '19 at 17:50
  • A couple of thoughts: you have two entries for S - 2017, and your data is not in chronological order. Can you dput your data, please? – AidanGawronski Jan 03 '19 at 19:13
  • I added the dput output to the post, @AidanGawronski – J. Flink Jan 03 '19 at 20:51
data <- structure(list(Term = structure(c(5L, 9L, 1L, 6L, 10L, 2L, 7L, 11L, 3L, 8L, 12L, 4L), 
                        .Label = c("F - 2014", "F - 2015", "F - 2016", "F - 2017", "S - 2014", "S - 2015", "S - 2016", "S - 2017", "Sp - 2014", "Sp - 2015", "Sp - 2016", "Sp - 2017"), class = "factor"), 
                       Bachelors = c(182L, 1103L, 496L, 177L, 1236L, 511L, 161L, 1264L, 544L, 150L, 1479L, 607L), 
                       Masters = c(33L, 144L, 35L, 22L, 175L, 55L, 57L, 114L, 66L, 52L, 147L, 50L), 
                       Seniors = c(577L, 2485L, 2339L, 604L, 2660L, 2474L, 545L, 2628L, 2594L, 712L, 2807L, 2546L), 
                       Over.120 = structure(c(235L, 1746L, 1188L, 235L, 1837L, 1192L, 200L, 1883L, 1217L, 255L, 2002L, 1245L), 
                      .Tsp = c(2014, 2017.66666666667, 3), class = "ts")), 
                  row.names = c(NA, -12L), class = "data.frame")

data$Term <- as.character(data$Term)
data$year <- as.numeric(gsub(".* - (.*)", "\\1",  data$Term))

# Create a numeric variable to represent the term
data$Term2 <- NA
# make spring 1
data$Term2 <- ifelse(grepl("Sp -", data$Term), 1, data$Term2)
# make summer 2
data$Term2 <- ifelse(grepl("S -", data$Term), 2, data$Term2)
# make fall 3
data$Term2 <- ifelse(grepl("F -", data$Term), 3, data$Term2)
# order the data
data <- data[order(data$year, data$Term2),]

library(forecast)
# still using your same original model
fit <- Arima(data$Over.120, order=c(3,0,0))
summary(fit)
# Series: data$Over.120 
# ARIMA(3,0,0) with non-zero mean 
# 
# Coefficients:
#           ar1      ar2     ar3       mean
#       -0.0693  -0.0947  0.9151  1113.3012
# s.e.   0.1126   0.1106  0.1117    39.4385
# 
# sigma^2 estimated as 4573:  log likelihood=-70.94
# AIC=151.87   AICc=161.87   BIC=154.3
# 
# Training set error measures:
#                    ME     RMSE      MAE       MPE   MAPE       MASE       ACF1
# Training set 20.15158 55.21532 50.34405 -1.440427 8.5131 0.04400354 0.04719142

preds <- forecast(fit, h = 18)
preds
# Point Forecast      Lo 80     Hi 80       Lo 95     Hi 95
# 13      1998.6938 1912.02930 2085.3583 1866.151880 2131.2357
# 14       254.0220  167.14935  340.8946  121.161754  386.8822
# 15      1209.5318 1122.31031 1296.7532 1076.138065 1342.9255
# 16      1998.2207 1879.58698 2116.8543 1816.786098 2179.6552
# 17       256.5197  137.43643  375.6029   74.397577  438.6418
# 18      1176.9475 1057.04047 1296.8545  993.565520 1360.3295
# 19      1999.8107 1858.09461 2141.5268 1783.074649 2216.5467
# 20       261.7816  119.43633  404.1268   44.083310  479.4798
# 21      1146.6151 1002.99843 1290.2318  926.972345 1366.2579
# 22      2002.8707 1842.34640 2163.3949 1757.369987 2248.3713
# 23       269.2577  107.99642  430.5189   22.629865  515.8855
# 24      1118.0506  955.13282 1280.9684  868.889357 1367.2118
# 25      2006.9434 1830.13885 2183.7480 1736.544176 2277.3426
# 26       278.5222  100.93452  456.1100    6.925256  550.1192
# 27      1090.8838  911.32145 1270.4462  816.266885 1365.5007
# 28      2011.6766 1820.27506 2203.0781 1718.953221 2304.3999
# 29       289.2452   97.06116  481.4292   -4.674897  583.1652
# 30      1064.8324  870.41604 1259.2487  767.498250 1362.1665

plot(preds)

[Image: plot of your forecast]
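As a worked check, the first point forecast (row 13) can be reproduced by hand from the printed coefficients, with $x_{10}, x_{11}, x_{12}$ the Spring, Summer and Fall 2017 values (2002, 255, 1245) after reordering. The result differs from the printed 1998.6938 only because the coefficients are shown rounded to four decimals:

$$
\begin{aligned}
\hat{x}_{13} &= \mu + \phi_1 (x_{12} - \mu) + \phi_2 (x_{11} - \mu) + \phi_3 (x_{10} - \mu) \\
&= 1113.3012 - 0.0693\,(1245 - 1113.3012) - 0.0947\,(255 - 1113.3012) + 0.9151\,(2002 - 1113.3012) \\
&\approx 1113.3012 - 9.1267 + 81.2811 + 813.2483 \\
&\approx 1998.70
\end{aligned}
$$

The large ar3 coefficient (0.9151) means each forecast is driven mostly by the deviation from the mean one year (three terms) earlier, which is why the zig-zag persists across the 18-step horizon in the table above instead of flattening out.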

AidanGawronski