With every iteration of the loop, I'd like to fit a linear model on more historical data and see how, for example, the one-step-ahead prediction compares to the actual value. The code should be self-explanatory. The problem seems to be that `Dependent` and `Independent` are fixed in size after the first iteration (which I'd like to start at 10 data points, as shown in the code), whereas I'd like them to grow with each iteration.

output1 <- rep(0, 127)
output2 <- rep(0, 127)
ret <- function(x, y)
{
  for (i in 1:127)
  {
    Dependent <- y[1:(9+i)]
    Independent <- x[1:(9+i)]
    fit <- lm(Dependent ~ Independent)
    nextInput <- data.frame(Independent = x[(10+i)])
    prediction <- predict(fit, nextInput, interval="prediction")
    output1[i] <- prediction[2]
    output2[i] <- prediction[3]
  }
}
rocketman
  • Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? I suggest starting with a standard dataset or a reproducible vector of random numbers. – r2evans Aug 29 '16 at 22:38
  • Also, I suspect you want `1:(9+i)` instead ... run (just) `1:9+5` to verify. – r2evans Aug 29 '16 at 22:39
  • @r2evans, that should solve the problem. Currently, the function just fits a linear model on fixed width rolling windows: 1:9, 2:10, 3:11, etc. (hence, the fixed size window.). – jav Aug 29 '16 at 22:47
  • Gotcha, the answer should be easily updated to account for a sliding window instead of fixed-start. – r2evans Aug 29 '16 at 22:50
  • Weird. I've added the parentheses and now the code works if I change "i" manually and run it. But no information seems to be stored... I've updated the code to reflect my changes – rocketman Aug 29 '16 at 23:00
  • You have scope issues. It's considered "bad practice" to do it the way you are doing. If you want to stick with this method, you need to define the vectors *inside* the function and return them (perhaps as a list). If you *must* keep it as is, then read up on the difference between `<-` and `<<-` (which, again, is IMHO a bad idea in general and in this situation). – r2evans Aug 29 '16 at 23:22
  • Ok, I figured it was some kind of scope issue. Thanks for the help! – rocketman Aug 29 '16 at 23:24
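
The scope fix suggested in the comments (define the vectors inside the function and return them) could look roughly like this. It is only a sketch under stated assumptions: the `start` argument, the element names `lwr` and `upr`, and the simulated `x` and `y` (137 points, so the loop runs 127 times as in the question) are not from the original post.

```r
# Hypothetical rewrite of ret(): the result vectors are created inside the
# function and returned, so nothing depends on variables defined outside it.
ret <- function(x, y, start = 10) {
  n_fits <- length(x) - start            # the last fit predicts the final point
  lwr <- numeric(n_fits)
  upr <- numeric(n_fits)
  for (i in seq_len(n_fits)) {
    last <- start - 1 + i                # expanding window 1:last
    fit_data <- data.frame(Dependent = y[1:last], Independent = x[1:last])
    fit <- lm(Dependent ~ Independent, data = fit_data)
    nextInput <- data.frame(Independent = x[last + 1])
    prediction <- predict(fit, nextInput, interval = "prediction")
    lwr[i] <- prediction[2]              # column "lwr"
    upr[i] <- prediction[3]              # column "upr"
  }
  list(lwr = lwr, upr = upr)
}

# Assumed example data, not the asker's.
set.seed(42)
x <- rnorm(137)
y <- runif(137)
bounds <- ret(x, y)
length(bounds$lwr)  # 127
```

Because the function returns its results, the caller has to assign them (`bounds <- ret(x, y)`); the original version modified local copies of `output1` and `output2` that were discarded when the function exited.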

1 Answer

Here's a thought; let me know if I'm close to your intent:

set.seed(42)
n <- 100
x <- rnorm(n)
head(x)
# [1]  1.3709584 -0.5646982  0.3631284  0.6328626  0.4042683 -0.1061245
y <- runif(n)
head(y)
# [1] 0.8851177 0.5171111 0.8519310 0.4427963 0.1578801 0.4423246

ret <- lapply(10:n, function(i) {
  dep <- y[1:i]
  indep <- x[1:i]
  fit <- lm(dep ~ indep)
  pred <- 
    if (i < n) {
      predict(fit, data.frame(indep = x[i+1L]), interval = "prediction")
    } else NULL
  list(fit = fit, pred = pred)
})

Note that I'm building a list of models and predictions with `lapply` instead of filling vectors in a `for` loop. Though not exactly the same, this answer does a decent job explaining why this may be a good idea.
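
One practical benefit of keeping every fit in a list is that other quantities can be pulled out afterwards without refitting. The sketch below repeats the setup so it runs on its own; `slopes` and `n_obs` are names introduced here for illustration only.

```r
# Same setup as above, repeated so this block is self-contained.
set.seed(42)
n <- 100
x <- rnorm(n)
y <- runif(n)

ret <- lapply(10:n, function(i) {
  dep <- y[1:i]
  indep <- x[1:i]
  fit <- lm(dep ~ indep)
  pred <-
    if (i < n) {
      predict(fit, data.frame(indep = x[i + 1L]), interval = "prediction")
    } else NULL
  list(fit = fit, pred = pred)
})

# One slope estimate per fit; vapply() checks that each result is a single number.
slopes <- vapply(ret, function(r) coef(r$fit)[["indep"]], numeric(1))

# Number of observations behind each fit (10, 11, ..., n).
n_obs <- vapply(ret, function(r) nobs(r$fit), numeric(1))

# How the estimated slope changes as the window grows.
plot(n_obs, slopes, type = "l")
```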

Model and prediction from one of the runs:

ret[[50]]
# $fit
# Call:
# lm(formula = dep ~ indep)
# Coefficients:
# (Intercept)        indep  
#     0.44522      0.02691  
# $pred
#         fit        lwr      upr
# 1 0.4528911 -0.1160787 1.021861
summary(ret[[50]]$fit)
# Call:
# lm(formula = dep ~ indep)
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.42619 -0.22178 -0.00004  0.15550  0.53774 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.44522    0.03667  12.141   <2e-16 ***
# indep        0.02691    0.03186   0.845    0.402    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 0.2816 on 57 degrees of freedom
# Multiple R-squared:  0.01236, Adjusted R-squared:  -0.004966 
# F-statistic: 0.7134 on 1 and 57 DF,  p-value: 0.4018
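
To get back to the original goal of comparing each one-step-ahead prediction with the actual value, something like the following might work. It is a sketch rather than part of the approach above: it builds the predictions directly instead of reusing `ret`, and `comparison`, `error`, `covered`, `rmse` and `coverage` are illustrative names.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- runif(n)

# Refit on the expanding window 1:i and predict observation i + 1.
preds <- do.call(rbind, lapply(10:(n - 1), function(i) {
  d <- data.frame(dep = y[1:i], indep = x[1:i])
  fit <- lm(dep ~ indep, data = d)
  predict(fit, data.frame(indep = x[i + 1L]), interval = "prediction")
}))
rownames(preds) <- NULL  # every row is named "1"; drop the duplicate names

# Line up each prediction with the value it was trying to predict.
comparison <- data.frame(index = 11:n, actual = y[11:n], preds)
comparison$error <- comparison$actual - comparison$fit
comparison$covered <- comparison$actual >= comparison$lwr &
  comparison$actual <= comparison$upr

head(comparison)
rmse <- sqrt(mean(comparison$error^2))  # out-of-sample root mean squared error
coverage <- mean(comparison$covered)    # share of actuals inside the interval
```

With `interval = "prediction"` and the default `level = 0.95`, `coverage` gives a rough check of whether the intervals contain the actual values about as often as advertised.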
r2evans