R: Loop structure to use dynamically sized arrays to build linear models

Question

On each iteration of the loop, I'd like to fit a linear model on more historical data than the last and see how, for example, the one-step-ahead prediction compares to the actual value. The code should be self-explanatory. The problem seems to be that `Dependent` and `Independent` stay fixed in size after the first iteration (which I'd like to start at 10 data points, as shown in the code), whereas I'd like them to grow with each iteration.

output1 <- rep(0, 127)
output2 <- rep(0, 127)
ret <- function(x, y)
{
  for (i in 1:127)
  {
    Dependent <- y[1:(9+i)]
    Independent <- x[1:(9+i)]
    fit <- lm(Dependent ~ Independent)
    nextInput <- data.frame(Independent = x[(10+i)])
    prediction <- predict(fit, nextInput, interval="prediction")
    output1[i] <- prediction[2]
    output2[i] <- prediction[3]
  }
}

Can you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? I suggest starting with a standard dataset or a reproducible vector of random numbers. — r2evans, Aug 29 '16 at 22:38
Also, I suspect you want `1:(9+i)` instead ... run (just) `1:9+5` to verify. — r2evans, Aug 29 '16 at 22:39
@r2evans, that should solve the problem. Currently, the function just fits a linear model on fixed-width rolling windows: 1:9, 2:10, 3:11, etc. (hence the fixed-size window). — jav, Aug 29 '16 at 22:47
Gotcha, the answer should be easily updated to account for a sliding window instead of fixed-start. — r2evans, Aug 29 '16 at 22:50
Weird. I've added the parentheses and now the code works if I change "i" manually and run it. But no information seems to be stored... I've updated the code to reflect my changes — rocketman, Aug 29 '16 at 23:00
You have scope issues. It's considered "bad practice" to do it the way you are doing. If you want to stick with this method, you need to define the vectors *inside* the function and return them (perhaps as a list). If you *must* keep it as is, then read up on the difference between `<-` and `<<-` (which, again, is IMHO a bad idea in general and in this situation). — r2evans, Aug 29 '16 at 23:22
Ok, I figured it was some kind of scope issue. Thanks for the help! — rocketman, Aug 29 '16 at 23:24
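
The scoping point raised in the comments can be sketched roughly as follows: the result container is created inside the function and returned, instead of being assigned to `output1`/`output2` in the global environment. This is only one possible arrangement, not the asker's final code. The `start` argument, the matrix `out`, and the simulated `x`/`y` are assumptions made for illustration.

```r
# Operator precedence: `:` binds tighter than `+`,
# so 1:9 + 5 is (1:9) + 5 (6 to 14), while 1:(9 + 5) is 1 to 14.

# Sketch: expanding-window fits whose results are built inside the
# function and returned. `start` and `out` are illustrative names.
ret <- function(x, y, start = 10) {
  stopifnot(length(x) == length(y), length(x) > start)
  n_fits <- length(x) - start        # the last fit must leave one point to predict
  out <- matrix(NA_real_, nrow = n_fits, ncol = 3,
                dimnames = list(NULL, c("fit", "lwr", "upr")))
  for (i in seq_len(n_fits)) {
    last <- start + i - 1            # window is 1:last and grows by one each pass
    Dependent <- y[1:last]
    Independent <- x[1:last]
    fit <- lm(Dependent ~ Independent)
    nextInput <- data.frame(Independent = x[last + 1])
    out[i, ] <- predict(fit, nextInput, interval = "prediction")
  }
  as.data.frame(out)                 # returned to the caller; no globals touched
}

# Simulated data, assumed here only so the sketch can run on its own
set.seed(42)
x <- rnorm(30)
y <- runif(30)
res <- ret(x, y)
head(res)
```

Because the function returns its results, the caller decides where to store them (`res` here), and nothing depends on `<<-`.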

score 1 · Accepted Answer · edited May 23 '17 at 10:33

Here's a thought, let me know if I'm close to your intent:

set.seed(42)
n <- 100
x <- rnorm(n)
head(x)
# [1]  1.3709584 -0.5646982  0.3631284  0.6328626  0.4042683 -0.1061245
y <- runif(n)
head(y)
# [1] 0.8851177 0.5171111 0.8519310 0.4427963 0.1578801 0.4423246

ret <- lapply(10:n, function(i) {
  dep <- y[1:i]
  indep <- x[1:i]
  fit <- lm(dep ~ indep)
  pred <- 
    if (i < n) {
      predict(fit, data.frame(indep = x[i+1L]), interval = "prediction")
    } else NULL
  list(fit = fit, pred = pred)
})

Note that I'm making a list of models/predictions instead of using a for loop. Though not exactly the same, this answer does a decent job explaining why this may be a good idea.

Model and prediction from one of the runs:

ret[[50]]
# $fit
# Call:
# lm(formula = dep ~ indep)
# Coefficients:
# (Intercept)        indep  
#     0.44522      0.02691  
# $pred
#         fit        lwr      upr
# 1 0.4528911 -0.1160787 1.021861
summary(ret[[50]]$fit)
# Call:
# lm(formula = dep ~ indep)
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.42619 -0.22178 -0.00004  0.15550  0.53774 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.44522    0.03667  12.141   <2e-16 ***
# indep        0.02691    0.03186   0.845    0.402    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 0.2816 on 57 degrees of freedom
# Multiple R-squared:  0.01236, Adjusted R-squared:  -0.004966 
# F-statistic: 0.7134 on 1 and 57 DF,  p-value: 0.4018
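
To get at the original goal of comparing each one-step-ahead prediction with the actual value, the `pred` elements of `ret` could be stacked into one table. The following is a sketch under the same simulated data as above. The names `preds`, `idx`, `cmp`, `error`, and `covered` are introduced here for illustration and are not part of the answer.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- runif(n)

# Same expanding-window fits as in the answer
ret <- lapply(10:n, function(i) {
  dep <- y[1:i]
  indep <- x[1:i]
  fit <- lm(dep ~ indep)
  pred <- if (i < n) {
    predict(fit, data.frame(indep = x[i + 1L]), interval = "prediction")
  } else NULL
  list(fit = fit, pred = pred)
})

# Stack the 1x3 prediction matrices; rbind ignores the NULL from the last fit
preds <- do.call(rbind, lapply(ret, `[[`, "pred"))
rownames(preds) <- NULL              # every row was named "1"; drop the duplicates

# The fit on rows 1:i (i = 10, ..., n - 1) predicts observation i + 1
idx <- 11:n
cmp <- data.frame(obs = idx, actual = y[idx], preds)
cmp$error <- cmp$actual - cmp$fit
cmp$covered <- cmp$actual >= cmp$lwr & cmp$actual <= cmp$upr

head(cmp)
mean(cmp$covered)                    # share of actuals inside the prediction interval
```

Row `k` of `cmp` lines up with `ret[[k]]`, so a single row can be checked against the printed model for that run.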

Thanks, could you see my above comment? — rocketman, Aug 29 '16 at 23:14
