9

Is there a single function, similar to "runif", "rnorm" and the like which will produce simulated predictions for a linear model? I can code it on my own, but the code is ugly and I assume that this is something someone has done before.

slope = 1.5
intercept = 0
x = as.numeric(1:10)
e = rnorm(10, mean=0, sd = 1)
y = slope * x + intercept + e
fit = lm(y ~ x, data = df)
newX = data.frame(x = as.numeric(11:15))

What I'm interested in is a function that looks like the line below:

sims = rlm(1000, fit, newX)

That function would return 1000 simulations of y values, based on the new x variables.

PirateGrunt
  • 404
  • 4
  • 11
  • 1
    The last line in your Q has me confused. `x` is fixed; do you mean simulate `y` (the response) for the new `x` data? – Gavin Simpson Feb 19 '13 at 21:37
  • Sorry, Gavin, you're correct. I meant to say that the responses would be simulated. This has been edited. – PirateGrunt Feb 19 '13 at 21:39
  • 2
    OK, So you could look at `?simulate` but that only works for the current `x`. But you could alter it (`simulate.lm()`) to call `predict()` on the model object with `newdata = newX` instead of the current call to `fitted()` and then allow it to proceed as with the normal code. Assuming `weights` was not used as that would complicate matters... – Gavin Simpson Feb 19 '13 at 21:41
  • 2
    I see that this just came from stats.stackexchange. This question as it is written belongs here. But I think it would be more interesting if it was asked over there, in a more general way, without the programming specifics. Just my two cents. – Seth Feb 19 '13 at 21:52

1 Answers1

12

Showing that Gavin Simpson's suggestion of modifying stats:::simulate.lm is a viable one.

## Modify stats:::simulate.lm by inserting some tracing code immediately
## following the line that reads "ftd <- fitted(object)" 
trace(what = stats:::simulate.lm,
      tracer = quote(ftd <- list(...)[["XX"]]),
      at = list(6))

## Prepare the data and 'fit' object 
df <- data.frame(x =x<-1:10, y = 1.5*x + rnorm(length(x)))
fit <- lm(y ~ x, data = df)

## Define new covariate values and compute their predicted/fitted values
newX <- 8:1
newFitted <- predict(fit, newdata = data.frame(x = newX))

## Pass in fitted via the argument 'XX'
simulate(fit, nsim = 4, XX = newFitted)
#        sim_1     sim_2       sim_3     sim_4
# 1 11.0910257 11.018211 10.95988582 13.398902
# 2 12.3802903 10.589807 10.54324607 11.728212
# 3  8.0546746  9.925670  8.14115433  9.039556
# 4  6.4511230  8.136040  7.59675948  7.892622
# 5  6.2333459  3.131931  5.63671024  7.645412
# 6  3.7449859  4.686575  3.45079655  5.324567
# 7  2.9204519  3.417646  2.05988078  4.453807
# 8 -0.5781599 -1.799643 -0.06848592  0.926204

That works, but this is a cleaner (and likely better) approach:

## A function for simulating at new x-values
simulateX <- function(object, nsim = 1, seed = NULL, X, ...) {
    object$fitted.values <- predict(object, X)
    simulate(object = object, nsim = nsim, seed = seed, ...)
}
    
## Prepare example data and a fit object
df <- data.frame(x =x<-1:10, y = 1.5*x + rnorm(length(x)))
fit <- lm(y ~ x, data = df)

## Supply new x-values in a data.frame of the form expected by
## the newdata= argument of predict.lm()
newX <- data.frame(x = 8:1)
   
## Try it out
simulateX(fit,  nsim = 4, X = newX)
#       sim_1     sim_2     sim_3     sim_4
# 1 11.485024 11.901787 10.483908 10.818793
# 2 10.990132 11.053870  9.181760 10.599413
# 3  7.899568  9.495389 10.097445  8.544523
# 4  8.259909  7.195572  6.882878  7.580064
# 5  5.542428  6.574177  4.986223  6.289376
# 6  5.622131  6.341748  4.929637  4.545572
# 7  3.277023  2.868446  4.119017  2.609147
# 8  1.296182  1.607852  1.999305  2.598428
Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
  • That's fantastic. The second solution is nice and clean. The first one uses some debugging techniques I'd not learned yet. Excellent. Thank you. – PirateGrunt Feb 20 '13 at 13:51
  • @PirateGrunt -- Glad you appreciated the `trace()` example -- it's a true powertool for R development and code exploration. If you'll use it much, you might appreciate [the answers to this quesion](http://stackoverflow.com/questions/11319161/what-is-a-fast-way-to-set-debugging-code-at-a-given-line-in-a-function), which make it much easier to find the value of `at=` corresponding to the desired code insertion point. (I often use `print.func()` from Michael Hoffman's answer. Try it, e.g., with `print.func(stats:::simulate.lm)`). Enjoy! – Josh O'Brien Feb 20 '13 at 14:11
  • How would this function look like with multiple predictors? – ego_ Apr 11 '18 at 07:06
  • 2
    I know this is seven years old (StackOverflow is crazy), but this is wrong. You are providing new covariate data to the simulateX function, but what you need to provide are new response. This will work if in you replace `X = newX` with `X = predict(fit, new_data = newX)` in the simulateX function call. – Dalton Hance Sep 24 '20 at 01:16
  • @DaltonHance Right you are. Have edited both parts of my answer to correct the error. Thanks! – Josh O'Brien Sep 24 '20 at 23:14