How to use the predict function for creating the confidence bands in R?

Question

This is what I have done so far. I am not sure how to create the 95% confidence band:

x=rnorm(100,0,1)
e=rnorm(100,0,4)
for (i in 1:100){y[i]=2+3*x[i]+e[i]}
plot(x,y,lty=3)
estimation_lm=lm(y~x)
(summary(estimation_lm))
(cc=coef(estimation_lm))
abline(estimation_lm)
abline(a=2, b=3,col="red")

enter image description here

I know I have to use this code but I am not exactly sure what I should use in the new data or for interval(I guess I should use prediction) for this question:

predict(object, newdata, interval = "none"/"confidence"/"prediction",level = 0.95)

A more zoomed-in version of the part I am stuck in: enter image description here

Some time ago I wrote a little something on how to plot intervals in R. http://rpubs.com/RomanL/7024 — Roman Luštrik, Feb 16 '14 at 08:12
@RomanLuštrik That's such an informative blog. My problem is that from the question I don't get what the new data is to insert in the predict formula. Thanks! — Mona Jalal, Feb 16 '14 at 08:20
You can't insert formulas in predict, but you can calculate things by hand. You can get `\hat{y}` from `predict`, `x` are your raw values and you can get `t` with functions `qt`. See `?TDist`. — Roman Luštrik, Feb 16 '14 at 08:24
Can you narrow down your question and be more specific where you get stuck? It would help (yourself and us!) if you could provide a small, reproducible example (tips on how to do that here http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). — Roman Luštrik, Feb 16 '14 at 21:58

score 0 · Answer 1 · answered Feb 17 '14 at 13:12

0

The s_yhat formula indicates that the interval requested is for the mean, not for individual data. In this case, the correct parameter to be used in predict function is interval="confidence". See below:

library(gplots) # plotCI

data = data.frame(matrix(0, nrow=100, ncol=2))
colnames(data) = c("x", "y")
data$x = rnorm(100,0,1)
e = rnorm(100,0,4)
for (i in 1:100) {
  data$y[i] = 2 + 3*data$x[i] + e[i]
}
plot(data$x, data$y, xlab="x", ylab="y", pch=20)
estimation_lm = lm(y~x, data)
(summary(estimation_lm))
(coef(estimation_lm))
abline(estimation_lm)
abline(a=2, b=3, col="red", lty="dotted")
predict = predict(estimation_lm, data, interval="confidence", level=0.95)
plotCI(data$x, predict[,1], li=predict[,2], ui=predict[,3], add=T, col="blue", gap=0, pch=NA_integer_)
legend("bottomright", legend=c("estimated regression", "true line", "confidence interval 95%"), lty=c("solid", "dashed", "solid"), col=c("black", "red", "blue"))

enter image description here

answered Feb 17 '14 at 13:12

Baumann

1,119
11
20

Sounds like you are not using the student distribution for the new data?! – Mona Jalal Feb 17 '14 at 21:45
`predict` function does use the student distribution to calculate the confidence interval. Take a look at lines 146 and 147 of the `predict.lm` source code. – Baumann Feb 18 '14 at 12:48
why do you have this line of code? `data = data.frame(matrix(0, nrow=100, ncol=2))` I mean which part of the question depicts that you should have that line of the code? – Mona Jalal Feb 19 '14 at 02:32
I've created a `data.frame` to store the `x` and `y`. The reason is that the `newdata` parameter from the `predict` function expects a `data.frame` object. If you still want to work with `x` and `y` as separated vectors, you need to do 2 things: (1) initialize `y` before the loop: `y = rep(0, 100)`, and (2) change the `newdata` parameter from the `predict` function: `predict = predict(estimation_lm, as.data.frame(cbind(x,y)), interval="confidence", level=0.95)` – Baumann Feb 19 '14 at 12:39

How to use the predict function for creating the confidence bands in R?

1 Answers1