0

This is what I have done so far. I am not sure how to create the 95% confidence band:

x=rnorm(100,0,1)
e=rnorm(100,0,4)
for (i in 1:100){y[i]=2+3*x[i]+e[i]}
plot(x,y,lty=3)
estimation_lm=lm(y~x)
(summary(estimation_lm))
(cc=coef(estimation_lm))
abline(estimation_lm)
abline(a=2, b=3,col="red")

enter image description here

I know I have to use this code but I am not exactly sure what I should use in the new data or for interval(I guess I should use prediction) for this question:

predict(object, newdata, interval = "none"/"confidence"/"prediction",level = 0.95)

A more zoomed-in version of the part I am stuck in: enter image description here

Mona Jalal
  • 34,860
  • 64
  • 239
  • 408
  • 1
    Some time ago I wrote a little something on how to plot intervals in R. http://rpubs.com/RomanL/7024 – Roman Luštrik Feb 16 '14 at 08:12
  • @RomanLuštrik That's such an informative blog. My problem is that from the question I don't get what the new data is to insert in the predict formula. Thanks! – Mona Jalal Feb 16 '14 at 08:20
  • 1
    You can't insert formulas in predict, but you can calculate things by hand. You can get `\hat{y}` from `predict`, `x` are your raw values and you can get `t` with functions `qt`. See `?TDist`. – Roman Luštrik Feb 16 '14 at 08:24
  • Can you narrow down your question and be more specific where you get stuck? It would help (yourself and us!) if you could provide a small, reproducible example (tips on how to do that here http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Roman Luštrik Feb 16 '14 at 21:58

1 Answers1

0

The s_yhat formula indicates that the interval requested is for the mean, not for individual data. In this case, the correct parameter to be used in predict function is interval="confidence". See below:

library(gplots) # plotCI

data = data.frame(matrix(0, nrow=100, ncol=2))
colnames(data) = c("x", "y")
data$x = rnorm(100,0,1)
e = rnorm(100,0,4)
for (i in 1:100) {
  data$y[i] = 2 + 3*data$x[i] + e[i]
}
plot(data$x, data$y, xlab="x", ylab="y", pch=20)
estimation_lm = lm(y~x, data)
(summary(estimation_lm))
(coef(estimation_lm))
abline(estimation_lm)
abline(a=2, b=3, col="red", lty="dotted")
predict = predict(estimation_lm, data, interval="confidence", level=0.95)
plotCI(data$x, predict[,1], li=predict[,2], ui=predict[,3], add=T, col="blue", gap=0, pch=NA_integer_)
legend("bottomright", legend=c("estimated regression", "true line", "confidence interval 95%"), lty=c("solid", "dashed", "solid"), col=c("black", "red", "blue"))

enter image description here

Baumann
  • 1,119
  • 11
  • 20
  • Sounds like you are not using the student distribution for the new data?! – Mona Jalal Feb 17 '14 at 21:45
  • `predict` function does use the student distribution to calculate the confidence interval. Take a look at lines 146 and 147 of the `predict.lm` source code. – Baumann Feb 18 '14 at 12:48
  • why do you have this line of code? `data = data.frame(matrix(0, nrow=100, ncol=2))` I mean which part of the question depicts that you should have that line of the code? – Mona Jalal Feb 19 '14 at 02:32
  • I've created a `data.frame` to store the `x` and `y`. The reason is that the `newdata` parameter from the `predict` function expects a `data.frame` object. If you still want to work with `x` and `y` as separated vectors, you need to do 2 things: (1) initialize `y` before the loop: `y = rep(0, 100)`, and (2) change the `newdata` parameter from the `predict` function: `predict = predict(estimation_lm, as.data.frame(cbind(x,y)), interval="confidence", level=0.95)` – Baumann Feb 19 '14 at 12:39