3

I used Excel to calculate the confidence interval on a predicted value, at 95% confidence interval, so to calculate t-value I used function TINV(5%,6) thats a 2.5% and 2.5% split on each side, where 6 is degree of freedom.

But in R, the predict function, when I give level= 0.95, I get a different interval range, however giving level=0.975 gives me the same answer as excel.

So, seems like in predict when you give level=0.975, then it takes a split of 2.5% and 2.5% on each side

But all these websites, For example, another example, say we are looking for 95% confidence interval giving the formula level=0.95 (this would mean a 5% split on each side), but thats 90%, the level should be 0.975 for a 95% interval.

Whats going on? I am probabily getting confused.


EDIT:

predict(model, data.frame(c= 12.75, p= 6, f=8), level = 0.975, interval = "confidence")

The model here is a multiple linear regression

Data:

y <- c(85.10,106.30,50.20,130.60,54.80,30.30,79.40,91.00,135.40,89.30) # Total Sales

c <- c(8.50,12.90,5.20,10.70,3.10,3.50,9.20,9.00,15.10,10.20) # production cost

p <- c(5.10,5.80,2.10,8.40,2.90,1.20,3.70,7.60,7.70,4.50) # Promotion cost

f <- c(4.70,8.80,15.10,12.20,10.60,3.50,9.70,5.90,20.80,7.90) #First year box office

model <- lm(y ~ c + p + f)

Excel:

I have marked in yellow the forcasting in Excel

The problem is, with Excel I get forecast of 106.72 with upper 119.35 and lower 93.36 with =tinv(5%,6)

With R I get forecast of 106.72 with upper at 117.7 and lower at 95.65, level=0.95

With level=0.975 I get exact values as Excel.


In Excel:

=tinv(5%,6) = 2.45``Variance = 5.46

106.72 +/- tvalue*variance : 119.35 93.36

In R:

se.ci <- predi$se.fit # Variance: 4.518

alpha <- qt((1-0.95)/2,6) # Value: -2.45

predi$fit[1] + c(alpha, -alpha) * se.ci # gives me 117.77165 95.65941

As you can see the tstat value is the same but prediction is different.

But when I do this:

alpha <- qt((1-0.975)/2,6) # Value: -2.968

I get 93.30182 120.12924 same as excel! (Using level=0.975 in predict gets me the answer, hence the confusion)

SamFlynn
  • 369
  • 7
  • 20
  • How did you calculate the prediction intervals in Excel ? See my answer below. The intervals calculated using R are (of course) correct. – Marco Sandri May 07 '17 at 10:07
  • In excel, I calculated the variance and tinv value http://www2.cedarcrest.edu/academic/bio/hale/biostat/session19links/tinv.html which I am sure is correct, and used formula 106.72 +/- tinv*variance. – SamFlynn May 07 '17 at 16:56
  • Are you using the `T.INV` Excel function or `TINV` ? The two functions works differently. – Marco Sandri May 08 '17 at 12:31

1 Answers1

1

In simple linear regression the prediction interval for y for a given x* is: enter image description here
where sy is given by: enter image description here

Let us consider the following example:

df <- faithful
n <- nrow(df)
names(df) <- c("y","x")
mx <- mean(df$x)
sx <- sd(df$x)

mod = lm(y ~ x, data=df)
yhat <- predict(mod)

xnew <- 80
newdata = data.frame(x=xnew)

alpha <- 0.05
(ypred <- predict(mod, newdata, interval="predict", level = 1-alpha))

#### 95% Prediction interval #####
      fit      lwr      upr
1 4.17622 3.196089 5.156351

We can calculate this interval "by hand" using the formula given above:

SE <- sqrt(sum((df$y-yhat)^2)/(n-2))*sqrt(1+1/n+(xnew-mx)^2/((n-1)*sx^2))   
tval <- qt(1-alpha/2,n-2)
c(ypred[1]-tval*SE, ypred[1]+tval*SE)

#### 95% Prediction interval #####
[1] 3.196089 5.156351
Marco Sandri
  • 23,289
  • 7
  • 54
  • 58