0

In many fields of the natural sciences it is common practice to report the results of linear regression analysis as y = (a1 +- u(a1)) + (a2 +- u(a2)) * x, including R2 and p, but not the original data. u(a1) and u(a2) are the uncertainties (standard error) of a1 and a2. How could I calculate with this information the confidence and prediction intervals, or have a “reasonable” estimation?

Let me clarify with an example. This is a dummy dataset, with a line of slope 1 and gaussian noise 10:

set.seed(1961)
npoints <- 1e2
(x <- 1:npoints)
(y <-1:npoints + rnorm(npoints, 0, npoints/10))

Now I perform the linear regression:

par(mar = c(4, 4, 1, 1))
xy.model <- lm(y ~ x)
plot(x, y, pch = 16)
abline(xy.model, col = "orange", lwd = 2)
(xy.sum   <- summary(xy.model))
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept) -1.28106    1.94918  -0.657    0.513    
# x            1.00484    0.03351  29.987   <2e-16 ***
# Residual standard error: 9.673 on 98 degrees of freedom
# Multiple R-squared:  0.9017,  Adjusted R-squared:  0.9007 
# F-statistic: 899.2 on 1 and 98 DF,  p-value: < 2.2e-16

I calculate the confidence and prediction intervals:

x.new   <- data.frame(x = 1:npoints)
xy.conf <- predict.lm(xy.model, se.fit = TRUE, interval = "confidence", newdata = x.new)
xy.pred <- predict.lm(xy.model, se.fit = TRUE, interval = "prediction", newdata = x.new)

For example, the confidence and prediction intervals of the first point are:

xy.conf$fit[1, ] 
#        fit        lwr        upr 
# -0.2762127 -4.0867009  3.5342755
xy.pred$fit[1, ]
#       fit         lwr         upr 
# -0.2762127 -19.8462821  19.2938568

If the regression equation is reported as y = (-1.28106 +- 1.94918) + (1.00484 +- 0.03351) * x, R2 = 0.9017, p < 0.05, but the original data are not provided, how can I reproduce (at least approximately) the values, of the confidence and prediction intervals?

JASC
  • 176
  • 3
  • 14
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Or perhaps this would be a better question for [stats.se] where statistical questions are on topic. – MrFlick Jul 07 '21 at 03:32
  • Thanks, I have provided a reproducible example. – JASC Jul 07 '21 at 23:59

1 Answers1

1

Without the original data you need one more piece of information: the means of the two variables. The statistics you provide allow the construction of the linear regression line, but the confidence and prediction bands are narrowest at mean(x), mean(y) so without those you cannot compute them.

A simple example may make this clearer. Start with some data:

z <- structure(list(x = c(5, 5.1, 5.4, 5.8, 4.7, 5.7, 4.8, 5.1, 4.6, 
5.4, 5.2, 5, 5, 5.5, 5.2, 5.1, 4.7, 5.2, 4.8, 5.4, 4.8, 5.1, 
5, 4.6, 4.8), y = c(3.4, 3.7, 3.4, 4, 3.2, 3.8, 3, 3.5, 3.1, 
3.7, 4.1, 3.4, 3.6, 4.2, 3.5, 3.3, 3.2, 3.4, 3, 3.9, 3.1, 3.5, 
3.5, 3.4, 3.4)), row.names = c(NA, -25L), class = "data.frame")

Compute the regression line and plot it along with the data:

z.lm <- lm(y~x, z)
z.lm
# 
# Call:
# lm(formula = y ~ x, data = z)
# 
# Coefficients:
# (Intercept)            x  
#     -0.4510       0.7762  
# 
plot(y~x, z, xlim=c(0, 20), ylim=c(0, 20))
abline(z.lm)

Now create a new data set from the original data and compute the regression:

x2 <- z$x + 10
y2 <- z$y+(10 * coef(z.lm)[2])
z2 <- data.frame(x=x2, y=y2)
points(y~x, z2, col="red")
z2.lm <- lm(y~x, z2)
z2.lm
# 
# Call:
# lm(formula = y ~ x, data = z2)
# 
# Coefficients:
# (Intercept)            x  
#     -0.4510       0.7762  

Notice the regression coefficients are the same as the original data. In fact changing 10 to any other value will produce another set of data with the same regression results.

Regression line

dcarlson
  • 10,936
  • 2
  • 15
  • 18
  • I do not have the original data, I only have the regression parameters that I included in the question. – JASC Jul 07 '21 at 04:24
  • Thanks for your input, I take your point. However, I do not know how to use this information to solve the problem. I could estimate the means from the ranges (if they are provided, which is not so uncommon). – JASC Jul 08 '21 at 00:02