
I am trying to run a bootstrap from a linear regression in R. The code I have so far is

hprice <- lm(dat[,1] ~ dat[,3] + dat[,4] + dat[,5] + dat[,6])
print(hprice)
pricefunc <- function(data, ind) lm(data[ind,1] ~ data[ind,3] + data[ind,4] + data[ind,5] + data[ind,6])
hpboot <- boot(dat, pricefunc, 1000)

This doesn't seem to work.

I don't really understand the `statistic` argument, and I suspect that is where I am going wrong.

Thanks

  • What exactly does "doesn't work" mean? You did not supply enough code for anyone else to run it (i.e. no sample data). See [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a reproducible example. – MrFlick May 09 '14 at 13:01
  • You are right, I didn't supply enough code. My biggest problem was that I couldn't work out what the `statistic` argument in `boot()` meant. I got an answer that made it work; now I just need to work out how. Thanks – Jamzy May 09 '14 at 13:38

3 Answers


If you need the coefficient estimates, you have to append `$coef` to the `lm()` call:

pricefunc <- function(data, ind) lm(data[ind,1] ~ data[ind,3] + data[ind,4] + data[ind,5] + data[ind,6])$coef

Then you can run:

library(boot)
boot(dat, pricefunc, R = 1000)
  • OK. I tried that and I think it worked. Thank you. What does the `$coef` do? – Jamzy May 09 '14 at 13:36
  • The `lm` command gives you a list of information about the linear model, but the `boot` function only wants to make inference on the coefficients of the model, so you have to use `$coef` to extract just the coefficient estimates. – Davide Passaretti May 09 '14 at 13:42
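To see this answer end to end, here is a minimal reproducible sketch using the built-in `mtcars` data, since the asker's `dat` was never supplied (the variable names and `coef_fun` are my own choices, not from the question):

```r
library(boot)

# statistic: refit the model on the resampled rows and return the coefficients
coef_fun <- function(data, ind) coef(lm(mpg ~ wt + hp, data = data[ind, ]))

set.seed(1)
res <- boot(mtcars, coef_fun, R = 1000)
res                                      # bias and bootstrap SE for each coefficient
boot.ci(res, type = "perc", index = 2)   # percentile CI for the wt coefficient
```

Note that `statistic` must be a function of the data and an index vector; `boot()` calls it once per replicate with a fresh resampled index, which is why returning a plain numeric vector (the coefficients) works while returning the whole `lm` object does not.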

This is code that I always use for bootstrap regressions, changing it where necessary. For the bootstrap to work, it is important that the observations are independent and identically distributed, and that the distribution of your estimates converges to the corresponding population distribution. In the example below I estimate a regression model with 20 observations, where every observation is entered twice. In that case I need to bootstrap over the original observations to get appropriate standard errors.

set.seed(45)
x <- 2*rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5*x + epsilon  # generate y from the true regression model
data1 <- data.frame(y=y, x=x, obs.id=1:20)
summary(lm(y~x, data=data1))

# now the dataset is entered twice, but we know the ids of the original observations
data2 <- rbind(data1, data1)
summary(lm(y~x, data=data2))

# the coefficients are exactly the same, but the estimated standard errors are wrong
# due to the duplication of the dataset: the data are dependent, and the independent
# units of observation are the ids
B <- 10000
boot.b <- matrix(NA, nrow=B, ncol=2)
all.ids <- cbind(1:20, line1=1:20, line2=21:40)
for (b in 1:B) {
  ids.b <- sample(all.ids[,1], 20, replace=TRUE)
  lines.b <- c(all.ids[ids.b,2], all.ids[ids.b,3])
  data.b <- data2[lines.b,]
  boot.b[b,] <- coef(lm(y~x, data=data.b))
}
colMeans(boot.b)

coef(lm(y~x, data=data1))

var(boot.b)

vcov(lm(y~x, data=data2))


There is also the `model_parameters()` function from the **parameters** package, which can produce bootstrapped confidence intervals and p-values:

library(parameters)

mod <- lm(formula = wt ~ mpg, data = mtcars)

model_parameters(mod)
#> Parameter   | Coefficient |   SE |         95% CI | t(30) |      p
#> ------------------------------------------------------------------
#> (Intercept) |        6.05 | 0.31 | [ 5.42,  6.68] | 19.59 | < .001
#> mpg         |       -0.14 | 0.01 | [-0.17, -0.11] | -9.56 | < .001

model_parameters(mod, bootstrap = TRUE, iterations = 100)
#> Parameter   | Coefficient |         95% CI |     p
#> --------------------------------------------------
#> (Intercept) |        5.99 | [ 5.36,  6.68] | 0.010
#> mpg         |       -0.14 | [-0.17, -0.11] | 0.010

Created on 2021-03-09 by the reprex package (v1.0.0)
