
I am trying to run a bootstrap from a linear regression in R. The code I have so far is

hprice <- lm(dat[,1] ~ dat[,3] + dat[,4] + dat[,5] + dat[,6])
print(hprice)
pricefunc <- function(data, ind) lm(data[ind,1] ~ data[ind,3] + data[ind,4] + data[ind,5] + data[ind,6])
hpboot <- boot(dat, pricefunc, 1000)

This doesn't seem to work.

I don't really understand the `statistic` argument, and I suspect that is where I am going wrong.

Thanks

  • What exactly does "doesn't work" mean? You did not supply enough code for anyone else to run it (i.e. no sample data). See [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a reproducible example. – MrFlick May 09 '14 at 13:01
  • You are right, I didn't supply enough code. My biggest problem was that I couldn't work out what the `statistic` argument in `boot()` meant. I got an answer that made it work; now I just need to work out how. Thanks – Jamzy May 09 '14 at 13:38

3 Answers


If you need the coefficient estimates, you have to append `$coef` to the `lm()` call:

pricefunc <- function(data, ind) lm(data[ind,1] ~ data[ind,3] + data[ind,4] + data[ind,5] + data[ind,6])$coef

Then you can run:

library(boot)
boot(dat, pricefunc, R = 1000)
  • OK. I tried that and I think it worked. Thank you. What does the `$coef` do? – Jamzy May 09 '14 at 13:36
  • The `lm` command gives you a list of information about the linear model, but the `boot` function only wants to make inference on the coefficients of the model, so you have to use `$coef` to extract just the coefficient estimates. – Davide Passaretti May 09 '14 at 13:42
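To see this answer end to end, here is a minimal reproducible sketch using the built-in `mtcars` data, since the asker's `dat` was never supplied (the variable names and `coef_fun` are my own choices, not from the question):

```r
library(boot)

# statistic: refit the model on the resampled rows and return the coefficients
coef_fun <- function(data, ind) coef(lm(mpg ~ wt + hp, data = data[ind, ]))

set.seed(1)
res <- boot(mtcars, coef_fun, R = 1000)
res                                      # bias and bootstrap SE for each coefficient
boot.ci(res, type = "perc", index = 2)   # percentile CI for the wt coefficient
```

Note that `statistic` must be a function of the data and an index vector; `boot()` calls it once per replicate with a fresh resampled index, which is why returning a plain numeric vector (the coefficients) works while returning the whole `lm` object does not.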

This is code that I always use for bootstrap regressions, changing it where necessary. For the bootstrap to work, it is important that the observations are independent and identically distributed, and that the distribution of your estimates converges to the corresponding population distribution. In the example below I estimate a regression model with 20 observations, where every observation is entered twice. In that case I need to bootstrap over the original observations to get appropriate standard errors.

set.seed(45)
x <- 2*rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5*x + epsilon  # generate y from the true regression model
data1 <- data.frame(y=y, x=x, obs.id=1:20)
summary(lm(y~x, data=data1))

# now the dataset is entered twice, but we know the ids of the original observations
data2 <- rbind(data1, data1)
summary(lm(y~x, data=data2))

# the coefficients are exactly the same, but the estimated standard errors are wrong
# due to the duplication of the dataset: the data are dependent, and the independent
# units of observation are the ids
B <- 10000
boot.b <- matrix(NA, nrow=B, ncol=2)
all.ids <- cbind(1:20, line1=1:20, line2=21:40)
for (b in 1:B) {
  ids.b <- sample(all.ids[,1], 20, replace=TRUE)
  lines.b <- c(all.ids[ids.b,2], all.ids[ids.b,3])
  data.b <- data2[lines.b,]
  boot.b[b,] <- coef(lm(y~x, data=data.b))
}
colMeans(boot.b)

coef(lm(y~x, data=data1))

var(boot.b)

vcov(lm(y~x, data=data2))


There is also the `model_parameters()` function from the **parameters** package, which can produce bootstrapped confidence intervals and p-values:

library(parameters)

mod <- lm(formula = wt ~ mpg, data = mtcars)

model_parameters(mod)
#> Parameter   | Coefficient |   SE |         95% CI | t(30) |      p
#> ------------------------------------------------------------------
#> (Intercept) |        6.05 | 0.31 | [ 5.42,  6.68] | 19.59 | < .001
#> mpg         |       -0.14 | 0.01 | [-0.17, -0.11] | -9.56 | < .001

model_parameters(mod, bootstrap = TRUE, iterations = 100)
#> Parameter   | Coefficient |         95% CI |     p
#> --------------------------------------------------
#> (Intercept) |        5.99 | [ 5.36,  6.68] | 0.010
#> mpg         |       -0.14 | [-0.17, -0.11] | 0.010

Created on 2021-03-09 by the reprex package (v1.0.0)
