This is code I always use for bootstrapping regressions; I adapt it where necessary.
For the bootstrap to work, the units you resample must be independent and identically distributed (i.i.d.), and the bootstrap distribution of your estimates must converge to their true sampling distribution. In the example below I estimate a regression model with 20 observations and then enter every observation twice. The duplicated rows are no longer independent, so I need to bootstrap over the original observations (resampling each id together with both of its copies) to get appropriate standard errors.
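Written out, the resampling scheme used in the code is a cluster (block) bootstrap. A sketch in formulas, with notation introduced here rather than taken from the code: G = 20 is the number of original ids, B the number of bootstrap replications, and beta-hat the vector of OLS coefficients (intercept and slope).

```latex
\begin{aligned}
&\text{For } b = 1,\dots,B:\quad i^{*}_{1},\dots,i^{*}_{G} \overset{\text{iid}}{\sim} \mathrm{Uniform}\{1,\dots,G\},\\
&\hat\beta^{*b} = \text{OLS estimate using all rows whose id is in } \{i^{*}_{1},\dots,i^{*}_{G}\}\ \text{(counted with multiplicity)},\\
&\bar\beta^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat\beta^{*b},\qquad
\widehat{\operatorname{Var}}_{\text{boot}}(\hat\beta) = \frac{1}{B-1}\sum_{b=1}^{B}\bigl(\hat\beta^{*b}-\bar\beta^{*}\bigr)\bigl(\hat\beta^{*b}-\bar\beta^{*}\bigr)^{\top}.
\end{aligned}
```

The last expression is what `var(boot.b)` computes, since R's `var()` uses the divisor B - 1.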
set.seed(45)
x <- 2*rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5*x + epsilon # response from the true model: intercept 1, slope -0.5, plus noise
data1 <- data.frame(y=y,x=x,obs.id=1:20)
summary(lm(y~x,data=data1))
# now the dataset is entered twice, but we know the ids of the original observations
data2 <- rbind(data1,data1)
summary(lm(y~x,data=data2))
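# the coefficients are exactly the same, but the estimated standard errors are wrong
# due to the duplication of the dataset: X'X and the residual sum of squares both double,
# while the residual degrees of freedom only rise from 18 to 38, so the naive standard
# errors are too small by a factor of sqrt(18/38), roughly 0.69. The rows are dependent;
# the independent units of observation are the ids.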
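B <- 10000                                 # number of bootstrap replications
boot.b <- matrix(NA,nrow=B,ncol=2)         # one row of (intercept, slope) per replication
# lookup table: original id and the two rows of data2 that belong to it
all.ids <- cbind(id=1:20,line1=1:20,line2=21:40)
for (b in 1:B){
  # resample the 20 independent units (ids) with replacement
  ids.b <- sample(all.ids[,1],20,replace=TRUE)
  # keep both copies of every sampled id
  lines.b <- c(all.ids[ids.b,2],all.ids[ids.b,3])
  data.b <- data2[lines.b,]
  boot.b[b,] <- coef(lm(y~x,data=data.b))
}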
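colMeans(boot.b)            # bootstrap means of the coefficients
coef(lm(y~x,data=data1))    # compare with the original estimates
var(boot.b)                 # bootstrap covariance matrix of the coefficients
vcov(lm(y~x,data=data2))    # naive covariance from the duplicated data (too small)
vcov(lm(y~x,data=data1))    # covariance from the original 20 observations, for reference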
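The loop above hard-codes the lookup table for exactly 20 ids with two copies each. A minimal sketch of a more general version, assuming the cluster identifier is stored in a data column (here `obs.id`); `cluster_boot_coef` is a hypothetical helper name, not a function from any package. It resamples whole clusters with replacement and keeps every row of each drawn cluster, so it also works when clusters have different sizes.

```r
# Hypothetical helper: cluster (block) bootstrap of lm() coefficients.
# Resamples whole clusters (all rows sharing an id) with replacement.
cluster_boot_coef <- function(data, id, formula, B = 1000) {
  # list of row indices, one element per cluster
  rows.by.id <- split(seq_len(nrow(data)), data[[id]])
  n.ids <- length(rows.by.id)
  draws <- replicate(B, {
    ids.b <- sample(n.ids, n.ids, replace = TRUE)   # resample cluster positions
    data.b <- data[unlist(rows.by.id[ids.b]), ]     # every row of each drawn cluster
    coef(lm(formula, data = data.b))
  })
  t(draws)   # B x (number of coefficients) matrix
}

# Example with the duplicated data from the post
set.seed(45)
x <- 2 * rnorm(20)
epsilon <- rnorm(20)
y <- 1 - 0.5 * x + epsilon
data1 <- data.frame(y = y, x = x, obs.id = 1:20)
data2 <- rbind(data1, data1)

boot.cl <- cluster_boot_coef(data2, "obs.id", y ~ x, B = 2000)
var(boot.cl)                          # cluster-bootstrap covariance
vcov(lm(y ~ x, data = data1))         # reference: fit on the original rows
```

Using `split()` on the id column means the helper never needs to know how many copies each id has; the price is that the resampled data set can vary in size from one replication to the next when clusters are unbalanced.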