ORIGINAL ANSWER: Bootstrapping a single column
The code below includes a simple bootstrapping function plus some additional code to return an informative data frame:
library(dplyr)
my_boot = function(x, times=1000) {
  # Get column name from input object
  var = deparse(substitute(x))
  var = gsub("^\\.\\$","", var)
  # Bootstrap 95% CI
  cis = quantile(replicate(times, mean(sample(x, replace=TRUE))), probs=c(0.025,0.975))
  # Return data frame of results
  data.frame(var, n=length(x), mean=mean(x), lower.ci=cis[1], upper.ci=cis[2])
}
mtcars %>%
  group_by(vs) %>%
  do(my_boot(.$mpg))
vs var n mean lower.ci upper.ci
<dbl> <fctr> <int> <dbl> <dbl> <dbl>
1 0 mpg 18 16.61667 15.14972 18.06139
2 1 mpg 14 24.55714 22.36357 26.80750
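To make explicit what the one-liner inside my_boot does, here is a minimal base-R sketch of the same percentile bootstrap on a single vector. The seed and the names x, boot_means and ci are illustrative choices of mine, and the exact interval will change with the seed.

```r
# Percentile bootstrap of the mean of mpg for vs == 0 (base R only)
set.seed(1)  # arbitrary seed, used only so that reruns give the same interval

x = mtcars$mpg[mtcars$vs == 0]

# Resample x with replacement 1000 times and record the mean of each resample
boot_means = replicate(1000, mean(sample(x, replace=TRUE)))

# The 2.5% and 97.5% quantiles of those means form the 95% interval
ci = quantile(boot_means, probs=c(0.025, 0.975))
ci
```

The interval should be close to the vs == 0 row above, but not identical, because each run draws different resamples.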
UPDATE: Bootstrapping any selection of columns
Based on your comments, here is an updated method to get bootstrapped confidence intervals for any selection of columns:
library(reshape2)
library(tidyr)
my_boot = function(x, times=1000) {
  # Bootstrap 95% CI
  cis = quantile(replicate(times, mean(sample(x, replace=TRUE))), probs=c(0.025,0.975))
  # Return results as a data frame
  data.frame(mean=mean(x), lower.ci=cis[1], upper.ci=cis[2])
}
mtcars %>%
  group_by(vs) %>%
  do(as.data.frame(apply(., 2, my_boot))) %>%
  melt(id.var="vs") %>%
  separate(variable, sep="\\.", extra="merge", into=c("col","stat")) %>%
  dcast(vs + col ~ stat, value.var="value")
vs col lower.ci mean upper.ci
1 0 am 0.1111111 0.3333333 0.5555556
2 0 carb 3.0000000 3.6111111 4.2777778
3 0 cyl 6.8888889 7.4444444 7.8888889
4 0 disp 262.3205556 307.1500000 352.4481944
5 0 drat 3.1877639 3.3922222 3.6011528
6 0 gear 3.2222222 3.5555556 3.9444444
7 0 hp 164.0500000 189.7222222 218.5625000
8 0 mpg 14.9552778 16.6166667 18.3225000
9 0 qsec 16.1888750 16.6938889 17.1744583
10 0 vs 0.0000000 0.0000000 0.0000000
11 0 wt 3.2929569 3.6885556 4.0880069
12 1 am 0.2142857 0.5000000 0.7857143
13 1 carb 1.2857143 1.7857143 2.3571429
14 1 cyl 4.1428571 4.5714286 5.0000000
15 1 disp 105.5703571 132.4571429 161.4657143
16 1 drat 3.5992143 3.8592857 4.1100000
17 1 gear 3.5714286 3.8571429 4.1428571
18 1 hp 79.7125000 91.3571429 103.2142857
19 1 mpg 21.8498214 24.5571429 27.3289286
20 1 qsec 18.7263036 19.3335714 20.0665893
21 1 vs 1.0000000 1.0000000 1.0000000
22 1 wt 2.2367000 2.6112857 2.9745571
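If you would rather avoid the melt/separate/dcast round trip, the same kind of table can be sketched in base R. This is an alternative I'm suggesting, not the pipeline above: my_boot is changed here to return a named numeric vector, and cols is a hypothetical selection of columns chosen for illustration.

```r
# Variant of my_boot that returns a named vector instead of a data frame
my_boot = function(x, times=1000) {
  cis = quantile(replicate(times, mean(sample(x, replace=TRUE))), probs=c(0.025, 0.975))
  c(mean=mean(x), lower.ci=cis[[1]], upper.ci=cis[[2]])
}

set.seed(1)                    # arbitrary seed, for reproducibility only
cols = c("mpg", "hp", "wt")    # illustrative selection of columns

# Split by vs, bootstrap each selected column, then stack the per-group results
res = do.call(rbind, lapply(split(mtcars, mtcars$vs), function(df) {
  stats = t(sapply(df[cols], my_boot))   # one row per column, one column per statistic
  data.frame(vs=df$vs[1], col=rownames(stats), stats, row.names=NULL)
}))
rownames(res) = NULL
res
```

Because the grouping variable is kept out of cols, this version also avoids the constant vs rows that appear in the output above.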
Other updates to answer questions in the comments
UPDATE: To answer your comment to me in @BenBolker's answer: If you want the results returned by sample, you can do this:
boot.dat = replicate(1000, sample(mtcars$mpg[mtcars$vs==1], replace=TRUE))
This will return a matrix with 1000 columns, each of which will be a separate bootstrap sample of mtcars$mpg for vs==1. You could also do:
boot.by.vs = sapply(split(mtcars, mtcars$vs), function(df) {
  replicate(1000, sample(df$mpg, replace=TRUE))
}, simplify=FALSE)
This will return a list where the first list element is the matrix of bootstrap samples for vs==0 and the second is for vs==1.
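Once the samples are stored like this, you can summarise them afterwards. The sketch below rebuilds boot.by.vs so that it runs on its own, then derives a 95% percentile interval for the mean in each group; ci.by.vs is a name I've made up for illustration, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# One matrix of bootstrap samples per level of vs (same construction as above)
boot.by.vs = sapply(split(mtcars, mtcars$vs), function(df) {
  replicate(1000, sample(df$mpg, replace=TRUE))
}, simplify=FALSE)

# Rows = observations in the group, columns = bootstrap samples
dim(boot.by.vs[["0"]])

# Each column is one bootstrap sample, so colMeans gives 1000 bootstrap means per group
ci.by.vs = lapply(boot.by.vs, function(m) quantile(colMeans(m), probs=c(0.025, 0.975)))
ci.by.vs
```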
UPDATE 2: To answer your second comment, here's how to bootstrap the whole data frame (assuming you want to save all the copies, rather than summarise them). The code below returns a list of 1000 bootstrapped versions of mtcars. This list will be huge if you have a lot of data, so you'll probably just want to keep summary results, like column means, for each bootstrap sample.
boot.df = lapply(1:1000, function(i) mtcars[sample(1:nrow(mtcars), replace=TRUE), ])
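If you only need the summaries, one way to avoid storing every resampled data frame is to compute the column means inside the loop. This is a minimal sketch rather than the only way to do it; the names boot.means and cis and the seed are my own choices.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# For each of 1000 resamples of the rows, keep only the column means.
# replicate() returns one column per resample, so transpose to get one row per resample.
boot.means = t(replicate(1000, colMeans(mtcars[sample(nrow(mtcars), replace=TRUE), ])))

dim(boot.means)   # 1000 resamples by 11 columns of mtcars

# 95% percentile interval for the mean of every column
cis = apply(boot.means, 2, quantile, probs=c(0.025, 0.975))
cis
```

This keeps a 1000 x 11 numeric matrix in memory instead of 1000 copies of the data.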