I have data with about 25 different groups. To see how the variance of each group would change with different sample sizes, I am trying to do stratified bootstrapping. For example, at sample size 5 it should produce 1000 collections of 5 resampled points for each group. I would like to find the smallest sample size necessary, within a possible range of 5 to 30 per group.
The problem I am running into is that I have to subset each group, run the bootstrapping on each group individually, and then copy and paste the R output into Excel. (I am fairly green at R and at coding.) It takes too long. I need to automate the bootstrapping so that it recognizes the groups and saves a statistic of each collection of 1000 resamples to a data frame. Does this make sense?
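For the "recognize groups" part, one option is to let base R's `split()` divide the measurements by group instead of calling `subset()` once per group. This is a minimal sketch that assumes the sample data generated in the question; the name `groups` is just an illustrative choice.

```r
# Sample data, generated the same way as in the question
set.seed(1234)
df <- data.frame(g.name = as.factor(sample(LETTERS, 100, replace = TRUE)),
                 C.H = as.numeric(sample(1:9, 100, replace = TRUE)))

# split() returns a named list with one vector of C.H values per level of g.name,
# so R finds the groups itself instead of needing one subset() call per group
groups <- split(df$C.H, df$g.name)

names(groups)     # the group labels
lengths(groups)   # number of observations in each group
```

Each element of `groups` can then be bootstrapped the same way `Agroup$C.H` is.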
Here is the code I have so far:
```r
# Sample data
set.seed(1234)
df <- data.frame(g.name = as.factor(sample(LETTERS, 100, replace = TRUE)),
                 C.H = as.numeric(sample(1:9, 100, replace = TRUE)))

# Subset data by group (only three examples here).
# The group label is stored in g.name, so that is the column to filter on.
Agroup <- subset(df, g.name == "A")
Bgroup <- subset(df, g.name == "B")
Cgroup <- subset(df, g.name == "C")

# Bootstrap: draw a sample of size i, B times, for sample sizes from 5 to 30
# (1000 times each). Apply var() to each sample, then take the variance of
# those variances. C.H is the measurement, ranging from 1 to 9.
B <- 1000
cult.var <- NULL   # entries 1 to 4 stay NA because i starts at 5
for (i in 5:30) {
  # Index with sample.int() so a group holding a single value is not treated as sample(1:x)
  idx <- sample.int(nrow(Agroup), size = B * i, replace = TRUE)
  boot.samples <- matrix(Agroup$C.H[idx], nrow = B, ncol = i)
  cult.var[i] <- var(apply(boot.samples, 1, var))
}
print(cult.var)
```
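One way to stop copying this loop for every group is to wrap it in a function that takes one group's values and returns a small data frame. This is a sketch, not a tested solution: `boot_var_by_size` and the column names `n` and `var.of.var` are hypothetical choices, and the statistic is the same variance of variances computed above.

```r
# Hypothetical helper: bootstrap the variance of one group's values at several sample sizes.
# x     : numeric vector of measurements for a single group
# sizes : sample sizes to try (5 to 30, as in the question)
# B     : number of bootstrap resamples per sample size
boot_var_by_size <- function(x, sizes = 5:30, B = 1000) {
  stopifnot(length(x) >= 1)
  var.of.var <- numeric(length(sizes))
  for (k in seq_along(sizes)) {
    i <- sizes[k]
    # B rows, each row one resample of size i drawn with replacement
    idx <- sample.int(length(x), size = B * i, replace = TRUE)
    boot.samples <- matrix(x[idx], nrow = B, ncol = i)
    # variance of each resample, then the variance of those variances
    var.of.var[k] <- var(apply(boot.samples, 1, var))
  }
  data.frame(n = sizes, var.of.var = var.of.var)
}

# Usage on the question's sample data, for the first group present
set.seed(1234)
df <- data.frame(g.name = as.factor(sample(LETTERS, 100, replace = TRUE)),
                 C.H = as.numeric(sample(1:9, 100, replace = TRUE)))
first.group <- levels(df$g.name)[1]
res <- boot_var_by_size(df$C.H[df$g.name == first.group])
head(res)
```

Returning a data frame rather than printing means the result can be stored or combined with other groups directly.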
This works, but it requires a lot of copying and pasting. I think I need either a for loop that does the bootstrapping group by group, or some other approach. I did find a way to do stratified sampling on its own, without bootstrapping, so maybe I could figure out how to repeat that 1000 times.
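The for-loop idea can be sketched as below, assuming the question's sample data. Each group is resampled only from its own values (so the resampling is stratified by `g.name`), and every result lands in one data frame with one row per group and sample size. The variable names and the CSV file name are illustrative assumptions.

```r
set.seed(1234)
df <- data.frame(g.name = as.factor(sample(LETTERS, 100, replace = TRUE)),
                 C.H = as.numeric(sample(1:9, 100, replace = TRUE)))

B <- 1000                            # resamples per sample size
sizes <- 5:30                        # sample sizes to try
groups <- split(df$C.H, df$g.name)   # one vector of C.H per group

rows <- list()   # collects one small data frame per group
for (g in names(groups)) {
  x <- groups[[g]]
  var.of.var <- sapply(sizes, function(i) {
    # B resamples of size i drawn from group g only; var() of each resample
    v <- replicate(B, var(x[sample.int(length(x), i, replace = TRUE)]))
    var(v)   # variance of the B variances
  })
  rows[[g]] <- data.frame(g.name = g, n = sizes, var.of.var = var.of.var)
}
results <- do.call(rbind, rows)
rownames(results) <- NULL

head(results)
# write.csv(results, "boot_results.csv", row.names = FALSE)  # hypothetical file name; opens in Excel
```

Writing `results` to a CSV file once replaces the copy-and-paste step into Excel.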
The example here using the function boot() does not fit my situation. I have fiddled with it a little, to no avail. I am not sure how to write functions, which may also be why I cannot figure it out.
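For reference, `boot()` from the `boot` package (a recommended package that ships with R) does support stratified resampling through its `strata` argument, and it expects a statistic function of the form `function(data, indices)`. A sketch on the question's sample data, where `group_var` is a hypothetical name:

```r
library(boot)

set.seed(1234)
df <- data.frame(g.name = as.factor(sample(LETTERS, 100, replace = TRUE)),
                 C.H = as.numeric(sample(1:9, 100, replace = TRUE)))

# Statistic in the form boot() expects: data plus a vector of resampled row indices.
# Returns the variance of C.H within each group of the resampled rows
# (NA for a group with a single observation).
group_var <- function(d, idx) {
  tapply(d$C.H[idx], d$g.name[idx], var)
}

# strata = df$g.name makes boot() resample rows within each group separately
b <- boot(df, statistic = group_var, R = 1000, strata = df$g.name)

dim(b$t)   # 1000 replicates x one column per group
```

With the ordinary nonparametric bootstrap, each group is resampled at its own observed size rather than at a chosen size such as 5 or 30, which is likely why `boot()` does not map directly onto this problem; a hand-written loop over sample sizes gives that control.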