Bootstrap a large data set

Question

I would like to bootstrap a large data set which contains multiple column and row variables. The following is a simplified re-creation of my data set:

charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow=9))
colnames(charDataDiff) <- c("patchId","s380","s390","s400","s410","s420","s430","s440","s450")

Separate the data using patchId as the criterion. This creates a list of three data frames, one for each patchId:

idColor <-  c("A", "B", "C")
(patchSpectrum <- lapply(idColor, function(idColor) charDataDiff[charDataDiff$patchId==idColor,]))

I created the function sampleBoot to resample, with replacement, the rows of one element of patchSpectrum:

# Draw nbootstrap resamples (rows sampled with replacement) from patchSpectrum[[patch]]
sampleBoot <- function(nbootstrap = 2, patch = 3) {
    rows <- patchSpectrum[[patch]]
    lapply(seq_len(nbootstrap), function(i)
        rows[sample(nrow(rows), replace = TRUE), ])
}

Example:

sampleBoot(5,3)

Here is where I am stuck:

I need to sample each patchId list along with each column variable (which the above "sampleBoot" easily accomplishes),
take the median of each bootstrap iteration of each patchId list, and
create a new population of the medians to calculate parametric parameters. I can do it manually, but that would be silly.

Your separation step can be written more simply as `patchSpectrum <- by(charDataDiff, charDataDiff$patchId, data.frame)`. – Ken Williams, Oct 28 '12 at 04:05
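For comparison, base R's split() is another compact way to do the separation step. This is a minimal sketch that assumes the data frame from the question and groups on its patchId column; it is an alternative to the lapply version, not the commenter's suggestion:

```r
# Recreate the question's data so the sketch is self-contained.
charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow=9))
colnames(charDataDiff) <- c("patchId","s380","s390","s400","s410","s420","s430","s440","s450")

# split() returns a named list with one data frame per patchId value.
patchSpectrum <- split(charDataDiff, charDataDiff$patchId)

names(patchSpectrum)        # "A" "B" "C"
nrow(patchSpectrum[["C"]])  # 3
```

Because the result is still a list of data frames, indexing by position such as patchSpectrum[[3]] keeps working, so sampleBoot can be used unchanged.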

score 1 · Accepted Answer · answered Oct 28 '12 at 01:24

As far as I understand your question, you can do the following:

do.call(rbind, lapply(sampleBoot(5, 3), function(x) apply(x[-1], 2, median)))

It creates a table of the medians of 5 samplings of patch 3, with one row per sampling and one column per variable.
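To cover every patch rather than only patch 3, the same idea can be applied to each element of the list. The following is a minimal sketch, not necessarily what the asker intended: bootMedians is a hypothetical helper, split() stands in for the question's lapply separation, the seed and the 200 resamples are arbitrary, and it assumes that "parametric parameters" means the mean and standard deviation of each column's medians.

```r
# Recreate the question's data so the sketch is self-contained.
charDataDiff <- data.frame(c('A','B','C'), matrix(1:72, nrow=9))
colnames(charDataDiff) <- c("patchId","s380","s390","s400","s410","s420","s430","s440","s450")

# One data frame per patchId (same grouping as the question's patchSpectrum).
patchSpectrum <- split(charDataDiff, charDataDiff$patchId)

# Hypothetical helper: for one patch, resample its rows with replacement
# nbootstrap times and return a matrix of column medians
# (one row per bootstrap sample, one column per s-variable).
bootMedians <- function(patchData, nbootstrap = 1000) {
  t(replicate(nbootstrap, {
    idx <- sample(nrow(patchData), replace = TRUE)
    apply(patchData[idx, -1, drop = FALSE], 2, median)
  }))
}

set.seed(1)  # arbitrary seed, only for reproducibility
medianPop <- lapply(patchSpectrum, bootMedians, nbootstrap = 200)

# Assumed summaries of each population of medians: mean and standard
# deviation, per patch (rows) and per variable (columns).
medianMean <- t(sapply(medianPop, colMeans))
medianSd   <- t(sapply(medianPop, function(m) apply(m, 2, sd)))
```

Here medianPop[["C"]] corresponds to the table produced by the one-liner above (with 200 rows instead of 5), while medianMean and medianSd each hold one row per patch.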

answered Oct 28 '12 at 01:24

Ali


Thank you, I will re-post the issue with more clarity. Your answer helped me re-define the issue. – Ragy Isaac Oct 28 '12 at 13:31
@RagyIsaac So the command above is not what you wanted? I think it creates a table of population medians. – Ali Oct 28 '12 at 15:51
