
I am trying to create a bootstrap sample of the data frame 'ev_all_clean':

set.seed(1315)
boot_s <- as.data.frame(matrix(NA, ncol = 19, nrow = 1000))
for(k in 1:19){
  for(l in 1:1000){
    boot_s[l,k] <- sample(ev_all_clean[,k], size=1, replace=T)
  }
}

The above code returns

replacement element 1 has 768 rows to replace 1 rows

Help is much appreciated! Thank you in advance.

Vouz
  • Can you share `ev_all_clean` df as well, so a proposed solution is tested? – deepseefan Oct 25 '19 at 09:59
  • Do you need to bootstrap all columns independently, or just bootstrap the dataframe by rows? – saudic Oct 25 '19 at 10:06
  • @saudic I need to bootstrap all columns independently – Vouz Oct 26 '19 at 11:21
  • @deepseefan I could not share the data because it's proprietary but it has 768 entries, 20 total columns – Vouz Oct 26 '19 at 11:22
  • Here's the list of the variables: $gender "numeric" $area "factor" $age "numeric" $wdays "numeric" $order "numeric" $whours "numeric" $ni "factor" $aware "factor" $max_v "numeric" $accel "factor" $max_r "numeric" $m_charge "factor" $t_charge "numeric" $p_charge "numeric" $f_mt "numeric" $p_mt "numeric" $p_buy "numeric" $interest "factor" $purchase "factor" – Vouz Oct 26 '19 at 11:30

2 Answers


I created a dummy ev_all_clean based on the spec you gave; see if this gets you started.

boot_s <- as.data.frame(matrix(NA, ncol = 19, nrow = 1000))
for(k in 1:19){
  for(l in 1:1000){
    boot_s[[l,k]]<- sample(ev_all_clean[,k], size=1, replace=TRUE)
  }
}

Just a bit of explanation about what the script does:

ev_all_clean[,k] selects column k on each pass of the outer loop. sample() draws one element from it with replacement, and that single element replaces the value at boot_s[[l,k]]. You can read about the difference between [ and [[ here.

Since you're sampling a single element, you probably want to replace a single element, and I think that is what the error message is trying to tell you.
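One possible explanation, offered as an assumption since the data was not shared: if ev_all_clean is a tibble rather than a base data frame, ev_all_clean[,k] returns a one-column table instead of a vector, and sample() then hands back that whole column. That would be consistent with both the "768 rows to replace 1 rows" error and whole vectors ending up in each cell. The sketch below mimics that behaviour in base R with drop = FALSE on a made-up toy data frame, then fills each column in one step using [[k]], which extracts a plain vector for data frames and tibbles alike.

```r
# Toy data; ev_all_clean itself was not shared, so `toy` is hypothetical
toy <- data.frame(a = 1:5, b = factor(c("x", "y", "x", "y", "x")))

# drop = FALSE keeps a one-column data frame, which is what `[, k]`
# returns on a tibble
one_col <- toy[, 1, drop = FALSE]
picked  <- sample(one_col, size = 1, replace = TRUE)
# `picked` is still a data frame holding the whole column, not one value

# `[[k]]` extracts the column as a plain vector, so sample() draws values
set.seed(1315)
boot_s <- as.data.frame(matrix(NA, ncol = ncol(toy), nrow = 10))
for (k in seq_along(toy)) {
  # fill the whole column at once instead of looping over rows
  boot_s[[k]] <- sample(toy[[k]], size = nrow(boot_s), replace = TRUE)
}
```

Drawing all rows of a column in a single sample() call also removes the inner loop over l, which is usually much faster in R.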

deepseefan
  • Thank you very much! The code ran, but it ended up storing a vector in each cell of the booted dataframe (so in the new dataset every cell holds the column's full 768 datapoints). I just want a single number in every cell of the booted dataframe. – Vouz Oct 27 '19 at 05:57

Once you have your original dataframe boot_s, you could draw your first bootstrap sample like this:

# first prepare a dataframe of the same shape as your original
samplemat <- boot_s
# then perform bootstrap for each column separately
for (k in 1:19) {
  # draw row indices with replacement, independently for every column
  samplemat[, k] <- boot_s[sample(seq_len(nrow(boot_s)), nrow(boot_s), replace = TRUE), k]
}

sample prepares a vector of the row indices to keep in your bootstrapped sample. You will have to repeat this for each bootstrap replicate you want (and you will want many).
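As a minimal sketch of repeating that step, the helper below wraps the per-column resampling in a function and calls it many times. The names boot_columns, toy, one_boot and many_boots are made up for illustration, and the output row count n is an added parameter so a sample can be larger than the original data.

```r
# Hypothetical helper: resample each column of `df` independently,
# returning `n` rows (n may be larger than nrow(df))
boot_columns <- function(df, n = nrow(df)) {
  out <- lapply(df, function(col) {
    col[sample.int(length(col), n, replace = TRUE)]
  })
  as.data.frame(out)
}

set.seed(1315)
# Toy stand-in with 768 rows, one numeric and one factor column
toy <- data.frame(x = 1:768, g = factor(rep(c("a", "b"), 384)))

# A single bootstrap sample expanded to 3000 rows
one_boot <- boot_columns(toy, n = 3000)

# 200 bootstrap replicates of the original size, kept as a list
many_boots <- replicate(200, boot_columns(toy), simplify = FALSE)
```

Indexing with sample.int() rather than calling sample() on the column itself avoids the surprise where sample() treats a single number as the range 1:x, which only matters for one-row data but costs nothing to guard against.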

  • Thank you! The code worked but I want to expand the number of datapoints (from the original of 768 to 3000 for example). – Vouz Oct 27 '19 at 05:56
  • The number of rows is not hard-coded in my answer; you can use a dataframe with any number of rows. The number of columns is 19, but you can also treat that as a variable by using `ncol(boot_s)` instead of 19. –  Oct 27 '19 at 09:03