This is a combination of two questions ("Repeat the re-sampling function for 1000 times ? Using lapply?" and "How do you sample groups in a data.table with a caveat").
The goal is to sample groups in a data.table, repeat this process "n" times, and take the average of each row's value across the repetitions. For example:
#load the package and generate the data
library(data.table)
DT = data.table(a=c(1,1,1,1:15,1,1), b=sample(1:1000,20))
#sample the data as done in the second linked question
DT[,.SD[sample(.N,min(.N,3))],by = a]
     a   b
 1:  1 288
 2:  1 881
 3:  1 409
 4:  2 937
 5:  3  46
 6:  4 525
 7:  5 887
 8:  6 548
 9:  7 453
10:  8 948
11:  9 449
12: 10 670
13: 11 566
14: 12 102
15: 13 993
16: 14 243
17: 15  42
Now here is my attempt using the answer given in the first-linked question:
x <- replicate(100,{DT[,.SD[sample(.N,min(.N,3))],by = a]})
This returns a list "x" holding every repetition. The only way I can think of to access the repetitions is this:
# repetition 1 col-a values
x[[1]]
# repetition 1 col-b values
x[[2]]
# repetition 2 col-a values
x[[3]]
# repetition 2 col-b values
x[[4]]
So in order to get the average for each row, I would have to find the mean of x[[j]], where j goes over seq(2,200,2) and 200 is the number of replications*2.
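To make that index bookkeeping concrete, here is a minimal sketch of the approach just described. The names n_rep, b_idx, b_mat and row_means, and the fixed seed, are my own additions for illustration, and it assumes every repetition returns the same number of rows (17 for this DT).

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
DT <- data.table(a = c(1, 1, 1, 1:15, 1, 1), b = sample(1:1000, 20))

n_rep <- 100  # illustrative name for the number of repetitions
x <- replicate(n_rep, DT[, .SD[sample(.N, min(.N, 3))], by = a])

# x holds 2 * n_rep column vectors; the even positions are the b columns
b_idx <- seq(2, 2 * n_rep, by = 2)

# put the b columns side by side: one row per sampled row, one column per repetition
b_mat <- sapply(b_idx, function(j) x[[j]])

# average each row position across the repetitions
row_means <- rowMeans(b_mat)
```

For the groups with a single row, each entry of row_means is just that row's b value; only the three rows for a = 1 change between repetitions.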
Is there an easier way of doing this? I have tried using this solution (https://stats.stackexchange.com/questions/8225/how-to-summarize-data-by-group-in-r) in this fashion:
y <- DT[,.SD[sample(.N,min(.N,3))],by = a]
y[,list(mean=mean(b)),by=a]
     a mean
 1:  1  550
 2:  2  849
 3:  3  603
 4:  4   77
 5:  5  973
 6:  6  746
 7:  7  919
 8:  8  655
 9:  9  883
10: 10  823
11: 11  533
12: 12  483
13: 13   53
14: 14  827
15: 15  413
But I have yet to be able to do this with the replication process. Any help would be great!