I am trying to sample one random row within each group of my data, as in "How do you sample random rows within each group in a data.table?".
Data:
library(data.table)
set.seed(245)
DT = data.table(d = sample(1:2000), m = sample(1:700, 2000, replace = TRUE))
DT[,length(unique(m))]
[1] 669
DT[,length(unique(d))]
[1] 2000
1) Firstly, the approach
DT[, .SD[sample(.N, 1)], by = m]
is not fast enough, and I am fairly sure it could be done faster. However, the faster approach that was mentioned in the post linked above,
DTs <- DT[DT[, sample(.I, 1), by=m][[2]],]
DTs[, .N]
[1] 659
DTs[, length(unique(d))]
[1] 633
does not work correctly, and I do not understand why (every element of DTs[, d] should be unique).
2) Secondly, when I tried a different approach (extracting only the d values):
DT[, sample(d, 1L), by = m][[2]]
I noticed that the number of unique values is different on each run, and it is also not what I expected:
length(unique(DT[, sample(d, 1L), by = m][[2]]))
[1] 632
length(unique(DT[, sample(d, 1L), by = m][[2]]))
[1] 638
Could someone explain why this is happening, or what I am doing wrong? And what is the fastest way to do this?
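
A possible explanation, offered as a sketch rather than a confirmed diagnosis: base R's sample(x, 1) treats a single number x >= 1 as the range 1:x. For a group that has only one row, sample(.I, 1) or sample(d, 1L) may therefore return a value belonging to some other group, which would produce the duplicates seen above. The helper pick_one below is a hypothetical name introduced only for illustration; it samples a position with sample.int() and indexes into the vector, so one-row groups are handled the same way as larger ones.

```r
library(data.table)

set.seed(245)
DT <- data.table(d = sample(1:2000), m = sample(1:700, 2000, replace = TRUE))

# Hypothetical helper (not part of data.table): pick one element of x by
# sampling a position, which avoids sample()'s special case where a
# length-one numeric x is treated as 1:x.
pick_one <- function(x) x[sample.int(length(x), 1L)]

# One row index per group, then a single subset of DT by those indices.
idx <- DT[, pick_one(.I), by = m]$V1
DTs <- DT[idx]

# Sanity checks: one row per group, and no repeated d values.
stopifnot(nrow(DTs) == uniqueN(DT$m))
stopifnot(uniqueN(DTs$d) == nrow(DTs))
```

The same helper can be used to extract only the d values, as in DT[, pick_one(d), by = m]$V1. Whether this is the fastest option for a given data set is an assumption that would need to be benchmarked.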