At the data.table in column C3 I want to flag N randomly selected rows by each group (C1). There are several similar questions have already been asked on SO here, here and here. But based on the answers still cannot figure out how to find a solution for my task.
set.seed(1)
dt = data.table(C1 = c("A","A","A","B","C","C","C","D","D","D"),
C2 = c(2,1,3,1,2,3,4,5,4,5))
dt
C1 C2
1: A 2
2: A 1
3: A 3
4: B 1
5: C 2
6: C 3
7: C 4
8: D 5
9: D 4
10: D 5
Here are row indexes for two randomly selected rows by each group C1 (doesn't work well for group B):
dt[, sample(.I, min(.N, 2)), by = C1]$V1
[1] 1 3 3 7 5 10 9
NB: for B only one row should be selected because group B consists of one row only.
Here is a solution for one randomly selected row in each group, which often doesn't work for group B:
dt[, C3 := .I == sample(.I, 1), by = C1]
dt
C1 C2 C3
1: A 2 FALSE
2: A 1 TRUE
3: A 3 FALSE
4: B 1 FALSE
5: C 2 TRUE
6: C 3 FALSE
7: C 4 FALSE
8: D 5 TRUE
9: D 4 FALSE
10: D 5 FALSE
Actually I want to expand it on N rows. I've tried (for two rows):
dt[, C3 := .I==sample(.I, min(.N, 2)), by = C1]
which of course doesn't work.
Any help is much appreciated!