Following up this question, I wondered how I can effectively sample a stratified Person Period file.
I have a database who looks like this
id time var clust
1: 1 1 a clust1
2: 1 2 c clust1
3: 1 3 c clust1
4: 2 1 a clust1
5: 2 2 a clust1
...
With individuals id
grouped into clusters clust
. What I would like is to sample id
by clust
, keeping the person period format.
The solution I came up with is to sample id
and then to merge
back. However, is it not a very elegant solution.
library(data.table)
library(dplyr)
setDT(dt)
dt[,.SD[sample(.N,1)],by = clust] %>%
merge(., dt, by = 'id')
which gives
id clust.x time.x var.x time.y var.y clust.y
1: 2 clust1 1 a 1 a clust1
2: 2 clust1 1 a 2 a clust1
3: 2 clust1 1 a 3 c clust1
4: 3 clust2 3 c 1 a clust2
5: 3 clust2 3 c 2 b clust2
6: 3 clust2 3 c 3 c clust2
7: 5 clust3 1 a 1 a clust3
8: 5 clust3 1 a 2 a clust3
9: 5 clust3 1 a 3 c clust3
Is there a more straightforward solution ?
library(data.table)
dt = setDT(structure(list(id = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 4L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L), .Label = c("1", "2",
"3", "4", "5", "6"), class = "factor"), time = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L,
3L), .Label = c("1", "2", "3"), class = "factor"), var = structure(c(1L,
3L, 3L, 1L, 1L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 1L, 3L, 2L, 2L,
3L), .Label = c("a", "b", "c"), class = "factor"), clust = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
2L), .Label = c("clust1", "clust2", "clust3"), class = "factor")), .Names = c("id",
"time", "var", "clust"), row.names = c(NA, -18L), class = "data.frame"))