I have data sets that have 1 to 70 columns of data with 1 to 5 columns of ID variables. I need to group by the ID variables and then randomly sample the rows of data within each group, with replacement, so that the re-sampled data set is the same length as the original data set. Below is an example DATA set with the desired RESULT table.
So I need to group_by SITE and DATE and then randomly sample a single row from STUFF:STUFF3. Please note how the RESULT table retains the order of data across the columns of STUFF:STUFF3. For example, the first two rows in the RESULT table are both 2,4,8, which corresponds to row 2 in the DATA table.
I have code that subsets in a for loop, but I would prefer to use dplyr. I hope this is clear. Thanks.
DATA = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                  DATE = c("1","1","2","2","3","3","3","4","4"),
                  STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                  STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                  STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))
RESULT = data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                    DATE = c("1","1","2","2","3","3","3","4","4"),
                    STUFF = c(2, 2, 30, 30, 200, 300, 300, 6000, 5000),
                    STUFF2 = c(4, 4, 60, 60, 400, 600, 600, 12000, 10000),
                    STUFF3 = c(8, 8, 120, 120, 800, 1200, 1200, 24000, 20000))
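
To make the goal concrete, here is a minimal sketch of how the grouped resampling might look in dplyr. It assumes dplyr 1.0.0 or later, where slice_sample() is available, and that sampling is done with replacement (as the repeated rows in RESULT suggest). The name RESAMPLED and the seed value are only illustrative, and because the draw is random the output will generally not match RESULT row for row.

```r
library(dplyr)

# Example input, repeated here so the snippet runs on its own
DATA <- data.frame(SITE = c("A","A","A","A","B","B","B","C","C"),
                   DATE = c("1","1","2","2","3","3","3","4","4"),
                   STUFF = c(1, 2, 30, 40, 100, 200, 300, 5000, 6000),
                   STUFF2 = c(2, 4, 60, 80, 200, 400, 600, 10000, 12000),
                   STUFF3 = c(4, 8, 120, 160, 400, 800, 1200, 20000, 24000))

set.seed(1)  # arbitrary seed, only to make the draw repeatable

RESAMPLED <- DATA %>%
  group_by(SITE, DATE) %>%                    # one group per SITE/DATE combination
  slice_sample(prop = 1, replace = TRUE) %>%  # draw as many whole rows as the group has
  ungroup()

RESAMPLED
```

Because slice_sample() draws whole rows, the values in STUFF, STUFF2 and STUFF3 stay together, and prop = 1 keeps each group (and so the full table) at its original size. With more ID columns, they could be listed in group_by() or selected with across().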