How can I get a random sample based on conditional values? For example, I have the following data frame:

GROUP CLASS  AGE
A     1      10
A     2      15
B     1      10
B     2      17
C     1      12
C     2      14

I need to get a sample of 30 records for each of the GROUPs, but only from CLASS = 1, all compiled into a single sample data frame.

I know how to get a sample of 30 records, but I don't know how to create a condition that loops through the different GROUPs and filters on CLASS:

ran.sample = sample(nrow(df_all), 30)
df = df_all[ran.sample, ]

Any ideas?

Thanks

Neil Lunn
Selrac
  • Do you need 1 sample row from each group where `Class = 1` ? – Ronak Shah Nov 26 '16 at 14:02
  • If you use `data.table`, then there probably are two duplicates: [samples by class](http://stackoverflow.com/questions/16289182/how-do-you-sample-random-rows-within-each-group-in-a-data-table) and [sample without class](http://stackoverflow.com/questions/24685421/how-do-you-extract-a-few-random-rows-from-a-data-table-on-the-fly) – etienne Nov 26 '16 at 14:07
  • 30 records for GROUP=A & CLASS =1 + 30 records for GROUP=B & CLASS =1 – Selrac Nov 26 '16 at 14:09
  • I can for example filter class first like df <- df[+which(df$CLASS==1),], but how can I then loop through the GROUPs to get 30 samples for each? – Selrac Nov 26 '16 at 14:38
  • Try this...`df[sample(which(df$CLASS==1),30),]`. – Chirayu Chamoli Nov 26 '16 at 14:52
  • @Selrac read the first link in my previous comment, you could adapt it to `setDT(df)[CLASS == 1, .SD[sample(.N, 30, replace = TRUE)], by = GROUP]` – etienne Nov 26 '16 at 14:55
  • I like your solution, etienne. Can you have more than one item in the group? If I had to run it for class 2 as well, could it be by = GROUP and by = CLASS? – Selrac Nov 26 '16 at 15:02
  • @Selrac sure you could. Using `.(GROUP, CLASS)` instead of `GROUP` should do it, but as your data is too small I can't test it. Also, be sure to use `@` when mentioning someone, otherwise I don't get the notification. – etienne Nov 26 '16 at 15:04
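
The `data.table` suggestion from the comments, grouped by both `GROUP` and `CLASS`, might look roughly like the sketch below. It assumes the `data.table` package is installed. The data set `dt` and the result name `res` are made up for illustration, since the data in the question is too small to sample 30 rows per group.

```r
library(data.table)

# Hypothetical data: 3 groups x 2 classes, 50 rows per combination.
set.seed(1)
dt <- data.table(
  GROUP = rep(c("A", "B", "C"), each = 100),
  CLASS = rep(1:2, times = 150),
  AGE   = sample(10:17, 300, replace = TRUE)
)

# For every GROUP/CLASS combination, pick 30 random rows of that subset.
# .N is the number of rows in the current subset, .SD is the subset itself.
res <- dt[, .SD[sample(.N, 30)], by = .(GROUP, CLASS)]

# Count the rows drawn per combination.
res[, .N, by = .(GROUP, CLASS)]
```

Adding `replace = TRUE` to `sample()`, as in the comment above, would let this run even when a combination has fewer than 30 rows, at the cost of repeated rows.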

1 Answer

Try this:

newdf <- df[df$CLASS==1,]
do.call(rbind, lapply(split(newdf, newdf$GROUP), function(x) x[sample(nrow(x), 30),]))
989
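
One caveat about the split/lapply approach: `sample(nrow(x), 30)` stops with an error when a group has fewer than 30 rows with CLASS = 1. The following is a minimal base R sketch of the same idea that caps the sample size per group instead. The example data and the names `n_per_group`, `class1`, `pieces` and `df_sample` are assumptions for illustration, not part of the original answer.

```r
# Hypothetical example data with uneven group sizes (A: 100, B: 80, C: 40 rows).
set.seed(42)
df_all <- data.frame(
  GROUP = rep(c("A", "B", "C"), times = c(100, 80, 40)),
  CLASS = rep(1:2, length.out = 220),
  AGE   = sample(10:17, 220, replace = TRUE)
)

n_per_group <- 30

# Keep only CLASS == 1, then split into one data frame per GROUP.
class1 <- df_all[df_all$CLASS == 1, ]
pieces <- split(class1, class1$GROUP)

# Draw at most n_per_group rows from each piece; min() prevents the
# "sample larger than the population" error for small groups.
sampled <- lapply(pieces, function(x) {
  x[sample(nrow(x), min(n_per_group, nrow(x))), , drop = FALSE]
})

# Stack the per-group samples back into a single data frame.
df_sample <- do.call(rbind, sampled)
table(df_sample$GROUP)
```

Here group C only has 20 rows with CLASS = 1, so all 20 are returned rather than raising an error. If exactly 30 rows per group are required regardless of group size, sampling with `replace = TRUE` is the alternative.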