1

I would like to subset a data table in R within each subset, based on an aggregate function over the subset of rows. For example, for each key, return all values greater than the mean of a field calculated only for rows in subset. Example:

library(data.table)
t=data.table(Group=rep(c(1:5),each=5),Detail=c(1:25))
setkey(t,'Group')
library(foreach)
library(dplyr)

ret=foreach(grp=t[,unique(Group)],.combine=bind_rows,.multicombine=T) %do% 
  t[Group==grp&Detail>t[Group==grp,mean(Detail)],]
#        Group Detail
# 1:     1      4
# 2:     1      5
# 3:     2      9
# 4:     2     10
# 5:     3     14
# 6:     3     15
# 7:     4     19
# 8:     4     20
# 9:     5     24
#10:     5     25

The question is, is it possible to succinctly code the last two lines using data.table features? Sorry if this is a repeat, I am also struggling explaining the exact goal to have google/stackoverflow find it.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • 1
    `dt[, .SD[Detail > mean(Detail)], by = Group]` maybe? (I've renamed `t` to `dt` because `t` is a function in R). You could also do `indx <- dt[, .I[Detail > mean(Detail)], by = Group]$V1 ; dt[indx]` in order to gain some performance – David Arenburg Aug 24 '16 at 08:05
  • Thanks David, yeah, I didn't really run the code, just wrote it for an example (t as transpose). I checked your answers, look great. – Tim Grilley Aug 24 '16 at 08:21
  • Using `data.table` v >=1.9.7, you could also do a non-equi join such as `res <- dt[, mean(Detail), by = Group] ; dt[res, .(Group, x.Detail), on = .(Group, Detail > V1)]` – David Arenburg Aug 24 '16 at 08:23

1 Answers1

1

Using the .SD function works. Was not aware of it, thanks:

dt[, .SD[Detail > mean(Detail)], by = Group] 

Also works, with some performance gains:

indx <- dt[, .I[Detail > mean(Detail)], by = Group]$V1 ; dt[indx]