6

I am trying to filter all rows within a group in a data.table if a max value within that group is > some value. Below is how I would do it in DPLY and how I got it working in two steps in data.table.

#DPLYR 
df<-data.table(
  x =1:12
  ,y = 1:3
)

df %>% group_by(y) %>% 
  filter(max(x) < 11)

##data.table
df[,max_value :=max(x),by=y][max_value<11]

The output should be

    x y
1:  1 1 
2:  4 1 
3:  7 1 
4: 10 1

Is there a way to do this in one step without creating the column in my dataset? All that I have been able to find are subsetting a group to get one specific value within a group, not return all row of the group that meet the condition.

Andrew Troiano
  • 187
  • 1
  • 8
  • in situations like this it's best to provide some picture of the desired output. Is it just `df`? – jvalenti Aug 28 '19 at 18:23
  • Just updated it, but yeah, a subset of df with just the group values that meet that condition. – Andrew Troiano Aug 28 '19 at 18:31
  • you could add ``[,-3]`` even if it doesn't really answer your question. ``df[,max_value :=max(x),by=y][max_value<11][,-3]`` – Gainz Aug 28 '19 at 18:39
  • Yeah, I mean, Ideally, I wouldn't have to create a 2nd column only to drop it but I can just do ```[,-length(colnames(df)-1]``` to generalize it. – Andrew Troiano Aug 28 '19 at 18:41
  • Yeah sorry I don't know of a way in data.table that wouldn't take as much code. – Gainz Aug 28 '19 at 18:59

1 Answers1

8

We can use .I to get the row index, extract the index column and subset

df[df[, .I[max(x) < 11], y]$V1]
#    x y
#1:  1 1
#2:  4 1
#3:  7 1
#4: 10 1

Or another option is .SD

df[, .SD[max(x) < 11], y]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    I was going to say this question needs akrun haha. I wasn't aware we could do this with ``.I`` I guess I'll have to take a look at those cheat sheets again. I'm not OP but I'm curious, do you know of a way to do this with base R? Thank you. – Gainz Aug 28 '19 at 19:10
  • 1
    @Gainz In `base R`, i think `df[with(df, ave(x, y, FUN = max) < 11),]` – akrun Aug 28 '19 at 19:12
  • Thank you Akrun I got it to work with ``df[with(df, ave(x, y, FUN = function(x) max(x)) < 11),]`` – Gainz Aug 28 '19 at 19:17
  • 2
    @Gainz Either way you can do. it. I made it compact by avodiing the anonymous call – akrun Aug 28 '19 at 19:17
  • 1
    Yes my bad, when I tried your code I had an error which was because of something related to the ``Global Environment``, your answer works perfectly. – Gainz Aug 28 '19 at 19:19