2

I want to select a subset of rows based upon a condition after grouping by a specific column in R data table . Take Mtcars data for an example .

dt_mtcars <- as.data.table(mtcars)


dt_mtcars[,.N,by=.(hp)]

    hp  N
 1: 110 3
 2:  93 1
 3: 175 3
 4: 105 1
 5: 245 2
 6:  62 1
 7:  95 1
 8: 123 2
 9: 180 3
10: 205 1
11: 215 1
12: 230 1
13:  66 2
14:  52 1
15:  65 1
16:  97 1
17: 150 2
18:  91 1
19: 113 1
20: 264 1
21: 335 1
22: 109 1
     hp N

Now what I am trying to achieve is like selecting a sub set of data based on the count for e.g in here I want to have all the data in dt_mtcars where N >1 if I try something like this (which is conceptually what I am trying to achieve) it shows all the rows rather than getting the rows where N is greater than 1 .

dt_mtcars[.N>1,.SD,by=.(hp)]

Please assist .

Bg1850
  • 3,032
  • 2
  • 16
  • 30

2 Answers2

3

We can do this using if

dt_mtcars[, if(.N>1) .SD ,hp]
akrun
  • 874,273
  • 37
  • 540
  • 662
2

You can do it in two steps:

dt_mtcars[,Count:=.N, by=.(hp)][Count>1, .SD, .SDcols = rev(rev(1:length(names(dt_mtcars)))[-1])]

EDIT: realized mtcars does not have just those two rows, updated SDcols

EDIT: Fixed to remove count

Shape
  • 2,892
  • 19
  • 31