I have a data.table:

X = data.table(x = c(1,1,1,1,1,2,2,2,2,2), y = c(3,2,1,-1,5,7,4,-2,3,5))

Within each group, I want to keep only the rows that come before the first negative value:

res = data.table(x = c(1,1,1,2,2), y = c(3,2,1,7,4))

Of the five values in the first group, I want only the first three, because the fourth is negative. The same rule applies to the second group, which keeps its first two rows.

Frank

2 Answers

Here are two options:

X[, .SD[seq_len(which.max(y<0)-1L)], by = x]

Or (perhaps more efficient because it avoids .SD):

X[ X[, .I[seq_len(which.max(y<0)-1L)], by = x]$V1 ]
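
One edge case worth noting: `which.max(y<0)` returns 1 when a group contains no negative value at all, so `seq_len(0)` drops that whole group. The sample data never hits this, but a minimal sketch of a guard might look like the following. The helper `n_before_neg` and the extra group `x = 3` are hypothetical and only there for illustration.

```r
library(data.table)

X <- data.table(x = c(1,1,1,1,1,2,2,2,2,2), y = c(3,2,1,-1,5,7,4,-2,3,5))
# Hypothetical extra group (x = 3) that contains no negative values
X2 <- rbind(X, data.table(x = 3, y = c(1, 2)))

# Hypothetical helper: number of rows before the first negative value,
# or the full group length when the group has no negative value
n_before_neg <- function(y) {
  neg <- which(y < 0)
  if (length(neg) > 0L) neg[1L] - 1L else length(y)
}

# As posted: group 3 disappears because which.max() of all-FALSE is 1
as_posted <- X2[X2[, .I[seq_len(which.max(y < 0) - 1L)], by = x]$V1]

# Guarded: group 3 is kept in full
guarded <- X2[X2[, .I[seq_len(n_before_neg(y))], by = x]$V1]
```

Whether a group without negatives should be kept or dropped depends on what you need; the question does not say, so treat the guard as one possible reading.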
talat
    +1. Really have to get to issue [#613](https://github.com/Rdatatable/data.table/issues/613): Optimize `.SD[i]` query to keep the elegance but make it faster unchanged – Matt Dowle Feb 23 '16 at 18:57
Another option is to filter on the cumulative minimum of sign(y):

X[, .SD[cummin(sign(y))>0], x]
#   x y
#1: 1 3
#2: 1 2
#3: 1 1
#4: 2 7
#5: 2 4
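
To see why this works, it may help to walk through the filter on the first group's values alone. This is only an illustrative sketch; the variable names `y`, `keep`, `y0`, `strict`, and `relaxed` are made up for the example.

```r
# Values of y in the first group (x == 1)
y <- c(3, 2, 1, -1, 5)

s    <- sign(y)      # 1 for positive, -1 for negative, 0 for zero
cm   <- cummin(s)    # running minimum: stays at -1 once a negative appears
keep <- cm > 0       # TRUE only for rows before the first non-positive value

# Note on zeros: sign(0) is 0, so `> 0` also cuts the group at the first zero.
# Using `>= 0` instead would cut only at the first negative value.
y0      <- c(3, 0, 2, -1)
strict  <- cummin(sign(y0)) > 0
relaxed <- cummin(sign(y0)) >= 0
```

Here `keep` is TRUE for the first three rows only. For data containing zeros, `strict` and `relaxed` differ, so pick the comparison that matches whether a zero should end the run.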
akrun