How to select the minimum p-value and the next-closest value to the minimum in data.table

Question

Here is an example about what I want:

library(data.table)
set.seed(123)
data <- data.frame(X = rep(letters[1:3], each = 4), Y = sample(1:12, 12), Z = sample(1:100, 12))
setDT(data)

For each unique value of X, I would like to select the row with the minimum Y and the row with the next-closest value to that minimum (i.e., the two smallest Y values per group).

Desired output

X  Y  Z
a  4 68
a  5 11
b  1  4
b 10 89
c  2 64
c  3 82

I know how to get the row with the minimum Y for each X:

data[, .SD[which.min(Y)], by = X]

But how can I select both the minimum and the next-closest value?

Assuming your data frame is a data table, what about `data[rank(Y) %in% 1:2 , ]` or, for a regular data frame `data[rank(data$Y) %in% 1:2, ]`? — eipi10, Jun 14 '16 at 16:39
thanks @eipi10, I have used `data[, .SD[rank(Y) %in% 1:2] , by=X]`, and it worked. If you answer my question I'll give you the credit :-) — user2380782, Jun 14 '16 at 16:47
Ah, sorry, I missed the fact that you were also grouping by `X`. Feel free to answer the question yourself. There's no problem with answering your own question. — eipi10, Jun 14 '16 at 16:53

score 3 · Accepted Answer · answered Jun 14 '16 at 17:12


For the ungrouped case (the two smallest Y values in the whole table), with a data.table you can do:

data[rank(Y) %in% 1:2, ]
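A minimal, self-contained sketch of the ungrouped version, using a small hand-made table (the values are hypothetical, not the question's data). It also shows a sort-then-take alternative, `dt[order(Y)][1:2]`, which should return the same two rows but ordered by Y instead of in their original row order.

```r
library(data.table)

# Hypothetical toy data, not the question's sample() output
dt <- data.table(X = c("a", "a", "b", "b"), Y = c(7, 3, 9, 1), Z = 1:4)

# rank(Y) is each row's position in ascending order of Y;
# keeping ranks 1 and 2 keeps the two rows with the smallest Y,
# in their original row order
two_smallest <- dt[rank(Y) %in% 1:2]
print(two_smallest)

# Alternative: sort by Y, then take the first two rows (ordered by Y)
sorted_two <- dt[order(Y)][1:2]
print(sorted_two)
```

The `rank()` form keeps the table's row order, while the `order()` form returns the rows smallest-first, which may be closer to the desired output layout.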

For the grouped case (the two smallest Y values within each X), you can do:

data[, .SD[rank(Y) %in% 1:2], by = X]
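Under the same assumption of hypothetical toy data, here is the grouped version next to a sort-then-`head()` variant. In the second form `keyby = X` is used only so that groups come out in sorted order of X; within each group the rows then appear in increasing Y, and `head(.SD, 2)` simply returns fewer rows if a group has only one.

```r
library(data.table)

# Hypothetical toy data with two groups of three rows each
dt <- data.table(
  X = c("a", "a", "a", "b", "b", "b"),
  Y = c(5, 2, 8, 4, 6, 1),
  Z = 1:6
)

# Within each X group, keep rows whose rank of Y is 1 or 2
# (original row order is kept inside each group)
per_group <- dt[, .SD[rank(Y) %in% 1:2], by = X]
print(per_group)

# Alternative: sort the whole table by Y, then take the first
# two rows of each group; keyby = X sorts the groups by X
sorted_per_group <- dt[order(Y), head(.SD, 2), keyby = X]
print(sorted_per_group)
```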


eipi10


Shameless plug for my eponym: data.table also has a `frank()` function. Here's the standard reference for the grouped case: http://stackoverflow.com/a/16574176/ – Frank Jun 14 '16 at 18:49

@Frank, I didn't realize you pronounced your name "eff-rank". – eipi10 Jun 14 '16 at 19:35
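One caveat worth checking in connection with `frank()`: both base `rank()` and data.table's `frank()` default to `ties.method = "average"`, so if the smallest Y is tied within a group, the tied rows both get rank 1.5 and `rank(Y) %in% 1:2` drops them. The sketch below uses hypothetical data containing such a tie and breaks ties by position with `frank(..., ties.method = "first")`; whether ties can occur in the real p-values is an assumption you would need to check.

```r
library(data.table)

# Hypothetical data: group "a" has a tie for the smallest Y
dt <- data.table(X = c("a", "a", "a", "b", "b"), Y = c(2, 2, 7, 3, 9))

# Default "average" ties: the two 2s both get rank 1.5 and 7 gets rank 3,
# so nothing in group "a" matches 1:2 and the whole group disappears
avg_ties <- dt[, .SD[rank(Y) %in% 1:2], by = X]
print(avg_ties)

# ties.method = "first" assigns ranks 1, 2, 3 by position,
# so both tied rows in group "a" are kept
first_ties <- dt[, .SD[frank(Y, ties.method = "first") <= 2], by = X]
print(first_ties)
```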
