How to select rows by group with the minimum value and containing NAs in R

Question

Here is an example:

set.seed(123)    
data<-data.frame(X=rep(letters[1:3], each=4),Y=sample(1:12,12),Z=sample(1:100, 12))
data[data==3]<-NA

What I want is to select, for each value of X, the single row with the minimum Y, ignoring NAs:

a 4 68
b 1 4
c 2 64

What's the best way to do that?

Justin · Accepted Answer · 2014-01-10T14:52:35.663

7

Using the data.table package, this is trivial:

library(data.table)

d <- data.table(data)
d[, min(Y, na.rm=TRUE), by=X]

You can also use plyr and its ddply function:

library(plyr)

ddply(data, .(X), summarise, min(Y, na.rm=TRUE))

Or using base R:

aggregate(Y ~ X, data=data, FUN=min)
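
As a minimal sketch of how the formula interface treats missing values: with a formula of the form `Y ~ X` (value on the left, grouping variable on the right), `aggregate` by default drops rows where a variable used in the formula is NA (`na.action = na.omit`) before applying `FUN`. The toy data frame `df` below is hypothetical and is not the question's data.

```r
# Hypothetical toy data (not the random sample from the question)
df <- data.frame(
  X = rep(c("a", "b"), each = 3),
  Y = c(NA, 5, 2, 7, NA, 9)
)

# Group Y by X; rows with NA in Y are dropped by the default na.action,
# so min() never sees a missing value
mins <- aggregate(Y ~ X, data = df, FUN = min)
print(mins)
```

Note that this returns only the grouping column and the summarised column, not the other columns of the matching rows.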

Based on the edits, I would use data.table for sure:

d[, .SD[which.min(Y)], by=X]

However, there are solutions using base R or other packages.
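
One possible base R version of the whole-row selection, as a sketch rather than a definitive implementation: split the data frame by group, keep the row where `Y` is smallest, and bind the pieces back together. The data frame `df` and the names `rows` and `result` are illustrative assumptions, not part of the original answer.

```r
# Hypothetical toy data (not the random sample from the question)
df <- data.frame(
  X = rep(c("a", "b"), each = 3),
  Y = c(NA, 5, 2, 7, NA, 9),
  Z = c(10, 20, 30, 40, 50, 60)
)

# One data frame per level of X
groups <- split(df, df$X)

# which.min() skips NA values, so each group keeps its smallest non-missing Y
rows <- lapply(groups, function(g) g[which.min(g$Y), ])

# Stack the one-row results; all columns of the original rows are kept
result <- do.call(rbind, rows)
print(result)
```

A group whose `Y` values are all NA contributes zero rows here, because `which.min` returns an empty index in that case.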

edited Jan 10 '14 at 14:52

answered Jan 10 '14 at 14:14

Justin

It works for this case, but my actual data set has about 20 columns. I would like to select the entire rows, with all columns, rather than a summary of two columns. Any suggestions? Thanks – David Z Jan 10 '14 at 14:31
I'm afraid I don't understand. Please edit your question to reflect your actual... question... – Justin Jan 10 '14 at 14:35

Mark Heckmann · Answer 2 · 2014-01-10T17:52:13.947

1

This does not select the rows using an index but returns the values you want...

ddply(data, .(X), summarise, min=min(Y, na.rm=TRUE))

  X min
1 a   5
2 b   1
3 c   4

EDIT AFTER COMMENT: To select the whole rows you may:

ddply(data, .(X), function(x) arrange(x, Y)[1, ])

  X Y  Z
1 a 4 68
2 b 1  4
3 c 2 64

Or

data$index <- 1L:nrow(data)
i <- by(data, data$X, function(x) x$index[which.min(x$Y)] )
data[i, ]

   X Y  Z index
1  a 4 68     1
6  b 1  4     6
10 c 2 64    10
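
The approaches above rely on how `which.min` handles missing values and ties. The following small sketch, using a made-up vector `y`, illustrates that behaviour in base R.

```r
# Made-up vector with a missing value and a tied minimum
y <- c(NA, 5, 2, 2)

# which.min() ignores NA and returns the position of the first minimum
i <- which.min(y)

# To get every position that attains the minimum, compare against min()
i_all <- which(y == min(y, na.rm = TRUE))

# If every value is NA, which.min() returns an empty integer vector
all_na <- which.min(c(NA_real_, NA_real_))

print(i)
print(i_all)
print(length(all_na))
```

So when several rows in a group share the minimum, only the first is selected; use the `which(...)` form if all tied rows are wanted.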

edited Jan 10 '14 at 17:52

answered Jan 10 '14 at 14:15

Mark Heckmann

It does work for this case, but I updated my example to be more specific for my goal. Any suggestions? – David Z Jan 10 '14 at 14:38

score 0 · Answer 3 · answered Jan 10 '14 at 14:19

Subsetting the data for each letter may help:

data<-data.frame(X=rep(letters[1:3], each=4),Y=sample(1:12,12))
dataA <- subset(data, X=="a")
min(dataA$Y, na.rm=TRUE)

Keniajin

score 0 · Answer 4 · answered Dec 21 '20 at 20:52

Here is a data.table approach that returns the full rows:

library(data.table)
set.seed(123)    
data<-data.frame(X=rep(letters[1:3], each=4),Y=sample(1:12,12),Z=sample(1:100, 12))
data[data==3]<-NA
data <- data.table(data)
data[data[,.I[which.min(Y)], by = "X"][,V1]]

Shubham Gupta
