3

I have a use case that I have not come across before. I have the following data frame and would like to select the values of "y" where "x" achieves its minimum and maximum, respectively, for each level of the condition "i".

> library(plyr)
> df <- data.frame(i=c(1,1,2,2),x=c(1.0,2.0,3.0,4.0),y=c('a','b','c','d'))
> ddply(df, .(i), summarise, Min=min(x), Max=max(x))
  i Min Max
  1   1   2
  2   3   4

This is correct, but I'd like to have instead the y whose x is the Min or Max:

  i Min Max
  1   a   b
  2   c   d

How can I do that?

AdamO
SkyWalker

4 Answers

4

We can use slice to keep, within each group, the row where x is smallest (or largest):

library(dplyr)
df %>% 
   group_by(i) %>% 
   slice(which.min(x)) %>%
   #or
   #slice(which.max(x)) %>%
   select(-x)
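
If both values are wanted side by side in one row per group, as in the desired output in the question, one possible variation is to replace slice with summarise. This is a minimal sketch that assumes the same df as in the question; the column names Min and Max and the result name res are only illustrative.

```r
library(dplyr)

# Same data as in the question (strings kept as character for clarity).
df <- data.frame(i = c(1, 1, 2, 2),
                 x = c(1.0, 2.0, 3.0, 4.0),
                 y = c("a", "b", "c", "d"),
                 stringsAsFactors = FALSE)

# One row per level of i: the y at the smallest x and the y at the largest x.
res <- df %>%
  group_by(i) %>%
  summarise(Min = y[which.min(x)],
            Max = y[which.max(x)])
```

Unlike slice, this returns a single summary row per group instead of the original rows.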
akrun
3

Another option if you are willing to go outside of the tidyverse is data.table:

library(data.table)
setDT(df)[, list(min = y[which.min(x)],
                 max = y[which.max(x)]), by = i]

#   i min max
#1: 1   a   b
#2: 2   c   d
Mike H.
3
library(plyr)
df <- data.frame(i=c(1,1,2,2),x=c(1.0,2.0,3.0,4.0),y=c('a','b','c','d'))
ddply(df, .(i), summarise, Min=y[which.min(x)], Max=y[which.max(x)])
jrlewi
  • I liked this one because it is the easiest/closest to my OP use-case in terms of dependency and simplicity. – SkyWalker Dec 29 '17 at 18:31
1

A solution in base R:

output <- by(df, df[, "i"], with, {
  data.frame(i=i[1], min=y[which.min(x)], max=y[which.max(x)])
})

This gives:

> output
df[, "i"]: 1
  i min max
1 1   a   b
------------------------------------------------------------ 
df[, "i"]: 2
  i min max
1 2   c   d

(I believe the data.frame call is necessary to preserve the factor structure of "y".)

The per-group results can be combined into a single data frame with do.call(rbind, output):

> do.call(rbind, output)
  i min max
1 1   a   b
2 2   c   d
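
One caveat worth keeping in mind with which.min and which.max: they return only the first position of the extreme value, so if x is tied within a group, only one y is reported. The sketch below uses a made-up data frame, df_ties, in which group 1 reaches its minimum twice; the names df_ties, first_only and all_ties are hypothetical and only serve the illustration.

```r
# Hypothetical data with a tie: x hits its minimum twice in group 1.
df_ties <- data.frame(i = c(1, 1, 1, 2, 2),
                      x = c(1, 1, 2, 3, 4),
                      y = c("a", "b", "c", "d", "e"),
                      stringsAsFactors = FALSE)

# which.min() keeps only the first position of the minimum in each group.
first_only <- sapply(split(df_ties, df_ties$i),
                     function(g) g$y[which.min(g$x)])

# Comparing against min(x) keeps every tied row instead.
all_ties <- lapply(split(df_ties, df_ties$i),
                   function(g) g$y[g$x == min(g$x)])
```

Here first_only holds one y per group, while all_ties is a list whose entry for group 1 contains both tied values. Which behaviour is preferable depends on whether ties can occur in the real data.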
AdamO