second (or third) maximum value of a dataframe column using 'summarise'

Question

Say I have a data frame like this:

group1 <- c('a','a','a','a','a','a','b','b','b','b','b','b','b','b')
group2 <- c('x','y','x','y','x','y','x','y','x','y','x','y','x','y')
value <- round(runif(14, min=0, max=1), digits = 2)

# Build the data frame directly: cbind() would coerce every column to character first
df1 <- data.frame(group1, group2, value)

It is easy to get a new data frame with only the maximum value of each group, using the summarise function from the dplyr package:

library(dplyr)
df2 <- summarise(group_by(df1, group1), max_v = max(value))

But what I want is a new data frame with the 3 largest values of each group, with something like this (secondmax and thirdmax are placeholders, not existing functions):

df2 <- summarise(group_by(df1,group1),max_v = max(value),max2_v = secondmax(value),max3_v = thirdmax(value))

Is there a way to do that without using the sort function?

You can use `arrange` function to avoid using `sort` i.e. ` df1 %>% group_by(group1) %>% arrange(desc(value)) %>% slice(seq_len(3)) %>% mutate(max = row_number()) %>% select(-group2) %>% spread(max, value)` — akrun, Jul 26 '17 at 19:15
`dplyr` pairs nicely with the pipe operator `%>%`. It'll make your code easier to read. — Andrew Brēza, Jul 26 '17 at 19:21
In principle `df1 %>% group_by(group1) %>% slice(nth(row_number(), 1:3, order_by = -value))` should get the top three, I guess. Unfortunately, the package authors decided that `nth()` should only return one number at a time... Anyway, @akrun's answer works (requiring tidyr). Looks worth posting. — Frank, Jul 26 '17 at 19:37
@Frank Thanks, but I think the OP's intention is not clear as it specified about not using `sort` and sort family functions — akrun, Jul 26 '17 at 19:39
@akrun OP is probably wrong. Your solution is perfectly sound. — Roman Luštrik, Jul 26 '17 at 21:01
Yes sorry I was referring to the solution of this question : https://stackoverflow.com/questions/2453326/fastest-way-to-find-second-third-highest-lowest-value-in-vector-or-column which did not suit me... But thank you @akrun ! your solution helped me a lot (I need to get familiar with the arguments %...% now) — Y.Coch, Jul 27 '17 at 14:57

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

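We can get this with an arrange/slice/spread approach: sort each group in descending order, keep the first three rows, label them by rank, and reshape to wide format.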

library(dplyr)
library(tidyr)
df1 %>%
  group_by(group1) %>%
  arrange(desc(value)) %>% 
  slice(seq_len(3)) %>%
  mutate(Max = paste0("max_", row_number())) %>%
  select(-group2) %>% 
  spread(Max, value)
# A tibble: 2 x 4
# Groups:   group1 [2]
#   group1 max_1 max_2 max_3
#* <fctr> <dbl> <dbl> <dbl>
#1      a  0.84  0.69  0.41
#2      b  0.89  0.72  0.54
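
The same pipeline can also be sketched with newer verbs. This assumes dplyr >= 1.0.0 (for `slice_max`) and tidyr >= 1.0.0 (for `pivot_wider`, which supersedes `spread`). The `toy` data frame and its values are hypothetical, chosen only so that the result is reproducible.

```r
library(dplyr)
library(tidyr)

# Hypothetical fixed data (not the random data from the question)
toy <- data.frame(
  group1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "y", "x", "y", "x", "y", "x", "y"),
  value  = c(0.10, 0.84, 0.41, 0.69, 0.54, 0.89, 0.20, 0.72)
)

top3 <- toy %>%
  group_by(group1) %>%
  # keep the three largest values per group, returned in descending order
  slice_max(value, n = 3, with_ties = FALSE) %>%
  # label each kept row by its rank within the group
  mutate(Max = paste0("max_", row_number())) %>%
  ungroup() %>%
  select(group1, Max, value) %>%
  # one row per group, one column per rank
  pivot_wider(names_from = Max, values_from = value)

print(top3)
```

With `with_ties = FALSE`, tied values do not produce extra rows, so each group contributes at most three columns.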

data

df1 <- data.frame(group1,group2,value)
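
For a result shaped like the `summarise` call in the question, one possible sketch uses `dplyr::nth()` with its `order_by` argument, as hinted at in the comments. This is an alternative to the pipeline above, not part of the original answer, and the `toy` data is again hypothetical.

```r
library(dplyr)

# Hypothetical fixed data (not the random data from the question)
toy <- data.frame(
  group1 = c("a", "a", "a", "a", "b", "b", "b", "b"),
  group2 = c("x", "y", "x", "y", "x", "y", "x", "y"),
  value  = c(0.10, 0.84, 0.41, 0.69, 0.54, 0.89, 0.20, 0.72)
)

res <- toy %>%
  group_by(group1) %>%
  summarise(
    max_v  = max(value),
    # nth() picks the n-th element after ordering by desc(value)
    max2_v = nth(value, 2, order_by = desc(value)),
    max3_v = nth(value, 3, order_by = desc(value))
  )

print(res)
```

`nth()` returns one value per call, which is why each rank gets its own line. If a group has fewer than three rows, the missing ranks come back as `NA`.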
