Say I have a data frame like this:

group1 <- c('a','a','a','a','a','a','b','b','b','b','b','b','b','b')
group2 <- c('x','y','x','y','x','y','x','y','x','y','x','y','x','y')
value <- round(runif(14, min=0, max=1), digits = 2)

# Build the data frame directly; cbind() would coerce every column to character
df1 <- data.frame(group1, group2, value)

It is easy to get a new data frame with only the maximum value of each group, using the dplyr package and its summarise function:

df2 <- summarise(group_by(df1,group1),max_v = max(value))

But what I want is a new data frame with the three largest values of each group, using something like this (secondmax and thirdmax are hypothetical functions):

df2 <- summarise(group_by(df1,group1),max_v = max(value),max2_v = secondmax(value),max3_v = thirdmax(value))

Is there a way to do that without using the sort function?

lmo
Y.Coch
  • 3
    You can use the `arrange` function to avoid using `sort`, i.e. `df1 %>% group_by(group1) %>% arrange(desc(value)) %>% slice(seq_len(3)) %>% mutate(max = row_number()) %>% select(-group2) %>% spread(max, value)` – akrun Jul 26 '17 at 19:15
  • `dplyr` pairs nicely with the pipe operator `%>%`. It'll make your code easier to read. – Andrew Brēza Jul 26 '17 at 19:21
  • 1
    In principle `df1 %>% group_by(group1) %>% slice(nth(row_number(), 1:3, order_by = -value))` should get the top three, I guess. Unfortunately, the package authors decided that `nth()` should only return one number at a time... Anyway, @akrun's answer works (requiring tidyr). Looks worth posting. – Frank Jul 26 '17 at 19:37
  • 1
    @Frank Thanks, but I think the OP's intention is not clear, since the question asks not to use `sort` or the sort family of functions – akrun Jul 26 '17 at 19:39
  • @akrun OP is probably wrong. Your solution is perfectly sound. – Roman Luštrik Jul 26 '17 at 21:01
  • @RomanLuštrik Okay, then I will post – akrun Jul 27 '17 at 02:12
  • Yes sorry I was referring to the solution of this question : https://stackoverflow.com/questions/2453326/fastest-way-to-find-second-third-highest-lowest-value-in-vector-or-column which did not suit me... But thank you @akrun ! your solution helped me a lot (I need to get familiar with the arguments %...% now) – Y.Coch Jul 27 '17 at 14:57
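
For what it's worth, the `nth()` idea from the comments can be made to work inside `summarise()` by calling it once per rank. This is only a minimal sketch, assuming dplyr is installed; the values are hand-picked for illustration (the question draws them with `runif`). Note that `nth()` still orders the values internally, so this avoids calling `sort` yourself rather than avoiding sorting altogether.

```r
library(dplyr)

# Hand-picked illustrative values (assumption: not the question's random data)
df1 <- data.frame(
  group1 = rep(c("a", "b"), times = c(6, 8)),
  value  = c(0.2, 0.9, 0.5, 0.7, 0.1, 0.3,
             0.6, 0.8, 0.4, 0.95, 0.15, 0.25, 0.35, 0.45)
)

# nth() returns a single element, so call it once per rank;
# order_by = desc(value) makes position 1 the largest value.
df2 <- df1 %>%
  group_by(group1) %>%
  summarise(
    max_v  = nth(value, 1, order_by = desc(value)),
    max2_v = nth(value, 2, order_by = desc(value)),
    max3_v = nth(value, 3, order_by = desc(value))
  )

print(df2)
```

This keeps the one-row-per-group shape of the `summarise` call in the question, at the cost of one `nth()` call per column.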

1 Answer

We can get this with an arrange/slice/spread pipeline:

library(dplyr)
library(tidyr)
df1 %>%
  group_by(group1) %>%
  arrange(desc(value)) %>% 
  slice(seq_len(3)) %>%
  mutate(Max = paste0("max_", row_number())) %>%
  select(-group2) %>% 
  spread(Max, value)
# A tibble: 2 x 4
# Groups:   group1 [2]
#   group1 max_1 max_2 max_3
#* <fctr> <dbl> <dbl> <dbl>
#1      a  0.84  0.69  0.41
#2      b  0.89  0.72  0.54

data

df1 <- data.frame(group1,group2,value)
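
With newer package versions (assuming dplyr >= 1.0.0 and tidyr >= 1.0.0), the same idea can probably be written with `slice_max()` and `pivot_wider()`, which replace the arrange/slice steps and `spread()`. The sketch below uses hand-picked illustrative values rather than the question's random ones.

```r
library(dplyr)
library(tidyr)

# Hand-picked illustrative values (assumption: not the question's random data)
df1 <- data.frame(
  group1 = rep(c("a", "b"), times = c(6, 8)),
  group2 = rep(c("x", "y"), times = 7),
  value  = c(0.2, 0.9, 0.5, 0.7, 0.1, 0.3,
             0.6, 0.8, 0.4, 0.95, 0.15, 0.25, 0.35, 0.45)
)

top3 <- df1 %>%
  group_by(group1) %>%
  # keep the 3 largest values per group, largest first; drop ties beyond 3 rows
  slice_max(value, n = 3, with_ties = FALSE) %>%
  # label each row by its rank within the group
  mutate(Max = paste0("max_", row_number())) %>%
  ungroup() %>%
  select(group1, Max, value) %>%
  # one column per rank, one row per group
  pivot_wider(names_from = Max, values_from = value)

print(top3)
```

Setting `with_ties = FALSE` guarantees at most three rows per group, so the rank labels stay `max_1` to `max_3` even when values repeat.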
akrun