0

How do I obtain the max value of a grouped dataframe?

I have this dataframe

library(dplyr)

df_example <- data.frame(gender = c(rep("female", 12),rep("male", 12) ),
                     state = rep(c(rep("widow", 8),rep("orphan", 4)),2),
                     age_group = rep(c("(20, 30]","(20, 30]","(30, 40]","(30, 40]", "(40, 50]","(40, 50]","(50, 60]","(50, 60]","(0, 10]","(0, 10]","(10, 20]","(10, 20]"),2),
                     relatioship_status = c("single","married"),
                     amount = c(10,15,9, 8,17,92,12,41,23,75,46,64,12,9,7,22,3, 14,33,14,1,87,54,21))

which generates

>  df_example
   gender  state relatioship_status age_group amount
1  female  widow             single  (20, 30]     10
2  female  widow            married  (20, 30]     15
3  female  widow             single  (30, 40]      9  
4  female  widow            married  (30, 40]      8
5  female  widow             single  (40, 50]     17
6  female  widow            married  (40, 50]     92
7  female  widow             single  (50, 60]     12
8  female  widow            married  (50, 60]     41
9  female orphan             single   (0, 10]     23
10 female orphan            married   (0, 10]     75
11 female orphan             single  (10, 20]     46
12 female orphan            married  (10, 20]     64
13   male  widow             single  (20, 30]     12
14   male  widow            married  (20, 30]      9
15   male  widow             single  (30, 40]      7
16   male  widow            married  (30, 40]     22
17   male  widow             single  (40, 50]      3
18   male  widow            married  (40, 50]     14
19   male  widow             single  (50, 60]     33
20   male  widow            married  (50, 60]     14
21   male orphan             single   (0, 10]      1
22   male orphan            married   (0, 10]     87
23   male orphan             single  (10, 20]     54
24   male orphan            married  (10, 20]     21

I want to obtain the max number of "amount" grouped by gender, state,relationship_status, and age group. The reason for this being that I want to assign the relationship status corresponding to the maximum amount obtained to some NA values in another dataframe that also share the same variables of gender, state and age group.

However when I run this code

df_example %>% 
  group_by(gender, state, age_group,relatioship_status) %>% 
  summarise(max(amount))

I get the same dataframe of df_example, which is reasonable given that I have grouped every variable possible.

So, how do I obtain the relationship status corresponding to the maximum amount grouped by gender, state, and age_group to be later assign to NA values in another data frame?

PS: Doing this

df_example %>% 
  group_by(gender, state, age_group) %>% 
  summarise(max(amount))

and later doing a left_join() doesn't help since the previous code doesn't show the relationship status.

Thanks :)

Dennis Aguilar
  • 113
  • 1
  • 7
  • 1
    Try `df_example %>% group_by(gender, state, age_group) %>% slice(which.max(amount))` – akrun Jun 15 '18 at 15:09
  • Thanks, It worked! .Why is the variable "relationship_status" not included in your code? – Dennis Aguilar Jun 15 '18 at 16:01
  • Here we are getting the row index with `which.max` and `slice` that row, so you will get all the variables of the dataset that is filtered by row – akrun Jun 15 '18 at 16:02

0 Answers0