I would like to select the youngest person in each group, separately for each gender.

This is my initial data:

 data1
       ID Age Gender Group
    1 A01  25      m     a
    2 A02  35      f     b
    3 B03  45      m     b
    4 C99  50      m     b
    5 F05  60      f     a
    6 X05  65      f     a

I would like to have this

Gender Group Age  ID
m      a     25   A01 
f      a     60   F05 
m      b     45   B03
f      b     35   A02

So I tried the aggregate function, but I don't know how to attach the ID to the result:

aggregate(Age~Gender+Group,data1,min)

Gender Group Age  
m      a     25    
f      a     60    
m      b     45  
f      b     35  
Neophyte

1 Answer

We can use data.table. First convert the 'data.frame' to a 'data.table' by reference with setDT(data1). To get the row with the minimum 'Age' in each group, use which.min to find that row's index within each 'Gender', 'Group' combination, and use it to subset .SD, the per-group subset of the data (.SD[which.min(Age)]).

library(data.table)
setDT(data1)[, .SD[which.min(Age)], by = .(Gender, Group)]
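
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question first. The names dt and youngest are my own, and I use as.data.table() instead of setDT() only so that data1 itself is left unchanged; the grouping expression is the same.

```r
library(data.table)

# Sample data copied from the question.
data1 <- data.frame(
  ID     = c("A01", "A02", "B03", "C99", "F05", "X05"),
  Age    = c(25, 35, 45, 50, 60, 65),
  Gender = c("m", "f", "m", "m", "f", "f"),
  Group  = c("a", "b", "b", "b", "a", "a"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so data1 stays a plain data.frame
# (setDT() would convert it in place instead).
dt <- as.data.table(data1)

# For each Gender/Group combination, keep the row with the smallest Age.
youngest <- dt[, .SD[which.min(Age)], by = .(Gender, Group)]
print(youngest)
```

Note that which.min returns only the first minimum, so if two people in the same group share the lowest age, only the first of them is kept.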

Another option is to order by 'Gender', 'Group', and 'Age', and then use unique with by to keep the first row of each 'Gender', 'Group' combination.

unique(setDT(data1)[order(Gender,Group,Age)], 
                         by = c('Gender', 'Group'))

Or, using the same approach with dplyr, we group by 'Gender' and 'Group' and use slice with which.min to keep the row with the minimum 'Age' in each group.

library(dplyr)
data1 %>%
    group_by(Gender, Group) %>%
    slice(which.min(Age))

Or we can arrange by 'Gender', 'Group', and 'Age' and then take the first row of each group.

data1 %>%
     arrange(Gender,Group, Age) %>%
     group_by(Gender,Group) %>%
     slice(1L)
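
If you would rather stay with the aggregate call from the question, one possible base R sketch (not one of the methods above) is to merge the aggregated minimums back onto the original data so that the ID comes along. The names youngest_age and result are placeholders of mine, and the data is rebuilt from the question so the snippet runs on its own.

```r
# Sample data copied from the question.
data1 <- data.frame(
  ID     = c("A01", "A02", "B03", "C99", "F05", "X05"),
  Age    = c(25, 35, 45, 50, 60, 65),
  Gender = c("m", "f", "m", "m", "f", "f"),
  Group  = c("a", "b", "b", "b", "a", "a"),
  stringsAsFactors = FALSE
)

# Minimum Age per Gender/Group combination, as in the question.
youngest_age <- aggregate(Age ~ Gender + Group, data = data1, FUN = min)

# Join back on all three columns to pick up the matching ID.
result <- merge(youngest_age, data1, by = c("Gender", "Group", "Age"))
print(result)
```

Unlike which.min, this keeps every row tied for the minimum age within a group, which may or may not be what you want.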
akrun