How to select the row with the highest categorical variable level in R

Question

I have the following data frame:

library(dplyr)
ReportNumber<-c("19062167","19062167","19062167","19062822","19062822")
UCR_casetype<-c("Homicide","Homicide","Assault","Rape","Rape")
(df<-data.frame(ReportNumber,UCR_casetype))

  ReportNumber UCR_casetype
1     19062167     Homicide
2     19062167     Homicide
3     19062167      Assault
4     19062822         Rape
5     19062822         Rape

UCR_casetype is a crime classification with a hierarchy: Homicide > Rape > Assault. I used the following to turn UCR_casetype into an ordered factor with those levels:

df$UCR_casetype<-factor(df$UCR_casetype,
       levels = c("Assault","Rape","Homicide"),ordered=TRUE)
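One way to check that the ordering took effect is to look at the integer codes and at comparisons. For an ordered factor these follow the level order, not the alphabet. The following is a small base R sketch that uses the same values as above:

```r
# Same values as in the question, with severity order Assault < Rape < Homicide
UCR_casetype <- factor(c("Homicide", "Homicide", "Assault", "Rape", "Rape"),
                       levels = c("Assault", "Rape", "Homicide"), ordered = TRUE)

as.integer(UCR_casetype)   # integer codes follow the level order: 3 3 1 2 2
max(UCR_casetype)          # max() is defined for ordered factors: Homicide
UCR_casetype > "Rape"      # comparisons use the level order: TRUE TRUE FALSE FALSE FALSE
```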

For each ReportNumber, I want to keep the row with the highest UCR_casetype level. The result should look like this:

  ReportNumber UCR_casetype
1     19062167     Homicide
4     19062822         Rape

I tried this, but it does not work:

df%>%group_by(ReportNumber)%>%
      filter(max(UCR_casetype))
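The attempt most likely fails because filter() expects a logical condition that it evaluates row by row. max(UCR_casetype) returns a single factor value, not a condition. Below is a minimal sketch of one fix, which assumes the ordered factor from the question. It compares each row against the maximum of its group and then uses distinct() to drop the duplicate Homicide row:

```r
library(dplyr)

ReportNumber <- c("19062167", "19062167", "19062167", "19062822", "19062822")
UCR_casetype <- factor(c("Homicide", "Homicide", "Assault", "Rape", "Rape"),
                       levels = c("Assault", "Rape", "Homicide"), ordered = TRUE)
df <- data.frame(ReportNumber, UCR_casetype)

res <- df %>%
  group_by(ReportNumber) %>%
  # keep rows whose level equals the highest level within the group (ties all kept)
  filter(UCR_casetype == max(UCR_casetype)) %>%
  # report 19062167 has two identical Homicide rows; keep only one
  distinct() %>%
  ungroup()
res
```

In this approach, every row tied at the top level passes the filter. distinct() only collapses rows that are identical in all columns.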

score 1 · Accepted Answer · answered Apr 20 '20 at 22:01

We could group by ReportNumber and then slice the row at the index returned by which.max:

library(dplyr)
df %>% 
    group_by(ReportNumber) %>%
    slice(which.max(UCR_casetype))
# A tibble: 2 x 2
# Groups:   ReportNumber [2]
#  ReportNumber UCR_casetype
#  <fct>        <ord>       
#1 19062167     Homicide    
#2 19062822     Rape
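which.max() returns the index of the first maximum. When a report has several rows at the top level, only the first of them is kept. The same slice-by-index idea can be sketched in base R with split() and which.max(), without dplyr. Taking the integer codes explicitly is a choice made for this sketch. It relies on the level order set with ordered = TRUE.

```r
ReportNumber <- c("19062167", "19062167", "19062167", "19062822", "19062822")
UCR_casetype <- factor(c("Homicide", "Homicide", "Assault", "Rape", "Rape"),
                       levels = c("Assault", "Rape", "Homicide"), ordered = TRUE)
df <- data.frame(ReportNumber, UCR_casetype)

# For each report, pick the row holding the highest level (the first one on ties)
top_rows <- lapply(split(df, df$ReportNumber), function(d) {
  d[which.max(as.integer(d$UCR_casetype)), ]
})
res <- do.call(rbind, top_rows)
rownames(res) <- NULL
res
```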

score 1 · Answer 2 · answered Apr 20 '20 at 22:10

You can also do this with the by argument of a data.table:

library(data.table)
ReportNumber <- c("19062167","19062167","19062167","19062822","19062822")
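# Levels listed from most to least serious, so a lower integer code means a more serious offence
UCR_casetype <- factor(c("Homicide","Homicide","Assault","Rape","Rape"), levels = c("Homicide", "Rape", "Assault"))
df <- data.table(ReportNumber, UCR_casetype)

# Solution: for each ReportNumber, take the smallest code and map it back to its level label
df[, .(UCR_casetype = levels(UCR_casetype)[min(as.integer(UCR_casetype))]), by = ReportNumber]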
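This answer lists the levels from most to least serious, so the smallest integer code marks the most serious offence. That is why it uses min() rather than max(). It also returns the level label, not the full row. The small base R sketch below follows the same computation, with tapply() standing in for data.table's by:

```r
ReportNumber <- c("19062167", "19062167", "19062167", "19062822", "19062822")
UCR_casetype <- factor(c("Homicide", "Homicide", "Assault", "Rape", "Rape"),
                       levels = c("Homicide", "Rape", "Assault"))

codes <- as.integer(UCR_casetype)          # 1 1 3 2 2: a lower code means more serious
worst <- tapply(codes, ReportNumber, min)  # smallest code per report
levels(UCR_casetype)[worst]                # "Homicide" "Rape"
```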
