2

Let's say I have

> df
   ID  tti
1 118 11.2
2 118  1.4
3 118  9.2
4   9  2.7
5  12  1.2
6  12 82.8

There are unique and duplicated values in df$ID. If a value in df$ID is duplicated, I want to keep the row with the lowest df$tti value.

Expected output

> df
   ID  tti
1 118  1.4
2   9  2.7
3  12  1.2

I am looking for a solution in dplyr

Data

df <- structure(list(ID = c(118L, 118L, 118L, 9L, 12L, 12L), tti = c(11.2, 
1.4, 9.2, 2.7, 1.2, 82.8)), class = "data.frame", row.names = c(NA, 
-6L)) 
cmirian
  • 2,572
  • 3
  • 19
  • 59

2 Answers2

2

We can use slice_min after grouping by 'ID'

library(dplyr)
df %>%
    group_by(ID) %>%
    slice_min(tti) %>%
    ungroup

-output

# A tibble: 3 x 2
#     ID   tti
#  <int> <dbl>
#1     9   2.7
#2    12   1.2
#3   118   1.4

Or with collapse

library(collapse)
df %>%
    fgroup_by(ID) %>%
    fsummarise(tti = fmin(tti))

#   ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4

Or another option is roworder (which is faster than arrange from dplyr) with funique

roworder(df, ID, tti) %>%
     funique(cols = 1)
#    ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4
akrun
  • 874,273
  • 37
  • 540
  • 662
  • How would one do it if `tti` was a variable in `Date` format, in order to take the most recent row ? – Julien Jul 26 '22 at 14:42
  • 1
    @Julien If it is `Date` class, the code should work as `min/max` works for the integer storage mode for Date – akrun Jul 26 '22 at 15:15
1

By default distinct keeps the first row of duplicated ID. So if the data are sorted first by tti within ID then the first row will also be the smallest value of tti. By default arrange orders data in ascending order.

library(dplyr)
df %>% 
  dplyr::arrange(ID, tti) %>% 
  dplyr::distinct(ID, .keep_all = T)

Output

   ID tti
1   9 2.7
2  12 1.2
3 118 1.4
LMc
  • 12,577
  • 3
  • 31
  • 43