How to keep only one value (the most recent) from a set of duplicates in R

Question

Here is a small example:

mydat=structure(list(a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 
8, 8), b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 
68.5, 70), PROB_POSTR_KM = c(378884L, 378884L, 378884L, 378884L, 
378884L, 378884L, 404136L, 404136L, 404136L, 404136L, 404136L
)), class = "data.frame", row.names = c(NA, -11L))

Consider the variable PROB_POSTR_KM. The value 378884 occurs 6 times and the value 404136 occurs 5 times.

How can I remove duplicate values, if any exist, and keep only the most recent (last) row for each value? In this case, the desired result is:

    a    b PROB_POSTR_KM
1 7.8 69.5        378884
2 8.0 70.0        404136
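The answers below treat the last row of each group as the most recent one, which only holds if the rows are already in chronological order. If recency comes from a timestamp instead, one minimal base R sketch is to sort by that timestamp first and then keep the last occurrence of each key. The `date` column here is hypothetical and not part of the original data; its values simply increase with row order, so the result matches the desired output above.

```r
# Sketch: "most recent" defined by a timestamp instead of row order.
# `date` is a HYPOTHETICAL column added for illustration only.
mydat <- data.frame(
  a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 8, 8),
  b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 68.5, 70),
  PROB_POSTR_KM = c(rep(378884L, 6), rep(404136L, 5))
)
mydat$date <- as.Date("2021-01-01") + 0:10  # hypothetical, one day per row

# Sort from oldest to newest, then keep the last occurrence of each key
sorted <- mydat[order(mydat$date), ]
latest <- sorted[!duplicated(sorted$PROB_POSTR_KM, fromLast = TRUE), ]
latest
```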

score 2 · Accepted Answer · answered Jan 26 '21 at 16:37


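library(data.table)
setDT(mydat)  # convert mydat to a data.table in place

# for each PROB_POSTR_KM group, keep the last row of .SD (the non-grouping columns)
mydat[, tail(.SD, 1), PROB_POSTR_KM]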
#    PROB_POSTR_KM   a    b
# 1:        378884 7.8 69.5
# 2:        404136 8.0 70.0
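For comparison, the same grouped "last row" idea can be sketched in base R without data.table: split the data by PROB_POSTR_KM, take `tail(..., n = 1)` of each piece, and bind the pieces back together. This is an illustrative alternative, not part of the answer above; note that the result comes back ordered by key rather than by original row position.

```r
# Base R sketch of "last row per group", mirroring tail(.SD, 1) by group
mydat <- data.frame(
  a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 8, 8),
  b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 68.5, 70),
  PROB_POSTR_KM = c(rep(378884L, 6), rep(404136L, 5))
)

# split() gives one data frame per PROB_POSTR_KM value;
# tail(n = 1) keeps the last row of each; rbind() stacks them again
last_rows <- do.call(rbind, lapply(split(mydat, mydat$PROB_POSTR_KM), tail, n = 1))
rownames(last_rows) <- NULL  # drop the group-based row names
last_rows
```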


IceCreamToucan


Isn't this also `mydat[!duplicated(PROB_POSTR_KM, fromLast = TRUE)]`? – akrun Jan 26 '21 at 18:38
Yes. I was surprised that none of the answers in the linked duplicate use `duplicated`. – IceCreamToucan Jan 26 '21 at 18:41

I like `duplicated` because it is fast, though I haven't benchmarked it against a group-by approach on big datasets. – akrun Jan 26 '21 at 18:43
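The `duplicated` one-liner in the comments works after `setDT()` because data.table evaluates `PROB_POSTR_KM` inside the table; on a plain data.frame you need `mydat$PROB_POSTR_KM` and a trailing comma. `duplicated(..., fromLast = TRUE)` flags every occurrence except the last one, so negating it keeps exactly one row per key. The second half is a sketch for checking, on made-up simulated data, that it selects the same rows as a group-by style approach (largest row index per key via `tapply()`); the data sizes and the `key`/`value` column names are assumptions, and no timing results are claimed here, so run it on your own machine to compare.

```r
# Base R version of the duplicated() approach on the example data
mydat <- data.frame(
  a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 8, 8),
  b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 68.5, 70),
  PROB_POSTR_KM = c(rep(378884L, 6), rep(404136L, 5))
)
keep_last <- mydat[!duplicated(mydat$PROB_POSTR_KM, fromLast = TRUE), ]
keep_last  # the last row of each key, with original row names kept

# Simulated data (hypothetical sizes and column names) for a larger check
set.seed(1)
n <- 1e5
big <- data.frame(
  key = sample(1000, n, replace = TRUE),
  value = runif(n)
)

# Approach 1: drop every occurrence of a key except its last one
t_dup <- system.time(
  via_dup <- big[!duplicated(big$key, fromLast = TRUE), ]
)

# Approach 2: group-by style, largest row index per key, rows in original order
t_grp <- system.time({
  last_idx <- tapply(seq_len(nrow(big)), big$key, max)
  via_grp <- big[sort(as.vector(last_idx)), ]
})

identical(via_dup, via_grp)
t_dup
t_grp
```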

score 1 · Answer 2 · answered Jan 26 '21 at 16:37


Here is a dplyr solution:

library(dplyr)

mydat %>% 
  group_by(PROB_POSTR_KM) %>% 
  slice(which.max(1:n()))  # which.max(1:n()) equals n(): the last row of each group

Gives us:

# A tibble: 2 x 3
# Groups:   PROB_POSTR_KM [2]
      a     b PROB_POSTR_KM
  <dbl> <dbl>         <int>
1   7.8  69.5        378884
2   8    70          404136
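Since `which.max(1:n())` always evaluates to `n()`, the position of the last row in each group, `slice()` keeps each group's last row. The same index-based idea can be sketched in base R (without dplyr) with `ave()`, which gives every row the largest row index in its group; a row is kept when its own index equals that value. This is an illustrative alternative, not the answer's code.

```r
# Base R sketch of "keep row n() of each group"
mydat <- data.frame(
  a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 8, 8),
  b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 68.5, 70),
  PROB_POSTR_KM = c(rep(378884L, 6), rep(404136L, 5))
)

row_id <- seq_len(nrow(mydat))
# For every row, the largest row index within its PROB_POSTR_KM group,
# i.e. the position of that group's last row (the analogue of n())
group_last <- ave(row_id, mydat$PROB_POSTR_KM, FUN = max)
res <- mydat[row_id == group_last, ]
res
```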


Matt



There is also `slice_tail(n = 1)`. – akrun Jan 26 '21 at 18:39
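`slice_tail(n = 1)` keeps the last row of each group. A comparable base R sketch uses `aggregate()` and takes the last value of each listed column per group with `tail(x, 1)`. One caveat under this approach: the formula method uses `na.action = na.omit` by default, so rows with `NA` in `a` or `b` would be dropped first, and the per-column "last value" could then differ from the true last row.

```r
# Base R sketch: last value of a and b within each PROB_POSTR_KM group
mydat <- data.frame(
  a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 8, 8),
  b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 68.5, 70),
  PROB_POSTR_KM = c(rep(378884L, 6), rep(404136L, 5))
)

res <- aggregate(
  cbind(a, b) ~ PROB_POSTR_KM,
  data = mydat,
  FUN = function(x) tail(x, 1)  # last value of the column within the group
)
res
```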
