
Here is a small example:

mydat=structure(list(a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 
8, 8), b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 
68.5, 70), PROB_POSTR_KM = c(378884L, 378884L, 378884L, 378884L, 
378884L, 378884L, 404136L, 404136L, 404136L, 404136L, 404136L
)), class = "data.frame", row.names = c(NA, -11L))

The variable PROB_POSTR_KM takes the value 378884 six times and the value 404136 five times.

How can I remove the duplicated values, where they exist, and keep only the most recent row (the last one) for each value? In this case, the desired result looks like this:

    a    b PROB_POSTR_KM
1 7.8 69.5        378884
2 8.0 70.0        404136
psysky

2 Answers

library(data.table)
setDT(mydat)

mydat[, tail(.SD, 1), PROB_POSTR_KM]
#    PROB_POSTR_KM   a    b
# 1:        378884 7.8 69.5
# 2:        404136 8.0 70.0
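
The call above keeps the last row of each PROB_POSTR_KM group. If you would rather avoid an extra package, the same idea can be sketched in base R. This assumes that "most recent" simply means the last row in the current row order, as in the desired result from the question; the names keep and result are only illustrative.

```r
# Same data as in the question, rebuilt so this block runs on its own.
mydat <- data.frame(
  a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 8, 8),
  b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 68.5, 70),
  PROB_POSTR_KM = c(rep(378884L, 6), rep(404136L, 5))
)

# duplicated(..., fromLast = TRUE) flags every occurrence except the last,
# so negating it keeps only the final row for each PROB_POSTR_KM value.
keep <- !duplicated(mydat$PROB_POSTR_KM, fromLast = TRUE)
result <- mydat[keep, ]

# Renumber the rows 1..n instead of keeping the original row names.
rownames(result) <- NULL
result
```

On this data it keeps the original rows 6 and 11. If the rows were not already in chronological order, they would need to be sorted first.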
IceCreamToucan

Here is a dplyr solution:

library(dplyr)

mydat %>% 
  group_by(PROB_POSTR_KM) %>% 
  slice(n())  # n() is the group size, so this keeps the last row of each group

Gives us:

# A tibble: 2 x 3
# Groups:   PROB_POSTR_KM [2]
      a     b PROB_POSTR_KM
  <dbl> <dbl>         <int>
1   7.8  69.5        378884
2   8    70          404136
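
Note that the result is still grouped by PROB_POSTR_KM. A possible variant, assuming dplyr 1.0.0 or later (where slice_tail() is available), keeps the last row per group and then drops the grouping; the name result is only illustrative.

```r
library(dplyr)

# Same data as in the question, rebuilt so this block runs on its own.
mydat <- data.frame(
  a = c(8, 83, 8.5, 8.5, 7.5, 7.8, 7.5, 8, 7.5, 8, 8),
  b = c(69.5, 70, 69.5, 68.5, 70, 69.5, 69.5, 70, 69.5, 68.5, 70),
  PROB_POSTR_KM = c(rep(378884L, 6), rep(404136L, 5))
)

result <- mydat %>%
  group_by(PROB_POSTR_KM) %>%
  slice_tail(n = 1) %>%  # last row of each group, in current row order
  ungroup()              # return a plain, ungrouped tibble

result
```

Dropping the grouping is optional, but it avoids surprises if later steps such as summarise() or mutate() are meant to work on the whole table.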
Matt