Delete duplicates but keep only the last one

Question

While working on a statistical methods study and drafting a report, I ran into a problem: I want to remove duplicate rows from a table, keeping only the last occurrence of each duplicated value.

For example, if I have:

Var1	Var2
1	F
2	H
2	F
3	H

I want :

Var1	Var2
1	F
2	F
3	H

score 3 · Accepted Answer · answered Jul 19 '22 at 09:04

In base R you can use duplicated() with fromLast = TRUE:

df[!duplicated(df$Var1, fromLast = TRUE), ]
#  Var1 Var2
#1    1    F
#3    2    F
#4    3    H
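
To see why this works, here is a minimal sketch that rebuilds the example table and inspects the logical vector returned by duplicated(). The data frame construction and the names dup, key_cols and res are assumptions for illustration and are not part of the original answer.

```r
# Example data from the question (assumed to be a plain data.frame named df)
df <- data.frame(Var1 = c(1, 2, 2, 3), Var2 = c("F", "H", "F", "H"))

# With fromLast = TRUE the scan starts at the end, so the last occurrence
# of each value is treated as the original and earlier ones are flagged.
dup <- duplicated(df$Var1, fromLast = TRUE)
dup
#> [1] FALSE  TRUE FALSE FALSE

# Negating the flags keeps only the last row for each Var1
res <- df[!dup, ]

# The same idea with a data frame of key columns (here just one key);
# key_cols is a hypothetical name, extend it if several columns form the key
key_cols <- c("Var1")
res_keys <- df[!duplicated(df[key_cols], fromLast = TRUE), ]
```

Note that the original row names (1, 3, 4) are preserved, which shows which rows survived.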

Maurits Evers

score 1 · Answer 2 · answered Jul 19 '22 at 08:57

A possible solution:

library(tidyverse)

df %>% 
  group_by(Var1) %>% 
  summarise(Var2 = last(Var2))

#> # A tibble: 3 × 2
#>    Var1 Var2 
#>   <int> <chr>
#> 1     1 F    
#> 2     2 F    
#> 3     3 H
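
If the table has more columns than Var1 and Var2, summarise() would need one last() call per column. A possible variation, assuming dplyr 1.0.0 or later is installed, is slice_tail(), which keeps the whole last row of each group. The data frame below is rebuilt from the question and is an assumption about how df was created.

```r
library(dplyr)

# Example data from the question (assumed construction)
df <- data.frame(Var1 = c(1L, 2L, 2L, 3L), Var2 = c("F", "H", "F", "H"))

res <- df %>%
  group_by(Var1) %>%
  slice_tail(n = 1) %>%  # keep the last row within each Var1 group
  ungroup()              # drop the grouping so later steps are not grouped
```

Here "last" means last in the current row order, so sort the data first if the order matters.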

PaulS

score 1 · Answer 3 · answered Jul 19 '22 at 09:10

Another base R option using aggregate with tail:

df <- read.table(text="Var1 Var2
1   F
2   H
2   F
3   H", header = TRUE)

aggregate(. ~ Var1, data = df, tail, 1)
#>   Var1 Var2
#> 1    1    F
#> 2    2    F
#> 3    3    H

Created on 2022-07-19 by the reprex package (v2.0.1)
