0

While working on a statistical methods study and drafting a report, I ran into a problem. I want to remove rows with duplicated values in Var1 from a table, keeping only the last occurrence of each value.

For example, if I have:

Var1 Var2
1 F
2 H
2 F
3 H

I want:

Var1 Var2
1 F
2 F
3 H
user438383
Zaggamim

3 Answers

3

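In base R you can use duplicated() with fromLast = TRUE, which scans from the bottom of the table so that only the earlier occurrences of each Var1 value are flagged as duplicates: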

df[!duplicated(df$Var1, fromLast = TRUE), ]
#  Var1 Var2
#1    1    F
#3    2    F
#4    3    H
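
To see why this works, it can help to look at the logical vector that duplicated() returns before it is negated. The snippet below is a minimal sketch that rebuilds the question's sample table with data.frame() (an assumption about the column types) and stores the flags in a helper variable named dup, which is not part of the original answer.

```r
# Sample data from the question, rebuilt here so the snippet is self-contained
df <- data.frame(Var1 = c(1, 2, 2, 3), Var2 = c("F", "H", "F", "H"))

# TRUE marks rows whose Var1 value appears again further down the table
dup <- duplicated(df$Var1, fromLast = TRUE)
dup
#> [1] FALSE  TRUE FALSE FALSE

# Negating the flags keeps only the last row for each Var1 value
result <- df[!dup, ]
```

Because the rows are selected by index, every column of the kept rows is preserved and the original row order is unchanged.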
Maurits Evers
1

A possible solution with dplyr (loaded here via tidyverse), taking the last Var2 value within each Var1 group:

library(tidyverse)

df %>% 
  group_by(Var1) %>% 
  summarise(Var2 = last(Var2))

#> # A tibble: 3 × 2
#>    Var1 Var2 
#>   <int> <chr>
#> 1     1 F    
#> 2     2 F    
#> 3     3 H
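
With summarise(), only the columns named in the call are returned. If the table has further columns that should be kept, one alternative is dplyr's slice_tail(), which keeps whole rows. This is a sketch, not part of the original answer: the Var3 column is hypothetical and added only to show that extra columns survive.

```r
library(dplyr)

# Question's sample data plus a hypothetical extra column Var3
df <- data.frame(
  Var1 = c(1, 2, 2, 3),
  Var2 = c("F", "H", "F", "H"),
  Var3 = c(10, 20, 30, 40)
)

result <- df %>%
  group_by(Var1) %>%
  slice_tail(n = 1) %>%  # keep the last row of each Var1 group
  ungroup()
```

Here the row kept for Var1 = 2 is the second one (Var2 = "F", Var3 = 30), with all of its columns intact.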
PaulS
1

Another base R option using aggregate with tail:

df <- read.table(text="Var1 Var2
1   F
2   H
2   F
3   H", header = TRUE)

aggregate(. ~ Var1, data = df, tail, 1)
#>   Var1 Var2
#> 1    1    F
#> 2    2    F
#> 3    3    H

Created on 2022-07-19 by the reprex package (v2.0.1)
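
One point to be aware of with this approach: the formula method of aggregate() uses na.action = na.omit by default, so rows containing NA are dropped before grouping. The sketch below modifies the sample data to include a missing value (an assumption for illustration, not data from the question) and compares the default with na.action = na.pass.

```r
# Sample data with a hypothetical NA in the last row for Var1 = 2
df <- data.frame(Var1 = c(1, 2, 2, 3), Var2 = c("F", "H", NA, "H"))

# Default: the NA row is removed first, so Var1 = 2 falls back to "H"
res_default <- aggregate(. ~ Var1, data = df, tail, 1)

# na.pass keeps the NA row, so the last value for Var1 = 2 is NA
res_pass <- aggregate(. ~ Var1, data = df, tail, 1, na.action = na.pass)
```

If the last row of a group should be kept even when it contains missing values, na.pass (or the duplicated() approach) is likely the safer choice.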

Quinten