When values are repeated in a data frame, keep only the first row per id

Question

In a data frame I have a column, A, that sometimes has repeated values for the same id. When a value in column A is repeated within the same id, I want to keep only the first row. Imagine a big data set. How do I accomplish this? Thanks!

A <- c(18,6,39,39,3,56)
set.seed(1)
B <- sample(100,6)
set.seed(2)
C <- sample(100,6)


df <- data.frame(id = rep(1:3, each=2),A,B,C)
df
  id  A  B  C
1  1 18 68 85
2  1  6 39 79
3  2 39  1 70
4  2 39 34  6
5  3  3 87 32
6  3 56 43  8

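# Attempt so far; the "keep the first" step is the part I am missing
ids <- unique(df$id)

for (i in ids) {
  sub <- df[df$id == i, ]          # rows belonging to this id
  if (any(duplicated(sub$A))) {
    # keep only the first row for each repeated value of A
  } else {
    # leave the rows for this id unchanged
  }
}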

Expected output:
  id  A  B  C
1  1 18 68 85
2  1  6 39 79
3  2 39  1 70
5  3  3 87 32
6  3 56 43  8

AnilGoyal · Answer 1 · 2021-02-19T05:19:07.360

3

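With dplyr, group by id and A, then keep the first row of each group. Note that the result is ordered by the grouping columns, so the row order can differ from the original data.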

library(dplyr)
df %>% group_by(id, A) %>% slice_head() 

# A tibble: 5 x 4
# Groups:   id, A [5]
     id     A     B     C
  <int> <int> <int> <int>
1     1     6    39    79
2     1    18    68    85
3     2    39     1    70
4     3     3    87    32
5     3    56    43     8
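
If the original row order matters, one possible alternative (not part of the answer above, just a sketch) is dplyr's `distinct()` with `.keep_all = TRUE`, which keeps the first row for each `id`/`A` combination without regrouping. The data frame here is rebuilt by hand from the values shown in the question, and the name `result` is only illustrative.

```r
library(dplyr)

# Example data, typed in from the values printed in the question
df <- data.frame(
  id = rep(1:3, each = 2),
  A  = c(18, 6, 39, 39, 3, 56),
  B  = c(68, 39, 1, 34, 87, 43),
  C  = c(85, 79, 70, 6, 32, 8)
)

# Keep the first row for every distinct (id, A) pair;
# .keep_all = TRUE retains the other columns (B and C)
result <- df %>% distinct(id, A, .keep_all = TRUE)
result
```

Unlike the grouped version, this returns an ungrouped data frame, so no `ungroup()` is needed afterwards.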

edited Feb 19 '21 at 05:19

answered Feb 19 '21 at 05:09

AnilGoyal


score 3 · Answer 2 · answered Feb 19 '21 at 05:12

3

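You can use duplicated() on the id and A columns. It flags every row whose (id, A) pair has already appeared earlier, so negating it keeps only the first occurrence of each pair.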

df[!duplicated(df[1:2]), ]

#  id  A  B  C
#1  1 18 68 85
#2  1  6 39 79
#3  2 39  1 70
#5  3  3 87 32
#6  3 56 43  8

data

df <- structure(list(id = c(1L, 1L, 2L, 2L, 3L, 3L), A = c(18L, 6L, 
39L, 39L, 3L, 56L), B = c(68L, 39L, 1L, 34L, 87L, 43L), C = c(85L, 
79L, 70L, 6L, 32L, 8L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
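
To make the one-liner easier to follow, here is the same idea broken into steps, selecting the key columns by name rather than by position. This is only a sketch: `key_cols`, `is_repeat` and `deduped` are illustrative names, and the data frame is retyped from the values above.

```r
# Example data, typed in from the values shown above
df <- data.frame(
  id = rep(1:3, each = 2),
  A  = c(18, 6, 39, 39, 3, 56),
  B  = c(68, 39, 1, 34, 87, 43),
  C  = c(85, 79, 70, 6, 32, 8)
)

key_cols <- c("id", "A")

# TRUE for each row whose (id, A) pair was already seen in an earlier row
is_repeat <- duplicated(df[key_cols])
is_repeat
#> [1] FALSE FALSE FALSE  TRUE FALSE FALSE

# Drop the flagged rows; the first occurrence of each pair is kept
deduped <- df[!is_repeat, ]
deduped
```

Selecting by name is a little safer than `df[1:2]` if the column order ever changes.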

answered Feb 19 '21 at 05:12

Ronak Shah


Thank you! How could it be done in a for loop? – user11916948 Feb 19 '21 at 05:54

A `for` loop seems too complicated for this task. – Ronak Shah Feb 19 '21 at 06:10
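
For anyone who still wants to see it written as a loop, a minimal sketch might look like the following. It is not from either answer; `keep`, `rows` and `result` are illustrative names, and the data frame is retyped from the question. The vectorised `duplicated()` approach is likely to be faster on a big data set.

```r
# Example data, typed in from the values printed in the question
df <- data.frame(
  id = rep(1:3, each = 2),
  A  = c(18, 6, 39, 39, 3, 56),
  B  = c(68, 39, 1, 34, 87, 43),
  C  = c(85, 79, 70, 6, 32, 8)
)

# One flag per row; everything starts as FALSE
keep <- logical(nrow(df))

for (i in unique(df$id)) {
  rows <- which(df$id == i)               # row positions for this id
  keep[rows] <- !duplicated(df$A[rows])   # TRUE for the first occurrence of each A value
}

result <- df[keep, ]
result
```

The loop only builds a logical vector and subsets once at the end, which avoids growing a data frame inside the loop.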
