Flag duplicates in R

Question

I have the following dataset:

dataset <- data.frame(id = c("A","A","A","A","B","B","B","B"),
                      value = c(1,1,2,3,5,6,6,7))

For every id, I want to flag the rows whose value is duplicated, and the flag should have the same length as the source data frame. This is the expected result:

id    value    flag
A     1        1
A     1        1
A     2        0
A     3        0
B     5        0
B     6        1
B     6        1
B     7        0

Is there a way to do this without a for loop? Any help will be greatly appreciated.

score 7 · Accepted Answer · answered Sep 14 '20 at 02:04

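duplicated() on its own marks only the second and later occurrences of a value. Combining it with a second call that uses fromLast = TRUE also marks the first occurrence, so every repeated value is flagged as 1.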

dataset$flag <- as.integer(duplicated(dataset$value) | 
                           duplicated(dataset$value, fromLast = TRUE))
dataset

#  id value flag
#1  A     1    1
#2  A     1    1
#3  A     2    0
#4  A     3    0
#5  B     5    0
#6  B     6    1
#7  B     6    1
#8  B     7    0
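
Note that the code above looks only at value, so a value that repeats across different ids would be flagged too. The sample data has no such case, which is why the result matches the expected output. If duplicates should instead be counted per id, one possible sketch is to pass both columns to duplicated(), which also has a data frame method. The `example` data frame below is hypothetical and is there only to show the difference between the two approaches.

```r
# Hypothetical data (not from the question): the value 5 occurs under both ids
example <- data.frame(id    = c("A", "A", "B", "B"),
                      value = c(1, 5, 5, 5))

# Value-only flag, as in the answer: the 5 under "A" is flagged as well
flag_by_value <- as.integer(duplicated(example$value) |
                            duplicated(example$value, fromLast = TRUE))

# Id-aware flag: a row counts as a duplicate only if id AND value both repeat
key <- example[c("id", "value")]
flag_by_id <- as.integer(duplicated(key) | duplicated(key, fromLast = TRUE))

flag_by_value  # 0 1 1 1
flag_by_id     # 0 0 1 1
```

Which of the two is appropriate depends on whether the same value under a different id should count as a duplicate.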

score 2 · Answer 2 · answered Sep 14 '20 at 05:31

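You can also use the tidyverse: group by value and flag every group that has more than one row.

library(tidyverse)
dataset %>% 
  group_by(value) %>% 
  # n() is the group size; unary + turns the logical into an integer
  mutate(flag = +(n() > 1)) %>% 
  ungroup()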

# A tibble: 8 x 3
  id    value  flag
  <chr> <dbl> <int>
1 A         1     1
2 A         1     1
3 A         2     0
4 A         3     0
5 B         5     0
6 B         6     1
7 B         6     1
8 B         7     0
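
For readers who would rather not load the tidyverse, the same group-size idea can be sketched in base R with ave(). This is only an illustration of the approach, not part of the original answer, and the `group_size` helper variable is a name introduced here.

```r
dataset <- data.frame(id = c("A", "A", "A", "A", "B", "B", "B", "B"),
                      value = c(1, 1, 2, 3, 5, 6, 6, 7))

# For each row, the number of rows that share its value
group_size <- ave(seq_along(dataset$value), dataset$value, FUN = length)

# Flag rows whose value occurs more than once
dataset$flag <- as.integer(group_size > 1)
dataset
```

ave() returns a vector as long as its input, so the flag lines up with the rows of the data frame without any reordering.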
