How to flag duplicate values in R - newbie

Question

I'm trying to flag duplicate IDs in another column. I don't necessarily want to remove them yet, just create an indicator (0/1) of whether each ID is unique or a duplicate. In SQL, it would be something like this:

UPDATE TABLE INNER JOIN (SELECT ID, COUNT(ID) AS count FROM TABLE GROUP BY ID) a ON TABLE.ID = a.ID SET [ID Duplicate Flag Column 1] = 1 WHERE a.count > 1;

Is there a simple way to do this in R? Any help would be greatly appreciated.

Does this answer your question? [Finding ALL duplicate rows, including "elements with smaller subscripts"](https://stackoverflow.com/questions/7854433/finding-all-duplicate-rows-including-elements-with-smaller-subscripts) — Boops Boops, Feb 04 '20 at 05:55

score 1 · Answer 1 · answered Jan 19 '19 at 01:15

As an example of duplicated(), let's start with some values (numbers here, but strings would behave the same way):

x <- c(9, 1:5, 3:7, 0:8)
x
# 9 1 2 3 4 5 3 4 5 6 7 0 1 2 3 4 5 6 7 8

If you want to flag only the second and later copies of each value:

as.numeric(duplicated(x))
# 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0

If you want to flag every occurrence of a value that appears two or more times, including the first:

as.numeric(x %in% x[duplicated(x)])
# 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
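Applied to a data frame, the same two expressions can fill an indicator column directly. This is only a minimal sketch: the data frame `df`, its `ID` column, and the flag column names are made up for illustration and do not come from the question.

```r
# Hypothetical data; `df`, `ID`, `dup_later` and `dup_any` are illustrative names
df <- data.frame(ID = c("a", "b", "a", "c", "b", "a"))

# 1 for the second and later copies of an ID, 0 otherwise
df$dup_later <- as.integer(duplicated(df$ID))

# 1 for every row whose ID occurs more than once, 0 otherwise
df$dup_any <- as.integer(df$ID %in% df$ID[duplicated(df$ID)])

df$dup_later
# 0 0 1 0 1 1
df$dup_any
# 1 1 1 0 1 1
```

Both expressions return one value per row, so the new columns always have the same length as the data frame.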

answered Jan 19 '19 at 01:15

Henry


OK, I'm not really sure how to write that, or whether it actually gives me what I'm looking for. I have nearly 1 million records and I just want to flag whether the 'unique' ID is repeated (i.e., one column of data, not the entire row). I've tried the following, which I think gets me pretty close, but I get an error saying "replacement has xxx rows, data has xxx rows". – Ian Jan 22 '19 at 18:40
Data$DupeFlagColumn[!duplicated(Data$Column)] <- 0 – Ian Jan 22 '19 at 18:41
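
One way to sidestep the partial assignment in the comment above is to build the whole indicator vector first and assign it in a single step, so that its length always matches the number of rows. The sketch below reuses the names `Data`, `Column`, and `DupeFlagColumn` from the comment, but the values are made up, and it assumes the IDs really are stored in `Data$Column`.

```r
# Made-up stand-in for the real data (assumption: the IDs are in Data$Column)
Data <- data.frame(Column = c(101, 102, 101, 103, 102))

# duplicated() returns one logical per row; combining the forward and
# backward passes marks every copy of a repeated ID, including the first
is_dup <- duplicated(Data$Column) | duplicated(Data$Column, fromLast = TRUE)

# Assign the full-length vector at once: 1 = repeated ID, 0 = unique ID
Data$DupeFlagColumn <- as.integer(is_dup)

Data$DupeFlagColumn
# 1 1 1 0 1
```

Only the one ID column is inspected here, not the entire row, which matches what the comment asks for.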
