I'm trying to flag duplicate IDs in another column. I don't necessarily want to remove them yet, just create a 0/1 indicator of whether each ID is unique or duplicated. In SQL, it would be something like this:

UPDATE TABLE JOIN (SELECT ID, COUNT(ID) AS count FROM TABLE GROUP BY ID) a ON TABLE.ID = a.ID SET DuplicateFlag = 1 WHERE a.count > 1;

Is there a simple way to do this in R? Any help would be greatly appreciated.

Ian

1 Answer

As an example of duplicated(), let's start with some values (numbers here, but strings would behave the same way):

x <- c(9, 1:5, 3:7, 0:8)
x
# 9 1 2 3 4 5 3 4 5 6 7 0 1 2 3 4 5 6 7 8 

If you want to flag only the second and later copies of each value:

as.numeric(duplicated(x))
# 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0

If you want to flag every value that occurs two or more times, including its first occurrence:

as.numeric(x %in% x[duplicated(x)])
# 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0
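
Applied to a single column of a data frame, the same two expressions might look like the sketch below. The names Data, ID, DupeLater and DupeAny are illustrative assumptions, not taken from the question, and the toy data stands in for the real table.

```r
# Hypothetical data: Data, ID, DupeLater and DupeAny are assumed names
Data <- data.frame(ID = c("a", "b", "a", "c", "b", "a"))

# 1 for the second and later copies of an ID, 0 otherwise
Data$DupeLater <- as.integer(duplicated(Data$ID))

# 1 for every row whose ID occurs two or more times, 0 otherwise
Data$DupeAny <- as.integer(Data$ID %in% Data$ID[duplicated(Data$ID)])

Data
```

Both expressions return one value per row, so each can be assigned to a new column in a single step, without subsetting the column first.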
Henry
  • OK, I'm not really sure how to write that, or whether it actually gives me what I'm looking for. I have nearly 1 million records and I just want to flag whether the 'unique' ID is repeated (i.e., within one column of data, not across the entire row). I've tried the following, which I think gets me pretty close, but I get an error saying "replacement has xxx rows, data has xxx rows". – Ian Jan 22 '19 at 18:40
  • Data$DupeFlagColumn[!duplicated(Data$Column)] <- 0 – Ian Jan 22 '19 at 18:41