How do I mark duplicates in a new column?

Question

I would like to mark the rows whose value in one column is duplicated.

For example, I have a data frame df:

X    Y    Z 
1    4    5
2    5    7
1    3    6
7    2    7

I want a new data frame df2 with an added column dup that indicates whether the value of X is duplicated:

X    Y    Z   dup
1    4    5   TRUE
2    5    7   FALSE
1    3    6   TRUE
7    2    7   FALSE

Could anyone tell me how to do it?

Thanks HubertL for cleaning up the mess – Jørgen K. Kanters Jul 11 '16 at 20:32

score 3 · Answer 1 · edited Jul 11 '16 at 20:58

Using duplicated from base R:

df2 <- df
# TRUE for every row whose X value occurs more than once, first occurrence included
df2$dup <- duplicated(df2$X) | duplicated(df2$X, fromLast = TRUE)
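
As a self-contained sketch of why two passes are combined: duplicated() on its own flags only the second and later occurrences of a value, and the fromLast = TRUE pass picks up the earlier ones. The data frame below is rebuilt from the question's example; first_pass and last_pass are illustrative names, not part of the original answer.

```r
# Example data from the question
df <- data.frame(X = c(1, 2, 1, 7), Y = c(4, 5, 3, 2), Z = c(5, 7, 6, 7))

# Scanning forwards flags later occurrences only
first_pass <- duplicated(df$X)
# Scanning backwards flags earlier occurrences only
last_pass <- duplicated(df$X, fromLast = TRUE)

# Leave df untouched and add the combined flag to a copy
df2 <- df
df2$dup <- first_pass | last_pass
df2
```

A row is marked TRUE when either pass flags it, so every member of a duplicated group is marked.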

answered Jul 11 '16 at 20:37 by Sumedh · edited Jul 11 '16 at 20:58 by Jaap

score 1 · Answer 2 · edited Jul 11 '16 at 20:48

You can do that with data.table, grouping by your common field and checking you have more than one row for each group:

library(data.table)
dt <- fread("X    Y    Z 
1    4    5
2    5    7
1    3    6
7    2    7")

dt[, dup := .N > 1, by = X]

   X Y Z   dup
1: 1 4 5  TRUE
2: 2 5 7 FALSE
3: 1 3 6  TRUE
4: 7 2 7 FALSE
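
Note that := adds the column to dt by reference. Since the question asks for a separate df2, one possible variation is sketched below. It assumes the data.table package is installed and that the input is an ordinary data frame named df; dt2 is an illustrative name.

```r
library(data.table)

# Example data from the question, as a plain data frame
df <- data.frame(X = c(1, 2, 1, 7), Y = c(4, 5, 3, 2), Z = c(5, 7, 6, 7))

# as.data.table() makes a copy, so df itself is not modified
dt2 <- as.data.table(df)

# Flag rows whose X group has more than one member
dt2[, dup := .N > 1, by = X]

# Convert back if a plain data frame is wanted
df2 <- as.data.frame(dt2)
df2
```

Using setDT(df) instead would convert df in place, which would not leave the original unchanged.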

answered Jul 11 '16 at 20:33 by HubertL · edited Jul 11 '16 at 20:48 by Jaap

score 1 · Answer 3 · edited Jul 11 '16 at 20:38

Here's a method using ave():

df$dup <- ave(df$X, df$X, FUN = length) > 1L;
df;
##   X Y Z   dup
## 1 1 4 5  TRUE
## 2 2 5 7 FALSE
## 3 1 3 6  TRUE
## 4 7 2 7 FALSE
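
To make the intermediate step visible, here is a small sketch that splits the one-liner in two. ave() returns, for each row, the size of the group that row's X value belongs to, and comparing that count with 1 gives the flag. The data frame is rebuilt from the question's example; counts is an illustrative name.

```r
# Example data from the question
df <- data.frame(X = c(1, 2, 1, 7), Y = c(4, 5, 3, 2), Z = c(5, 7, 6, 7))

# Group size for each row, grouped by the value of X
counts <- ave(df$X, df$X, FUN = length)
counts

# A row is a duplicate when its group has more than one member
df$dup <- counts > 1
df
```

Because ave() returns a vector of the same type as its first argument, a non-numeric X is probably safer counted over row indices instead, for example ave(seq_along(df$X), df$X, FUN = length).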

answered Jul 11 '16 at 20:35 by bgoldst · edited Jul 11 '16 at 20:38 by Jaap

@ProcrastinatusMaximus pro-whitespace? – bgoldst Jul 11 '16 at 20:41
it improves readability imo ;-) (which is important for answers) – Jaap Jul 11 '16 at 20:42
Thanks, that is simple and understandable, and it works. – Jørgen K. Kanters Jul 11 '16 at 20:53
