I'm looking for guidance on creating a new column that flags possible duplicates based on a few selected columns.
I have the following dataframe:
ID | Animal | Age | Delivery | Cost | Country |
---|---|---|---|---|---|
1 | dog | 5 | Air | 120 | Nigeria |
2 | cat | 3 | Air | 110 | Kenya |
3 | fish | 1 | Air | 20 | Kenya |
4 | dog | 5 | Air | 150 | Nigeria |
5 | cat | 3 | Air | 100 | Kenya |
6 | dog | 6 | Air | 180 | Egypt |
7 | cat | 3 | Air | 135 | Kenya |
8 | turtle | 10 | Air | 90 | Nigeria |
df = structure(list(ID = 1:8, Animal = c("dog", "cat", "fish", "dog",
"cat", "dog", "cat", "turtle"), Age = c(5L, 3L, 1L, 5L, 3L, 6L,
3L, 10L), Delivery = c("Air", "Air", "Air", "Air", "Air", "Air",
"Air", "Air"), Cost = c(120L, 110L, 20L, 150L, 100L, 180L, 135L,
90L), Country = c("Nigeria", "Kenya", "Kenya", "Nigeria", "Kenya",
"Egypt", "Kenya", "Nigeria")), class = "data.frame", row.names = c(NA,
-8L))
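I would like to create a new column that tags a row as a duplicate when its combination of the 3 columns Animal, Age and Country appears more than once in the dataframe. Every row in a repeated group should be tagged, including the first occurrence.
The expected output is the following: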
ID | Animal | Age | Delivery | Cost | Country | New Column |
---|---|---|---|---|---|---|
1 | dog | 5 | Air | 120 | Nigeria | Y |
2 | cat | 3 | Air | 110 | Kenya | Y |
3 | fish | 1 | Air | 20 | Kenya | N |
4 | dog | 5 | Air | 150 | Nigeria | Y |
5 | cat | 3 | Air | 100 | Kenya | Y |
6 | dog | 6 | Air | 180 | Egypt | N |
7 | cat | 3 | Air | 135 | Kenya | Y |
8 | turtle | 10 | Air | 90 | Nigeria | N |
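
For reference, here is a minimal base R sketch of the tagging described above. It assumes the column is named `Animal` as in the tables, and the output column name `Duplicate` is only a placeholder for "New Column". It relies on `duplicated()` being run in both directions, since a single pass only flags the second and later occurrences.

```r
# Rebuild the example data (column named Animal to match the tables)
df <- data.frame(
  ID = 1:8,
  Animal = c("dog", "cat", "fish", "dog", "cat", "dog", "cat", "turtle"),
  Age = c(5L, 3L, 1L, 5L, 3L, 6L, 3L, 10L),
  Delivery = "Air",
  Cost = c(120L, 110L, 20L, 150L, 100L, 180L, 135L, 90L),
  Country = c("Nigeria", "Kenya", "Kenya", "Nigeria",
              "Kenya", "Egypt", "Kenya", "Nigeria")
)

# Columns that define a duplicate
key_cols <- c("Animal", "Age", "Country")
keys <- df[, key_cols]

# duplicated() alone skips the first occurrence of each group, so combine
# a forward pass with a backward pass (fromLast = TRUE) to flag every member
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# "Duplicate" is a placeholder name for the new column
df$Duplicate <- ifelse(is_dup, "Y", "N")
```

Changing `key_cols` is enough to test a different set of columns; the rest of the sketch stays the same.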
Thanks in advance!