
I'm new to R and to programming, so this might be an easy question. I'm trying to find rows that are duplicated within certain pairs of columns, and replace both the original and the duplicate with N/A. Say I have the following dataset:

mydf <- structure(list(V1 = c(1, 2, 3, 1, 3, 2), V2 = c("zz", "aa", "bb", "zz", "yy", 
"ii"), V3 = c("aa", "ff", "aa", "hh", "cc", "jj"), V4 = c("ee", 
"xx", "ee", "hh", "dd", "kk"), V5 = c(213L, 254L, 235L, 356L, 
796L, 954L)), class = "data.frame", row.names = c(NA, -6L))

  V1 V2 V3 V4  V5
1  1 zz aa ee 213
2  2 aa ff xx 254
3  3 bb aa ee 235
4  1 zz hh hh 356
5  3 yy cc dd 796
6  2 ii jj kk 954

I'd like to find rows that are duplicated either in the pair V1 and V2, or in the pair V3 and V4. So the final result would look like this:

    V1   V2   V3   V4  V5
1   N/A  N/A  N/A  N/A 213
2    2   aa   ff   xx  254
3    3   bb   N/A  N/A 235
4   N/A  N/A  hh   hh  356
5    3   yy   cc   dd  796
6    2   ii   jj   kk  954

  • @RonakShah sure... – Sotos Jan 09 '20 at 10:10
  • Not sure what you need, are you comparing V1 with V2 or V1 with itself? I am not seeing how the first rows become `N/A` since V1 and V2 have no pairwise duplicates. – NelsonGon Jan 09 '20 at 10:57
  • @NelsonGon If I have understood OP correctly, they want to treat `V1` and `V2` as one pair of columns and `V3` and `V4` as the other pair. `1 zz` in the first row is a duplicate of `1 zz` in the 4th row, hence both become NA in the first pair of columns. Similarly, `aa ee` in the first row is a duplicate of `aa ee` in the 3rd row for the second pair of columns, so both turn to `NA` too. – Ronak Shah Jan 09 '20 at 14:43

1 Answer


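You can flag the duplicated rows within each pair of columns and replace them with NA. `duplicated()` on its own marks only the second and later occurrences, so combine it with a `fromLast = TRUE` pass to catch the first occurrence as well.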

cols1 <- c('V1', 'V2')
cols2 <- c('V3', 'V4')

mydf[cols1][duplicated(mydf[cols1]) | duplicated(mydf[cols1], fromLast = TRUE),] <- NA
mydf[cols2][duplicated(mydf[cols2]) | duplicated(mydf[cols2], fromLast = TRUE),] <- NA

mydf
# V1   V2   V3   V4  V5
#1 NA <NA> <NA> <NA> 213
#2  2   aa   ff   xx 254
#3  3   bb <NA> <NA> 235
#4 NA <NA>   hh   hh 356
#5  3   yy   cc   dd 796
#6  2   ii   jj   kk 954
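
If there are more than two groups of columns, the same idea can be wrapped in a small function. This is only a sketch: the name `na_dup_groups` and its `col_groups` argument are made up for illustration, and it assumes the groups do not share columns.

```r
# Hypothetical helper (name and arguments are illustrative, not from any package).
# For each group of columns, set every row that occurs more than once
# within that group to NA, including the first occurrence.
na_dup_groups <- function(df, col_groups) {
  for (cols in col_groups) {
    key <- df[cols]
    # duplicated() flags later copies; the fromLast pass flags the earlier ones
    dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
    df[dup, cols] <- NA
  }
  df
}

# Same data as in the question
mydf <- data.frame(
  V1 = c(1, 2, 3, 1, 3, 2),
  V2 = c("zz", "aa", "bb", "zz", "yy", "ii"),
  V3 = c("aa", "ff", "aa", "hh", "cc", "jj"),
  V4 = c("ee", "xx", "ee", "hh", "dd", "kk"),
  V5 = c(213L, 254L, 235L, 356L, 796L, 954L),
  stringsAsFactors = FALSE
)

result <- na_dup_groups(mydf, list(c("V1", "V2"), c("V3", "V4")))
```

Unlike the two assignment lines above, this returns a modified copy and leaves `mydf` itself unchanged, so assign the result back if you want to overwrite it.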
Ronak Shah