How to remove duplicates by unique (rowwise) values in two columns

Question

I have data as below:

df <- data.frame(x=c("A","B","C","D"), y=c("B","A","D","C"), z=c(2,2,0.4,0.4), stringsAsFactors = FALSE)

x    y   z
A    B   2
B    A   2
C    D   0.4
D    C   0.4

I would like the data as below:

A    B   2
C    D   0.4

How can I do this?

Are you removing duplicates in the numeric column (assuming it is a column), or duplicates in the text columns? Please clarify and consider a reproducible example that can be read into R directly. — Remko Duursma, Jan 13 '17 at 10:45
`df[,1:2] <- t(apply(df[,1:2], 1, sort)); df[!duplicated(df),]` — Jaap, Jan 13 '17 at 10:57

score 1 · Answer 1 · answered Jan 13 '17 at 11:00

Using:

df[,1:2] <- t(apply(df[,1:2], 1, sort))
df[!duplicated(df),]

will give:

  x y   z
1 A B 2.0
3 C D 0.4
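
Note that the assignment above overwrites x and y with their sorted values. If the original column contents need to stay as they are, a minimal sketch is to compute the sorted pairs in a separate object and use it only to find the duplicates. The name `key` is introduced here for illustration and is not part of the original answer:

```r
df <- data.frame(x = c("A", "B", "C", "D"),
                 y = c("B", "A", "D", "C"),
                 z = c(2, 2, 0.4, 0.4),
                 stringsAsFactors = FALSE)

# Sort each (x, y) pair so that A-B and B-A give the same row in `key`.
# `key` is a helper name assumed for this sketch.
key <- t(apply(df[, c("x", "y")], 1, sort))

# Keep the first row of every (unordered pair, z) combination.
# df itself is not modified.
res <- df[!duplicated(data.frame(key, z = df$z)), ]
res
```

With this variant, a row stored as B, A would be returned as B, A if it came first, whereas the in-place version would return it as A, B.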

Jaap

score 0 · Answer 2 · answered Jan 13 '17 at 11:02

You can use the code below.

dat1 <- data.frame(X=c("A","B","C","D"),Y=c("B","A","D","C"),Z=c(2,2,0.4,0.4),stringsAsFactors = FALSE)
dat1
  X Y   Z
1 A B 2.0
2 B A 2.0
3 C D 0.4
4 D C 0.4

Let's define a function that sorts the values of a row and collapses them into a single string, then apply it to every row.

sort_paste <- function(x){ paste(sort(x),collapse=";") }

check_dups <- apply(dat1,1,sort_paste)
check_dups
[1] "2.0;A;B" "2.0;A;B" "0.4;C;D" "0.4;C;D"
dat1[ which(! duplicated(check_dups)), ]
  X Y   Z
1 A B 2.0
3 C D 0.4

score 0 · Answer 3 · answered Jan 13 '17 at 11:06

Presuming you're just trying to remove duplicates in the z column:

 subset(df, !duplicated(z))
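
This matches the expected output only because each pair in the question happens to have its own z value. A small sketch with made-up data (the data frame `df2` is hypothetical, not from the question) suggests what happens when two different pairs share the same z:

```r
# Hypothetical data: two distinct pairs (A-B and C-D) with the same z.
df2 <- data.frame(x = c("A", "B", "C"),
                  y = c("B", "A", "D"),
                  z = c(2, 2, 2),
                  stringsAsFactors = FALSE)

# Deduplicating on z alone keeps only the first row,
# so the C-D pair is dropped along with the reversed A-B row.
by_z <- subset(df2, !duplicated(z))
by_z
```

If pairs can share a z value, one of the pair-based approaches in the other answers is probably the safer choice.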

user1165199

score 0 · Answer 4 · answered Jan 13 '17 at 11:17

We can use pmin/pmax to build an order-independent key for each (x, y) pair:

library(data.table)
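# The second argument of duplicated() is `incomparables`, so both key columns
# are wrapped in a data.table and passed as one object.
setDT(df)[!duplicated(data.table(pmin(x, y), pmax(x, y)))]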
#   x y   z
#1: A B 2.0
#2: C D 0.4
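
For readers who would rather not load data.table, the same pmin/pmax idea can be sketched in base R. This is an illustrative variant, not part of the original answer; `lo` and `hi` are helper names introduced here, and the key uses only x and y, so z is ignored when deciding what counts as a duplicate:

```r
df <- data.frame(x = c("A", "B", "C", "D"),
                 y = c("B", "A", "D", "C"),
                 z = c(2, 2, 0.4, 0.4),
                 stringsAsFactors = FALSE)

# pmin/pmax work element-wise, so lo/hi hold each pair in sorted order
# (lo and hi are names assumed for this sketch).
lo <- pmin(df$x, df$y)
hi <- pmax(df$x, df$y)

# Rows whose (lo, hi) key has already been seen are dropped.
res <- df[!duplicated(data.frame(lo, hi)), ]
res
```

Because pmin and pmax are vectorised, this avoids the row-by-row apply call, which may matter for larger data frames.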

akrun
