0

I have data as below:

df <- data.frame(x=c("A","B","C","D"), y=c("B","A","D","C"), z=c(2,2,0.4,0.4), stringsAsFactors = FALSE)

x    y   z
A    B   2
B    A   2
C    D   0.4
D    C   0.4

I would like the data as below:

A    B   2
C    D   0.4

How can I do this?

Jaap
Mousumi
  • Are you removing duplicates in the numeric column (assuming it is a column), or duplicates in the text columns? Please clarify and consider a reproducible example that can be read into R directly. – Remko Duursma Jan 13 '17 at 10:45
  • `df[,1:2] <- t(apply(df[,1:2], 1, sort)); df[!duplicated(df),]` – Jaap Jan 13 '17 at 10:57

4 Answers

1

Using:

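# Sort x and y within each row, so that (B, A) becomes (A, B)
df[,1:2] <- t(apply(df[,1:2], 1, sort))
# Keep only the first occurrence of each now-identical row
df[!duplicated(df),]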

will give:

  x y   z
1 A B 2.0
3 C D 0.4
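
The code above overwrites x and y in df. If the original values should stay untouched, one possible variant is to build the sorted pairs in a separate object and use them only for the duplicate check. This is a minimal sketch; `sorted_pairs`, `keys` and `result` are names made up for this example:

```r
df <- data.frame(x = c("A", "B", "C", "D"),
                 y = c("B", "A", "D", "C"),
                 z = c(2, 2, 0.4, 0.4),
                 stringsAsFactors = FALSE)

# Sort x and y within each row, without assigning back into df
sorted_pairs <- t(apply(df[, c("x", "y")], 1, sort))

# Helper data frame used only to detect duplicates (sorted pair plus z)
keys <- data.frame(sorted_pairs, z = df$z)

# Keep the first row of each duplicate group; df itself is not modified
result <- df[!duplicated(keys), ]
result
```

Here `result` keeps rows 1 and 3 of df with their original x and y values.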
Jaap
0

You can use the code below.

dat1 <- data.frame(X=c("A","B","C","D"),Y=c("B","A","D","C"),Z=c(2,2,0.4,0.4),stringsAsFactors = FALSE)
dat1
  X Y   Z
1 A B 2.0
2 B A 2.0
3 C D 0.4
4 D C 0.4

Let's define a function that sorts the values of a row and collapses them into a single string.

sort_paste <- function(x){ paste(sort(x),collapse=";") }

check_dups <- apply(dat1,1,sort_paste)
check_dups
[1] "2.0;A;B" "2.0;A;B" "0.4;C;D" "0.4;C;D"
dat1[!duplicated(check_dups), ]
  X Y   Z
1 A B 2.0
3 C D 0.4
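
Note that `apply` on the whole data frame converts every column to character first, so Z ends up in the key as text. If only the X/Y pair should decide what counts as a duplicate, the same idea can be applied to just those two columns. A minimal sketch, where `pair_key` and `deduped` are names chosen for this example:

```r
dat1 <- data.frame(X = c("A", "B", "C", "D"),
                   Y = c("B", "A", "D", "C"),
                   Z = c(2, 2, 0.4, 0.4),
                   stringsAsFactors = FALSE)

# Build an order-independent key from the two character columns only
pair_key <- apply(dat1[, c("X", "Y")], 1,
                  function(r) paste(sort(r), collapse = ";"))

# Keep the first row for each distinct key
deduped <- dat1[!duplicated(pair_key), ]
deduped
```

With this key, two rows holding the same pair are treated as duplicates even when their Z values differ, which may or may not be what you want.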
ab90hi
0

Presuming you're just trying to remove duplicates in the z column:

 subset(df, !duplicated(z))
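
Keep in mind that this looks at z only, so it also drops rows for unrelated pairs that happen to share a z value. The sketch below illustrates the difference on a made-up data frame (`df2`, with an extra hypothetical pair E/F that is not in the question):

```r
# Hypothetical data: E/F is a different pair but has the same z as A/B
df2 <- data.frame(x = c("A", "B", "E"),
                  y = c("B", "A", "F"),
                  z = c(2, 2, 2),
                  stringsAsFactors = FALSE)

# Deduplicating on z alone keeps a single row
by_z <- subset(df2, !duplicated(z))

# Deduplicating on the unordered pair keeps both A/B and E/F
pair <- data.frame(lo = pmin(df2$x, df2$y), hi = pmax(df2$x, df2$y))
by_pair <- df2[!duplicated(pair), ]

nrow(by_z)
nrow(by_pair)
```

For the data in the question both approaches give the same two rows, because each pair there has its own z value.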
user1165199
0

We can use pmin/pmax to build a key for each pair that does not depend on the order of x and y:

library(data.table)
setDT(df)[!duplicated(data.table(pmin(x, y), pmax(x, y)))]
#   x y   z
#1: A B 2.0
#2: C D 0.4
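
The same idea also works without data.table. The base R sketch below uses a made-up data frame (`df3`) in which two different pairs share their smaller element, to show that the key needs both the pmin and the pmax column; `lo`, `hi` and `res` are names chosen for this example:

```r
# Hypothetical data: A/B appears twice (in both orders), A/C once
df3 <- data.frame(x = c("A", "B", "A"),
                  y = c("B", "A", "C"),
                  z = c(2, 2, 1),
                  stringsAsFactors = FALSE)

lo <- pmin(df3$x, df3$y)  # smaller element of each pair
hi <- pmax(df3$x, df3$y)  # larger element of each pair

# duplicated() on a data frame compares whole rows, i.e. lo and hi together
res <- df3[!duplicated(data.frame(lo, hi)), ]
res
```

Checking `lo` alone would wrongly treat A/C as a duplicate of A/B, since both have "A" as their smaller element.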
akrun