0

I have a data frame that I need to deduplicate based on two columns, treating the columns interchangeably (the order of the two values within a row should not matter):

dataframe:

df <- data.frame(col1=c("a",1,"bar","foo"),col2=c(1,"a","foo","bar"))

My goal is to keep only one of any two rows that contain the same pair of values. For example, keeping either foo-bar or bar-foo would be enough.

An acceptable output would keep just two rows, for example a-1 and bar-foo.

Ibo

3 Answers

4

Here is a base R way.

inx <- !duplicated(t(apply(df, 1, sort)))
df[inx, ]

One-liner:

df[!duplicated(t(apply(df, 1, sort))), ]
#  col1 col2
#1    a    1
#3  bar  foo
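
To see why this works, the one-liner can be unpacked step by step. This is only a sketch: the names `sorted_cols`, `keys`, `dup` and `result` are made up for illustration, and `stringsAsFactors = FALSE` is an assumption added so the columns are plain character vectors.

```r
df <- data.frame(col1 = c("a", 1, "bar", "foo"),
                 col2 = c(1, "a", "foo", "bar"),
                 stringsAsFactors = FALSE)

# apply() hands each row to sort() as a character vector, so
# c("foo", "bar") and c("bar", "foo") end up with identical contents.
sorted_cols <- apply(df, 1, sort)   # 2 x 4 matrix: one column per original row

# Transpose so that each original row is a row again.
keys <- t(sorted_cols)              # 4 x 2 matrix

# duplicated() on a matrix compares whole rows by default (MARGIN = 1).
dup <- duplicated(keys)

# Keep the first occurrence of each unordered pair.
result <- df[!dup, ]
```

Because only the sorted copy is used as a key, the rows of `result` keep their original values and column order.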
Rui Barradas
1

Based on "Sorting each row of a data frame", you can do

unique(t(apply(df, 1, sort)))
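
Note that this expression returns a character matrix rather than a data frame, and the values within each row come back sorted. If a data frame is needed, a conversion along these lines might do. It is a sketch only: `m` and `out` are illustrative names, and `stringsAsFactors = FALSE` is an assumption added to keep the columns as character.

```r
df <- data.frame(col1 = c("a", 1, "bar", "foo"),
                 col2 = c(1, "a", "foo", "bar"),
                 stringsAsFactors = FALSE)

# Character matrix with one row per distinct unordered pair.
m <- unique(t(apply(df, 1, sort)))

# Convert back to a data frame and restore the original column names,
# since the names carried over from sort() are not reliable.
out <- as.data.frame(m, stringsAsFactors = FALSE)
names(out) <- names(df)
```

The original within-row order is lost here, so a row stored as foo-bar comes back as bar-foo.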

Simon Woodward
0

Under the assumption your data is paired all the way through, a for loop can solve this for you.

## Establish the data.frame to write into
df2 <- df[1,]
## Loop through the remaining rows
for( i in 2:nrow(df) ){
    ## Skip row i if its col2 value already appears in col1 of the kept rows
    if( df[i,"col2"] %in% df2[,"col1"] ){ next }
    ## Otherwise keep the row
    df2 <- rbind(df2, df[i,])
}
Badger
    This will take a long time if the df is large. I have already found some solutions using `dplyr` and `data.table`; I was looking for the easiest way. – Ibo Feb 24 '20 at 20:22
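
For larger data frames, a vectorised base R variant may avoid the per-row cost of a loop or of `apply()`. The following is a minimal sketch, not something taken from the answers above: it assumes both columns are character (hence `stringsAsFactors = FALSE`, since `pmin()`/`pmax()` are not meant for unordered factors), and `lo`, `hi` and `res` are illustrative names.

```r
df <- data.frame(col1 = c("a", 1, "bar", "foo"),
                 col2 = c(1, "a", "foo", "bar"),
                 stringsAsFactors = FALSE)

# Element-wise smaller and larger value of each pair; this gives the
# same (lo, hi) key for foo-bar and bar-foo without sorting row by row.
lo <- pmin(df$col1, df$col2)
hi <- pmax(df$col1, df$col2)

# Keep the first row seen for each unordered pair.
res <- df[!duplicated(data.frame(lo, hi)), ]
res
#   col1 col2
# 1    a    1
# 3  bar  foo
```

This only covers the two-column case; with more columns the `apply(df, 1, sort)` approach generalises more directly.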