1

Consider the following dataframe:

df <- data.frame(V1 = c("A", "A", "B", "B", "C", "C"),
           V2 = c("B", "C", "A", "C", "A", "B"),
           n = c(1, 3, 1, 2, 3, 2))

How can I remove duplicate pair-wise columns so that the output looks like:

#   V1 V2 n
#1  A  B 1
#2  A  C 3
#3  B  C 2

I tried unique() and duplicated() to no avail.

  • This looks to be similar: [Select equivalent rows - a-b b-a](http://stackoverflow.com/questions/19647875/select-equivalent-rows-a-b-b-a/19648451#19648451) – thelatemail Feb 27 '15 at 03:36
  • @thelatemail It is quite similar, perhaps even a duplicate. That other question is more well-written - it is more clear what is desired. This question should be edited to show the true intent (or closed as a duplicate or unclear). – Matthew Lundberg Feb 27 '15 at 03:50

2 Answers2

5

Not sure if this is the simplest way of doing it (transposing can be computationally expensive) but this would work with your data frame:

df <- data.frame(V1 = c("A", "A", "B", "B", "C", "C"),
                 V2 = c("B", "C", "A", "C", "A", "B"),
                 n = c(1, 3, 1, 2, 3, 2))

First, sort the data frame row-wise, so your value-pairs become true duplicates.

df <- data.frame(t(apply(df, 1, sort)))

Then you can just apply the unique function.

df <- unique(df)

If your column names and order are important, you'll have to re-establish those.

names(df) <- c("n", "V1", "V2")
df <- df[, c("V1", "V2", "n")]
Roland Seubert
  • 315
  • 3
  • 9
  • Very elegant solution. Incredibly helpful. Been stuck on this exact same issue for hours until I ran across your post. Thank you for posting. – Purrsia Sep 18 '21 at 06:25
2

Another option would be to reshape (xtabs(n~..)) the dataset ('df') to wide format, set the lower triangular matrix to 0, and remove the rows with "Freq" equal to 0.

m1 <- xtabs(n~V1+V2, df)
m1[lower.tri(m1)] <- 0
subset(as.data.frame(m1), Freq!=0)
#  V1 V2 Freq
#4  A  B    1
#7  A  C    3
#8  B  C    2
akrun
  • 874,273
  • 37
  • 540
  • 662