
I have a data frame with three columns:

A B 1
A B 1
A C 1
B A 1

I want to aggregate it so that the combinations A-B and B-A are treated as the same pair and the third column is summed, resulting in:

A B 3
A C 1

How do I go about this?

stochastiq
    Relevant - http://stackoverflow.com/questions/35834385/create-unique-identifier-from-the-interchangeable-combination-of-two-variables/35834584 and http://stackoverflow.com/questions/25297812/pair-wise-duplicate-removal-from-dataframe/25298863 and http://stackoverflow.com/questions/25145982/extract-unique-rows-from-a-data-table-with-each-row-unsorted/25151395 – thelatemail Jun 20 '16 at 03:12

1 Answer


Use pmin and pmax on the first two columns to put each pair in a consistent order (smaller value first), then group by the reordered pair and sum the third column:

library(dplyr)
df %>%
  group_by(G1 = pmin(V1, V2), G2 = pmax(V1, V2)) %>%
  summarise(Count = sum(V3))
Source: local data frame [2 x 3]
Groups: G1 [?]

     G1    G2 Count
  (chr) (chr) (int)
1     A     B     3
2     A     C     1

The corresponding data.table solution would be:

library(data.table)
setDT(df)
df[, .(Count = sum(V3)), .(G1 = pmin(V1, V2), G2 = pmax(V1, V2))]

   G1 G2 Count
1:  A  B     3
2:  A  C     1
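
If neither package is available, the same pmin/pmax idea can be sketched in base R. This is only an illustration, not part of the original answer: it assumes the columns are named V1, V2 and V3 as in the data used here, and that V1 and V2 are character vectors rather than factors. The names keyed and res are made up for the example.

```r
# Rebuild the example data; keep V1 and V2 as character so that
# pmin/pmax compare them as strings.
df <- data.frame(V1 = c("A", "A", "A", "B"),
                 V2 = c("B", "B", "C", "A"),
                 V3 = c(1L, 1L, 1L, 1L),
                 stringsAsFactors = FALSE)

# Put each pair in a consistent order: G1 holds the smaller value,
# G2 the larger, so A-B and B-A end up with the same key.
keyed <- transform(df, G1 = pmin(V1, V2), G2 = pmax(V1, V2))

# Sum V3 within each (G1, G2) group.
res <- aggregate(V3 ~ G1 + G2, data = keyed, FUN = sum)
names(res)[3] <- "Count"
res
```

The only step that matters is the reordering: once both columns are in a consistent order, any grouping tool can do the sum.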

Data:

df <- structure(list(V1 = c("A", "A", "A", "B"), V2 = c("B", "B", "C", 
"A"), V3 = c(1L, 1L, 1L, 1L)), .Names = c("V1", "V2", "V3"), row.names = c(NA, 
-4L), class = "data.frame")
Psidom