3

I have a seemingly simple calculation, where I have a data frame composed of 4 columns as shown below (Date, Origin, Destination, count). I would like to sum the count by Date, and unique pair of ID1 and ID2, meaning that A-B and B-A are ONE pair.

Date ID1 ID2 Count
12-1   A   B   1
12-1   B   A   1
12-1   D   E   1
12-1   E   D   2
12-1   Y   Z   2
12-2   A   B   1
12-2   B   A   1
12-2   D   E   1
12-2   E   D   2
12-2   Y   Z   2

From the date set we can either sum the "Count" column by unique combinations (e.g. A-B, B-A, D-E, E-D, etc.). However I would like to sum the count column by unique pair- meaning that A-B would also include B-A.

Ideally I would like to have the table summarized by Date and by unique pair.

I looked through everything, dyplyr functions such mutate, gather, unite- all to no avail.

Would be deeply grateful for any insight or pointing the right direction. Many thanks

Roberto
  • 181
  • 8

1 Answers1

3

We can use pmin, pmax to do the sorting by row, use that as grouping variables along with 'Date' and get the sum of 'Count'

library(dplyr)
df1 %>% 
  group_by(Date, ID1n = pmin(ID1, ID2), ID2n = pmax(ID1, ID2)) %>% 
  summarise(Count = sum(Count)) %>%
  #dplyr::select(ID1 = ID1n, ID2 = ID2n, Date, Count)
  rename(ID1 = ID1n, ID2 = ID2n)
# A tibble: 6 x 4
# Groups:   ID1, Date [6]
#  ID1   ID2   Date  Count
#  <chr> <chr> <chr> <int>
#1 A     B     12-1      2
#2 D     E     12-1      3
#3 Y     Z     12-1      2
#4 A     B     12-2      2
#5 D     E     12-2      3
#6 Y     Z     12-2      2

Or if select or rename from dplyr is buggy, then stop at the summarise step and just assign the column names

out <-  df1 %>% 
  group_by(Date, ID1n = pmin(ID1, ID2), ID2n = pmax(ID1, ID2)) %>% 
  summarise(Count = sum(Count)) 
names(out)[1:2] <- c("ID1", "ID2")

Or using base R, sort by rows for the columns 'ID1', 'ID2' and get a sum of 'Count' by the other variables

df1[c('ID1', 'ID2')] <- t(apply(df1[c('ID1', 'ID2')], 1, sort))
aggregate(Count ~ ., df1, sum)
#  Date ID1 ID2 Count
#1 12-1   A   B     2
#2 12-2   A   B     2
#3 12-1   D   E     3
#4 12-2   D   E     3
#5 12-1   Y   Z     2
#6 12-2   Y   Z     2

data

df1 <- structure(list(Date = c("12-1", "12-1", "12-1", "12-1", "12-1", 
"12-2", "12-2", "12-2", "12-2", "12-2"), ID1 = c("A", "B", "D", 
"E", "Y", "A", "B", "D", "E", "Y"), ID2 = c("B", "A", "E", "D", 
"Z", "B", "A", "E", "D", "Z"), Count = c(1L, 1L, 1L, 2L, 2L, 
1L, 1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-10L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/210564/discussion-on-answer-by-akrun-r-sum-observations-by-unique-column-pairs-b-a-and). – Samuel Liew Mar 30 '20 at 07:00