R Calculate sum of values by unique column PAIRS (B-A and A-B) while keeping both pairs

Question

I would like to sum Count by Date and by unordered pair of ID1 and ID2, meaning that A-B and B-A count as ONE pair. However, I want to keep both rows of each pair, together with the pair's sum, in my dataset.

My dataset looks like this:

Date ID1 ID2 Count
12-1   A   B   1
12-1   B   A   1
12-1   D   E   1
12-1   E   D   2
12-2   Y   Z   2
12-2   Z   Y   3

The expected output looks like this:

Date ID1 ID2 SUM
12-1   A   B   2
12-1   B   A   2
12-1   D   E   3
12-1   E   D   3
12-2   Y   Z   5
12-2   Z   Y   5

My Question can be seen as an extension of this previous question:

R sum observations by unique column PAIRS (B-A and A-B) and NOT unique combinations (B-A or A-B)

Many thanks in advance.

score 3 · Answer 1 · answered Dec 04 '21 at 16:49

Here is a way.
First, sort the values of ID1 and ID2 within each row and paste them together, so that A-B and B-A get the same key. Then sum Count within each Date and key with ave. Finally, remove the helper column.

df1$unique <- apply(df1[c("ID1", "ID2")], 1, \(x) paste(sort(x), collapse = ""))
# group by Date as well as by the pair key, as the question asks
df1$Sum <- with(df1, ave(Count, Date, unique, FUN = sum))
df1$unique <- NULL
df1
#  Date ID1 ID2 Count Sum
#1 12-1   A   B     1   2
#2 12-1   B   A     1   2
#3 12-1   D   E     1   3
#4 12-1   E   D     2   3
#5 12-2   Y   Z     2   5
#6 12-2   Z   Y     3   5
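
When the IDs are longer strings (country names, for instance), pasting the sorted values without a separator can give two different pairs the same key. The sketch below uses made-up IDs to show the effect; the "|" separator is an assumption and only helps if it never occurs inside an ID.

```r
# Hypothetical multi-character IDs forming two different pairs.
pair1 <- c("AB", "C")
pair2 <- c("A", "BC")

# Without a separator, both pairs collapse to the same key.
key_nosep <- c(paste(sort(pair1), collapse = ""),
               paste(sort(pair2), collapse = ""))

# With a separator that does not occur in the IDs, the keys stay distinct.
key_sep <- c(paste(sort(pair1), collapse = "|"),
             paste(sort(pair2), collapse = "|"))

key_nosep
key_sep
```

With single-letter IDs, as in the example data, the keys cannot collide either way.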

answered Dec 04 '21 at 16:49

Rui Barradas

I tried to use it on my dataset. ID1 and ID2 are actually country names. I get this error: Error: unexpected input in "apply(df1[c("ID1", "ID2")], 1, \" – MixedModeler Dec 04 '21 at 17:15
@MixedModeler `\(x)` are the new lambda functions, introduced in R 4.1.0. Try `function(x)` instead. Time to update R? – Rui Barradas Dec 04 '21 at 18:30
@ Rui Barradas Right, thanks! – MixedModeler Dec 07 '21 at 21:19
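
For R versions older than 4.1.0, the approach from this answer can be written with `function(x)`, as the comment suggests. This is a minimal sketch on the question's sample data. The `pair_key` name and the "|" separator are choices made here, not part of the original answer, and Date is included in the grouping because the question asks for it.

```r
# Sample data from the question.
df1 <- data.frame(
  Date  = c("12-1", "12-1", "12-1", "12-1", "12-2", "12-2"),
  ID1   = c("A", "B", "D", "E", "Y", "Z"),
  ID2   = c("B", "A", "E", "D", "Z", "Y"),
  Count = c(1L, 1L, 1L, 2L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Order-independent key per row; function(x) also works before R 4.1.0.
# Assumption: "|" does not occur in any ID.
pair_key <- apply(df1[c("ID1", "ID2")], 1,
                  function(x) paste(sort(x), collapse = "|"))

# Sum Count within each Date/pair group while keeping every original row.
df1$Sum <- ave(df1$Count, df1$Date, pair_key, FUN = sum)
df1
```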

akrun · Accepted Answer · 2021-12-04T17:22:29.357

2

This may also be done with pmin/pmax to create a grouping column

library(dplyr)
library(stringr)
df1 %>% 
   group_by(Date, grp = str_c(pmin(ID1, ID2), pmax(ID1, ID2))) %>% 
   mutate(Sum = sum(Count)) %>%
   ungroup %>%
   select(-grp)

-output

# A tibble: 6 × 5
  Date  ID1   ID2   Count   Sum
  <chr> <chr> <chr> <int> <int>
1 12-1  A     B         1     2
2 12-1  B     A         1     2
3 12-1  D     E         1     3
4 12-1  E     D         2     3
5 12-2  Y     Z         2     5
6 12-2  Z     Y         3     5

data

df1 <- structure(list(Date = c("12-1", "12-1", "12-1", "12-1", "12-2", 
"12-2"), ID1 = c("A", "B", "D", "E", "Y", "Z"), ID2 = c("B", 
"A", "E", "D", "Z", "Y"), Count = c(1L, 1L, 1L, 2L, 2L, 3L)),
 class = "data.frame", row.names = c(NA, 
-6L))

edited Dec 04 '21 at 17:22

answered Dec 04 '21 at 17:16

akrun

Did you consider date as well? – MixedModeler Dec 04 '21 at 17:21
In this example, even adding the 'Date' doesn't change the outcome. Updated – akrun Dec 04 '21 at 17:22
Right, I should have stressed it more in the example. – MixedModeler Dec 04 '21 at 17:34
@akrun This one is much better! – TarJae Dec 04 '21 at 18:54
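
The question's sample data cannot show whether Date is part of the grouping, because no pair occurs on more than one date. Here is a base R sketch of the same pmin/pmax idea on made-up data where it does matter. `df2`, `grp`, and the "|" separator are illustrative names and choices, not taken from the answer.

```r
# Hypothetical data: the pair A/B occurs on two different dates.
df2 <- data.frame(
  Date  = c("12-1", "12-1", "12-2", "12-2"),
  ID1   = c("A", "B", "A", "B"),
  ID2   = c("B", "A", "B", "A"),
  Count = c(1L, 1L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Order-independent pair key built with pmin/pmax.
# Assumption: "|" does not occur in any ID.
grp <- paste(pmin(df2$ID1, df2$ID2), pmax(df2$ID1, df2$ID2), sep = "|")

# Grouping by the pair alone pools both dates together.
sum_pair_only <- ave(df2$Count, grp, FUN = sum)

# Grouping by Date and pair keeps the dates separate.
sum_by_date <- ave(df2$Count, df2$Date, grp, FUN = sum)

cbind(df2, sum_pair_only, sum_by_date)
```

Here `sum_pair_only` is 7 on every row, while `sum_by_date` is 2 for the 12-1 rows and 5 for the 12-2 rows.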

TarJae · Answer 3 · 2021-12-04T17:10:33.120

Here is a dplyr solution making use of lapply:

In essence, we create a new column y that holds the two IDs in alphabetical order, so that we can also group by this column:

library(dplyr)
library(stringr)

df1 %>% 
  mutate(x = paste(ID1, ID2)) %>% 
  mutate(y = str_split(x, ' ') %>% lapply(., 'sort') %>%  lapply(., 'paste', collapse=' ')) %>% 
  group_by(Date, y) %>% 
  mutate(SUM = sum(Count)) %>% 
  ungroup() %>% 
  select(-c(x, y, Count))

  Date  ID1   ID2     SUM
  <chr> <chr> <chr> <int>
1 12-1  A     B         2
2 12-1  B     A         2
3 12-1  D     E         3
4 12-1  E     D         3
5 12-2  Y     Z         5
6 12-2  Z     Y         5
