Sum column based on the strings in the other two columns, strings can be in either columns

Question

Inputs:

group1 <- c("A", "B", "C", "D")
group2 <- c("B", "A", "D", "C")
count <- c(1, 3, 2, 4)
df <- data.frame(group1, group2, count)

df:

  group1 group2 count
1      A      B     1
2      B      A     3
3      C      D     2
4      D      C     4

Desired output:

     group total
1 AB or BA     4
2 CD or DC     6

My actual dataset has a very long list of these group pairs.

related: https://stackoverflow.com/questions/73275992/count-number-of-occurances-of-pairs-of-values-that-are-in-different-order-in-dat — MrFlick, Nov 09 '22 at 15:53

score 1 · Answer 1 · answered Nov 09 '22 at 15:49

Sort the strings so the alphabetically first one is in a specific column:

library(dplyr)
df %>%
  mutate(g1 = pmin(group1, group2),
         g2 = pmax(group1, group2)) %>%
  group_by(g1, g2) %>%
  summarize(total = sum(count), .groups = "drop")
# # A tibble: 2 × 3
#   g1    g2    total
#   <chr> <chr> <dbl>
# 1 A     B         4
# 2 C     D         6

score 0 · Answer 2 · answered Nov 09 '22 at 16:02

This works if the pairs are always in adjacent rows:

library(tidyverse)   
df %>%
  mutate(
    # create row ID for each pair:
    id = (row_number() - 1) %/% 2,
    # create column `group`:
    group = str_c(group1, group2, " or ", group2, group1)) %>%
  group_by(id) %>%
  summarise(across(group, first),
            total = sum(count)) %>%
  select(-id)
# A tibble: 2 × 2
  group    total
  <chr>    <dbl>
1 AB or BA     4
2 CD or DC     6

Sum column based on the strings in the other two columns, strings can be in either columns

2 Answers2