
Thanks for any help! I have a dataframe in R with two columns of categorical variables, like so:

rowA <- c("Square", "Circle", "Triangle", "Square", "Circle", "Triangle", "Square", "Circle", "Triangle")

rowB <- c("Circle", "Square", "Square", "Square", "Circle", "Circle", "Triangle", "Triangle", "Triangle")

df1 <- data.frame(rowA, rowB)

print(df1)

When we print it, it looks like this:

      rowA     rowB
1   Square   Circle
2   Circle   Square
3 Triangle   Square
4   Square   Square
5   Circle   Circle
6 Triangle   Circle
7   Square Triangle
8   Circle Triangle
9 Triangle Triangle

I want to count the frequency of each combination of categories in rowA and rowB. Here's what I'm hung up on: the combinations are reversible, meaning "Square - Circle" is the same as "Circle - Square" for our purposes, and the two should be summed together. The ideal output would look like this:

Pair             Count
Square - Circle      2
Square - Triangle    2
Square - Square      1
Circle - Triangle    2
Circle - Circle      1
Triangle - Triangle  1

I'd be thrilled if anybody had any advice, thanks!

Edit: Post got flagged as a duplicate question, but I don't agree that the suggested posts adequately answered my question (hence I asked in the first place, after a lot of digging). Really appreciate the unique and easy answers here.

swr
  • [Pasting elements of two vectors alphabetically](https://stackoverflow.com/questions/25588426/pasting-elements-of-two-vectors-alphabetically), and then count (e.g. [Count number of rows within each group](https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group)) – Henrik Jul 05 '21 at 19:30
  • A couple more related posts: https://stackoverflow.com/q/15487151/5325862, https://stackoverflow.com/q/42144322/5325862, https://stackoverflow.com/q/51274241/5325862, https://stackoverflow.com/q/46536183/5325862 – camille Jul 06 '21 at 22:50
  • If you don't think the proposed duplicates answer your question you need to show us why they don't do that. Just saying you “don't agree” gives us nothing to go on even if you are correct. – Dour High Arch Jul 08 '21 at 02:27

3 Answers


We could rearrange within each row with pmin/pmax and then count:

library(dplyr)
library(stringr)
df1 %>%
     count(Pair = str_c(pmin(rowA, rowB), ' - ',
       pmax(rowA, rowB)), name = "Count")

Output:

                 Pair Count
1     Circle - Circle     1
2     Circle - Square     2
3   Circle - Triangle     2
4     Square - Square     1
5   Square - Triangle     2
6 Triangle - Triangle     1
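A quick illustration of why the pmin()/pmax() trick works: both functions compare character vectors elementwise by sort order, so pmin() always picks the alphabetically earlier label of each pair and pmax() the later one, giving every reversible pair a single canonical form.

```r
# elementwise comparison works on character vectors: "Circle" sorts before
# "Square", so both orderings collapse to the same canonical pair
pmin(c("Square", "Circle"), c("Circle", "Square"))
# [1] "Circle" "Circle"
pmax(c("Square", "Circle"), c("Circle", "Square"))
# [1] "Square" "Square"
```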
akrun

A base R solution is

combs <- apply(as.matrix(df1), 1, function(x) paste0(sort(x), collapse = " - "))
as.data.frame(table(combs))
#R>                 combs Freq
#R> 1     Circle - Circle    1
#R> 2     Circle - Square    2
#R> 3   Circle - Triangle    2
#R> 4     Square - Square    1
#R> 5   Square - Triangle    2
#R> 6 Triangle - Triangle    1

# in R 4.1.0 or later
as.matrix(df1) |> 
  apply(1, \(x) paste0(sort(x), collapse = " - ")) |>
  table() |> as.data.frame() |> 
  setNames(c("Pair", "Count"))
#R>                  Pair Count
#R> 1     Circle - Circle     1
#R> 2     Circle - Square     2
#R> 3   Circle - Triangle     2
#R> 4     Square - Square     1
#R> 5   Square - Triangle     2
#R> 6 Triangle - Triangle     1
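The key step in both variants is sort(): sorting each row before pasting puts every reversible pair into the same canonical order, so the two orderings count as one label.

```r
# both orderings of the same pair collapse to one label after sorting
paste0(sort(c("Square", "Circle")), collapse = " - ")
# [1] "Circle - Square"
paste0(sort(c("Circle", "Square")), collapse = " - ")
# [1] "Circle - Square"
```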

Another approach, using a graph:

library(igraph)
library(magrittr)
df1 %>% 
  graph_from_data_frame(directed = FALSE) %>%
  as_adjacency_matrix() 
#          Square Circle Triangle
# Square        1      2        2
# Circle        2      1        2
# Triangle      2      2        1
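If you want the Pair/Count table rather than the matrix, the upper triangle (including the diagonal) can be reshaped back into long form. This is only a sketch, assuming your igraph version reports each self-pair once on the diagonal as in the matrix above; double-check how as_adjacency_matrix() counts loop edges in your version.

```r
library(igraph)
library(magrittr)

# symmetric matrix of pair counts from the undirected graph
m <- df1 %>%
  graph_from_data_frame(directed = FALSE) %>%
  as_adjacency_matrix() %>%
  as.matrix()

# each unordered pair appears exactly once in the upper triangle
idx <- which(upper.tri(m, diag = TRUE), arr.ind = TRUE)
data.frame(Pair  = paste(rownames(m)[idx[, 1]], colnames(m)[idx[, 2]], sep = " - "),
           Count = m[idx])
```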
Wimpel