
Thanks for any help! I have a dataframe in R with two columns of categorical variables, like so:

rowA <- c("Square", "Circle", "Triangle", "Square", "Circle", "Triangle", "Square", "Circle", "Triangle")

rowB <- c("Circle", "Square", "Square", "Square", "Circle", "Circle", "Triangle", "Triangle", "Triangle")

df1 <- data.frame(rowA, rowB)

print(df1)

When we print it, it looks like this:

      rowA     rowB
1   Square   Circle
2   Circle   Square
3 Triangle   Square
4   Square   Square
5   Circle   Circle
6 Triangle   Circle
7   Square Triangle
8   Circle Triangle
9 Triangle Triangle

I want to count the frequency of each combination of categories in rowA and rowB. Here's what I'm hung up on: the combinations are reversible, meaning "Square - Circle" is the same as "Circle - Square" for our purposes, and the two should be summed together. The ideal output would look like this:

Pair             Count
Square - Circle      2
Square - Triangle    2
Square - Square      1
Circle - Triangle    2
Circle - Circle      1
Triangle - Triangle  1

I'd be thrilled if anybody had any advice, thanks!

Edit: Post got flagged as a duplicate question, but I don't agree that the suggested posts adequately answered my question (hence I asked in the first place, after a lot of digging). Really appreciate the unique and easy answers here.

swr
  • [Pasting elements of two vectors alphabetically](https://stackoverflow.com/questions/25588426/pasting-elements-of-two-vectors-alphabetically), and then count (e.g. [Count number of rows within each group](https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group)) – Henrik Jul 05 '21 at 19:30
  • A couple more related posts: https://stackoverflow.com/q/15487151/5325862, https://stackoverflow.com/q/42144322/5325862, https://stackoverflow.com/q/51274241/5325862, https://stackoverflow.com/q/46536183/5325862 – camille Jul 06 '21 at 22:50
  • If you don't think the proposed duplicates answer your question you need to show us why they don't do that. Just saying you “don't agree” gives us nothing to go on even if you are correct. – Dour High Arch Jul 08 '21 at 02:27

3 Answers


We could rearrange within each row with pmin/pmax and then count:

library(dplyr)
library(stringr)
df1 %>%
     count(Pair = str_c(pmin(rowA, rowB), ' - ',
       pmax(rowA, rowB)), name = "Count")

Output:

                 Pair Count
1     Circle - Circle     1
2     Circle - Square     2
3   Circle - Triangle     2
4     Square - Square     1
5   Square - Triangle     2
6 Triangle - Triangle     1
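A quick illustration of why the pmin()/pmax() trick works: both functions compare character vectors elementwise by sort order, so pmin() always picks the alphabetically earlier label of each pair and pmax() the later one, giving every reversible pair a single canonical form.

```r
# elementwise comparison works on character vectors: "Circle" sorts before
# "Square", so both orderings collapse to the same canonical pair
pmin(c("Square", "Circle"), c("Circle", "Square"))
# [1] "Circle" "Circle"
pmax(c("Square", "Circle"), c("Circle", "Square"))
# [1] "Square" "Square"
```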
akrun

A base R solution is

combs <- apply(as.matrix(df1), 1, function(x) paste0(sort(x), collapse = " - "))
as.data.frame(table(combs))
#R>                 combs Freq
#R> 1     Circle - Circle    1
#R> 2     Circle - Square    2
#R> 3   Circle - Triangle    2
#R> 4     Square - Square    1
#R> 5   Square - Triangle    2
#R> 6 Triangle - Triangle    1

# in R 4.1.0 or later
as.matrix(df1) |> 
  apply(1, \(x) paste0(sort(x), collapse = " - ")) |>
  table() |> as.data.frame() |> 
  setNames(c("Pair", "Count"))
#R>                  Pair Count
#R> 1     Circle - Circle     1
#R> 2     Circle - Square     2
#R> 3   Circle - Triangle     2
#R> 4     Square - Square     1
#R> 5   Square - Triangle     2
#R> 6 Triangle - Triangle     1
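The key step in both variants is sort(): sorting each row before pasting puts every reversible pair into the same canonical order, so the two orderings count as one label.

```r
# both orderings of the same pair collapse to one label after sorting
paste0(sort(c("Square", "Circle")), collapse = " - ")
# [1] "Circle - Square"
paste0(sort(c("Circle", "Square")), collapse = " - ")
# [1] "Circle - Square"
```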

Another approach, using a graph:

library(igraph)
library(magrittr)
df1 %>% 
  graph_from_data_frame(directed = FALSE) %>%
  as_adjacency_matrix() 
#          Square Circle Triangle
# Square        1      2        2
# Circle        2      1        2
# Triangle      2      2        1
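If you want the Pair/Count table rather than the matrix, the upper triangle (including the diagonal) can be reshaped back into long form. This is only a sketch, assuming your igraph version reports each self-pair once on the diagonal as in the matrix above; double-check how as_adjacency_matrix() counts loop edges in your version.

```r
library(igraph)
library(magrittr)

# symmetric matrix of pair counts from the undirected graph
m <- df1 %>%
  graph_from_data_frame(directed = FALSE) %>%
  as_adjacency_matrix() %>%
  as.matrix()

# each unordered pair appears exactly once in the upper triangle
idx <- which(upper.tri(m, diag = TRUE), arr.ind = TRUE)
data.frame(Pair  = paste(rownames(m)[idx[, 1]], colnames(m)[idx[, 2]], sep = " - "),
           Count = m[idx])
```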
Wimpel