
I have a table (example):

 Group  |  Country
-------------------
 Group1      SE
 Group1      DE  
 Group2      SE   
 Group2      DE
 Group2      FI
 Group3      SE
 Group3      FI

I'm trying to transform it to get:

 Country 1 | Country 2 | Count
-------------------------------
    SE          DE         2
    SE          FI         2
    FI          DE         1

I've tried using dplyr's count, group_by and summarise, but I can't seem to get my head around it. Instead I got a table with each country as a column, each group as a row, and a 1 or 0 in each cell indicating whether or not the country was in the group.
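For illustration, something along these lines reproduces the wide 0/1 table described above (a sketch, not the actual attempt; df is the sample data as defined in the answers below):

library(dplyr)
library(tidyr)

# counting per Group/Country and spreading to wide yields a Group-by-Country
# indicator table, not the pairwise country counts that are wanted
df %>%
    count(Group, Country) %>%
    pivot_wider(names_from = Country, values_from = n, values_fill = 0)
# each cell is 1 if the country appears in the group, 0 otherwise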

Brian Tompsett - 汤莱恩
  • Related: [Table of Interactions - Case with pets and houses](https://stackoverflow.com/questions/19891278/table-of-interactions-case-with-pets-and-houses); [Match and Count the Data Matrix in R](https://stackoverflow.com/questions/23660789/match-and-count-the-data-matrix-in-r) – Henrik Mar 05 '21 at 06:36

3 Answers


We could use base R methods: get the frequencies with table, take the crossprod, set the diagonal and lower-triangle elements to NA, and remove the NA rows after converting to a data.frame.

m1 <- crossprod(table(df1))
m1[lower.tri(m1, diag = TRUE)] <- NA
subset(as.data.frame.table(m1), !is.na(Freq))
#    Country Country.1 Freq
#4      DE        FI    1
#7      DE        SE    2
#8      FI        SE    2

data

df1 <- structure(list(Group = c("Group1", "Group1", "Group2", "Group2", 
"Group2", "Group3", "Group3"), Country = c("SE", "DE", "SE", 
"DE", "FI", "SE", "FI")), .Names = c("Group", "Country"),
 class = "data.frame", row.names = c(NA, -7L))
akrun

Here is an alternative tidyverse approach using combn.

library(tidyverse)
df %>%
    group_by(Group) %>%
    summarise(cmbn = list(apply(combn(Country, 2), 2, function(x)
        paste(sort(x), collapse = "_")))) %>%
    unnest(cmbn) %>%
    select(-Group) %>%
    separate(cmbn, into = c("Country 1", "Country 2"), sep = "_") %>%
    count(`Country 1`, `Country 2`)
## A tibble: 3 x 3
#  `Country 1` `Country 2`     n
#  <chr>       <chr>       <int>
#1 DE          FI              1
#2 DE          SE              2
#3 FI          SE              2

Sample data

df <- read.table(text =
    "Group    Country
 Group1      SE
 Group1      DE
 Group2      SE
 Group2      DE
 Group2      FI
 Group3      SE
 Group3      FI", header = T, stringsAsFactors = F)
Maurits Evers
  • You don't need the `apply` as `combn` has the `FUN` argument i.e. `df %>% group_by(Group) %>% summarise(cmbn = list(combn(Country, 2, FUN = function(x) paste(sort(x), collapse="_"))))` – akrun Jul 07 '18 at 03:33
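Putting that suggestion into the full pipeline would look roughly like this (a sketch, not akrun's exact code; same sample data df and same counts as above):

library(tidyverse)

df %>%
    group_by(Group) %>%
    # combn() can apply FUN to each pair directly, so the extra apply() is not needed
    summarise(cmbn = list(combn(Country, 2, FUN = function(x)
        paste(sort(x), collapse = "_")))) %>%
    unnest(cmbn) %>%
    select(-Group) %>%
    separate(cmbn, into = c("Country 1", "Country 2"), sep = "_") %>%
    count(`Country 1`, `Country 2`)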

An alternative dplyr approach that applies a function to each combination of Country values:

df = read.table(text = "
Group Country
Group1      SE
Group1      DE  
Group2      SE   
Group2      DE
Group2      FI
Group3      SE
Group3      FI
", header=T, stringsAsFactors=F)

library(dplyr)

# function that takes 2 Country values and returns the number of common groups they have
f = function(x,y) { 
  df %>% 
    filter(Country %in% c(x,y)) %>% 
    distinct() %>%
    count(Group) %>%
    filter(n > 1) %>%
    nrow() 
}

# vectorising the function
f = Vectorize(f)

# applying the function to each Country value combination
data.frame(t(combn(unique(df$Country), 2)), stringsAsFactors = F) %>%
  mutate(NumGroups = f(X1, X2))

#   X1 X2 NumGroups
# 1 SE DE         2
# 2 SE FI         2
# 3 DE FI         1
AntoniosK