
I'm new to R and I want to create a co-occurrence matrix that counts which elements co-occur in the same row.

Basic example of the desired outcome

Say you have this table:

df <- data.frame(ID = c(1,2,3), 
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece")
)

That is:

  ID      V1      V2
1  1 England  Greece
2  2 England England
3  3   China  Greece

My co-occurrence matrix should look like this:

Country China England Greece
China       0       0      1   # China & Greece co-occur in row 3
England     0       1      1   # England & England co-occur in row 2; England & Greece in row 1
Greece      1       1      0
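To make the target definition concrete, here is a minimal sketch that builds the matrix with an explicit loop over rows. It assumes exactly two value columns, V1 and V2, and counts each row once per unordered pair, so a row holding the same country twice adds 1 to the diagonal. The helper name cooccur_loop is hypothetical.

```r
# Hypothetical helper: count unordered (V1, V2) pairs row by row
cooccur_loop <- function(df) {
  # Alphabetical set of all values found in either column
  lv <- sort(unique(c(df$V1, df$V2)))
  m <- matrix(0L, length(lv), length(lv), dimnames = list(lv, lv))
  for (i in seq_len(nrow(df))) {
    a <- df$V1[i]
    b <- df$V2[i]
    m[a, b] <- m[a, b] + 1L
    # Mirror the pair, but only once when both columns hold the same value
    if (a != b) m[b, a] <- m[b, a] + 1L
  }
  m
}

df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"),
                 stringsAsFactors = FALSE)
cooccur_loop(df)
```

A loop is slow for large data, but it states the rule that the vectorised answers below have to reproduce.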

However, if I follow this example I get this:

library(tidyverse)

df1 <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  xtabs(~ID + Country, data = ., sparse = FALSE) %>%
  crossprod(., .)

df_diag <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  mutate(Country2 = Country) %>%
  xtabs(~Country + Country2, data = ., sparse = FALSE) %>%
  diag()

diag(df1) <- df_diag

df1

Country   China England Greece
  China       1       0      1
  England     0       3      1
  Greece      1       1      2
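This attempt differs from the target only on the diagonal. crossprod(X) computes t(X) %*% X, where X is the ID-by-Country count table from xtabs, so each diagonal cell is a sum of squared per-row counts (England: 1^2 + 2^2 + 0^2 = 5), and overwriting it with plain frequencies (1, 3, 2) still counts single occurrences rather than pairs. A row that holds a country k times contains k(k - 1)/2 pairs of that country, which suggests the corrected diagonal below. This is a sketch that swaps pivot_longer for a base R reshape so it runs without extra packages.

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"),
                 stringsAsFactors = FALSE)

# Long format in base R (stands in for pivot_longer): one row per occurrence
long <- data.frame(ID = rep(df$ID, 2), Country = c(df$V1, df$V2))

# X[i, k] = number of times country k appears in row i (0, 1 or 2 here)
X <- xtabs(~ ID + Country, data = long)

# Off-diagonal cells of t(X) %*% X are already the pair counts
co <- crossprod(X)

# Same-country pairs within a row: k * (k - 1) / 2, summed over rows
diag(co) <- colSums(X * (X - 1) / 2)
co
```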

And if I follow this other one I get:

library(reshape2)
library(data.table)

melt(setDT(df), id.vars = "ID", measure = patterns("^V"))[nchar(value) > 0 & complete.cases(value)] -> foo

# Get distinct value (country) in each ID group (each row)
unique(foo, by = c("ID", "value")) -> foo2

# https://stackoverflow.com/questions/13281303/creating-co-occurrence-matrix
# Following that question, build the matrix with crossprod().

crossprod(table(foo2[, c(1,3)])) -> mymat

# Finally, you need to change diagonal values. If a value is equal to one,
# change it to zero. Otherwise, keep the original value.

diag(mymat) <- ifelse(diag(mymat) <= 1, 0, mymat)

mymat

value     China England Greece
  China       0       0      1
  England     0       0      1
  Greece      1       1      1
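Two things seem to shape this result. First, unique(foo, by = c("ID", "value")) keeps a single England for ID 2, so the England/England pair in row 2 is gone before crossprod() runs. Second, ifelse(test, yes, no) takes its "no" values by position from the whole of mymat (its first elements in column-major order), not from the diagonal. The sketch below rebuilds mymat by hand, with the values crossprod() gives for foo2, to isolate the second effect.

```r
# mymat as crossprod(table(foo2[, c(1, 3)])) produces it (rebuilt by hand)
lv <- c("China", "England", "Greece")
mymat <- matrix(c(1, 0, 1,
                  0, 2, 1,
                  1, 1, 2), nrow = 3, byrow = TRUE, dimnames = list(lv, lv))

# `no` is the whole matrix, so elements 1..3 (its first column) are used
buggy <- ifelse(diag(mymat) <= 1, 0, mymat)
# Passing the diagonal keeps the value the comment intended
fixed <- ifelse(diag(mymat) <= 1, 0, diag(mymat))
buggy
fixed
```

Even the corrected version gives England 2 and Greece 2 on the diagonal, because after unique() the diagonal only counts rows in which a country appears, so the information needed for the target diagonal is no longer in foo2.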

I guess I'm missing something about how xtabs works, or perhaps crossprod?

silviaegt
  • Possible duplicate: https://stackoverflow.com/questions/16584948/how-to-create-weighted-adjacency-list-matrix-from-edge-list – MrFlick Aug 10 '20 at 23:36
  • I think the tricky part is just making sure your V1 and V2 columns have the same factor levels for counts, but basically just `df %>% mutate(across(c(V1,V2), ~factor(.x, levels=sort(unique(c(V1,V2)))))) %>% xtabs(~V1+V2, .)` should work. – MrFlick Aug 10 '20 at 23:44

2 Answers


In base R you could do:

a <- table(lapply(df[-1], factor, levels = sort(unique(unlist(df[-1])))))
a[lower.tri(a)] <- t(a)[lower.tri(a)]
 a
         V2
V1        China England Greece
  China       0       0      1
  England     0       1      1
  Greece      1       1      0
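The lower.tri() step copies the upper triangle over the lower one. That works for this df because every pair happens to be stored with the alphabetically earlier country in V1; a row listing, say, Greece before China would put its count in the lower triangle, where it would be overwritten. A sketch that does not depend on column order adds the table to its transpose and then halves the diagonal, since a same-country row is counted twice by the addition.

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"),
                 stringsAsFactors = FALSE)

# Same table as above: rows = V1, columns = V2, shared sorted levels
lv <- sort(unique(unlist(df[-1])))
a <- table(lapply(df[-1], factor, levels = lv))

# Count each pair in both orientations, then undo the double count on the diagonal
sym <- a + t(a)
diag(sym) <- diag(sym) / 2
sym
```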

Other libraries:

library(network)
as.matrix(network(df[-1], directed = FALSE))
        China England Greece
China       0       0      1
England     0       1      1
Greece      1       1      0

library(igraph)
as.matrix(as_adj(graph_from_data_frame(df[-1], FALSE)))
        England China Greece
England       1     0      1
China         0     0      1
Greece        1     1      0
Onyambu

You can use outer() in base R:

unique_vals <- sort(union(df$V1, df$V2))
co_mat <- function(x, y) +(any(df$V1 == x & df$V2 == y | 
                                df$V2 == x & df$V1 == y))
mat <- outer(unique_vals, unique_vals, Vectorize(co_mat))
dimnames(mat) <- list(unique_vals, unique_vals)
mat

#        China England Greece
#China       0       0      1
#England     0       1      1
#Greece      1       1      0
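co_mat() wraps the comparison in any(), so each cell records only whether a pair occurs (0 or 1), not how often. If repeated pairs should be counted, swapping any() for sum() is one option. The function name co_count and the fourth data row below are hypothetical, added only to show a pair that occurs twice, in reversed column order.

```r
# df extended with a hypothetical 4th row that repeats China/Greece reversed
df <- data.frame(ID = 1:4,
                 V1 = c("England", "England", "China", "Greece"),
                 V2 = c("Greece", "England", "Greece", "China"),
                 stringsAsFactors = FALSE)

unique_vals <- sort(union(df$V1, df$V2))
# Number of rows containing the unordered pair (x, y)
co_count <- function(x, y) sum(df$V1 == x & df$V2 == y |
                               df$V2 == x & df$V1 == y)
mat <- outer(unique_vals, unique_vals, Vectorize(co_count))
dimnames(mat) <- list(unique_vals, unique_vals)
mat
```

Because & binds tighter than |, a row matching the pair in either orientation is counted once, and a same-country row is also counted once.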
Ronak Shah