I'm new to R and I want to create a co-occurrence matrix that counts which elements occur together in the same row.
Basic Example of Aspired Outcome
Say you have this table:
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))
That is:
   ID      V1      V2
1:  1 England  Greece
2:  2 England England
3:  3   China  Greece
My co-occurrence matrix should look like this:
Country   China England Greece
  China       0       0      1  # China & Greece co-occur in row 3
  England     0       1      1  # England & England co-occur in row 2, and England & Greece in row 1
  Greece      1       1      0
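To pin down the counting rule I have in mind, here is a slow base-R loop that I believe reproduces the matrix above. It is only a sketch of the intended result, not the idiomatic solution I'm after; the names `countries` and `cooc` are my own, and it assumes exactly two value columns, V1 and V2.

```r
# Example data from the question
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# as.character() guards against factor columns in older R versions
v1 <- as.character(df$V1)
v2 <- as.character(df$V2)

countries <- sort(unique(c(v1, v2)))
cooc <- matrix(0, nrow = length(countries), ncol = length(countries),
               dimnames = list(countries, countries))

for (i in seq_len(nrow(df))) {
  a <- v1[i]
  b <- v2[i]
  # Each row contributes one co-occurrence of its two values
  cooc[a, b] <- cooc[a, b] + 1
  # Mirror the count unless the pair sits on the diagonal
  if (a != b) cooc[b, a] <- cooc[b, a] + 1
}
cooc
```

So a row such as England/England should add exactly 1 to the diagonal, and a country that never pairs with itself should have 0 there.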
However, if I follow this example I get this:
library(tidyverse)
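# Cross-product of the ID-by-Country table; assigned to df1 so the diagonal can be replaced below
df1 <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  xtabs(~ID + Country, data = ., sparse = FALSE) %>%
  crossprod(., .)
# Total number of times each country appears, used as the new diagonal
df_diag <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  mutate(Country2 = Country) %>%
  xtabs(~Country + Country2, data = ., sparse = FALSE) %>%
  diag()
diag(df1) <- df_diag
df1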
Country   China England Greece
  China       1       0      1
  England     0       3      1
  Greece      1       1      2
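To see where the diagonal comes from, I looked at the intermediate steps of this first attempt. The sketch below rebuilds the long table in base R, which I assume is equivalent to the pivot_longer() step for this data; the names `long`, `tab` and `cp` are my own.

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# Long format: one row per (ID, Country) occurrence
long <- data.frame(ID = rep(df$ID, times = 2),
                   Country = c(as.character(df$V1), as.character(df$V2)))

# ID-by-Country counts; ID 2 gets a count of 2 for England
tab <- xtabs(~ ID + Country, data = long)
tab

# crossprod(tab) is t(tab) %*% tab, so each diagonal entry is a sum of squared counts
cp <- crossprod(tab)
cp
```

The off-diagonal entries already match what I want. The diagonal is where it goes wrong: before it is overwritten it holds the sum of squared counts per country (5 for England here), and after it is overwritten with df_diag it holds the total number of appearances (3 for England). Neither is the number of rows in which a country is paired with itself.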
And if I follow this other one I get:
library(reshape2)
library(data.table)
melt(setDT(df), id.vars = "ID", measure = patterns("^V"))[nchar(value) > 0 & complete.cases(value)] -> foo
# Get distinct value (country) in each ID group (each row)
unique(foo, by = c("ID", "value")) -> foo2
# https://stackoverflow.com/questions/13281303/creating-co-occurrence-matrix
# Seeing this question, you want to create a matrix with crossprod().
crossprod(table(foo2[, c(1,3)])) -> mymat
# Finally, you need to change diagonal values. If a value is equal to one,
# change it to zero. Otherwise, keep the original value.
diag(mymat) <- ifelse(diag(mymat) <= 1, 0, mymat)
mymat
value     China England Greece
  China       0       0      1
  England     0       0      1
  Greece      1       1      1
I guess I'm missing something about how xtabs works, or perhaps crossprod?