I have an R dataframe that consists of two columns, `id` and `text`, and I want to turn it into a co-occurrence matrix that counts how often each pair of words appears together in the same `id`'s list of words.
So, this dataframe:
df <- data.frame(id = c(1, 1, 1, 2, 2, 2), text = c("but", "the", "and", "but", "a", "the"))
should be turned into something like this:
| | but | the | and | a |
|---|---|---|---|---|
| but | 2 | 2 | 1 | 1 |
| the | 2 | 2 | 1 | 1 |
| and | 1 | 1 | 1 | 0 |
| a | 1 | 1 | 0 | 1 |
The real data is much larger, but I think a solution for this toy example should carry over. I'm not sure where to even start here, but tidyverse solutions are preferred.