How to turn two column dataframe into a word cooccurrence matrix?

Question

I have an R data frame with two columns, id and text, and I want to turn it into a co-occurrence matrix that counts the word pairs appearing together under the same id. So, this data frame:

df <- data.frame(id = c(1, 1, 1, 2, 2, 2), text = c("but", "the", "and", "but", "a", "the"))

should be turned into something like this:

	but	the	and	a
but	2	2	1	1
the	2	2	1	1
and	1	1	1	0
a	1	1	0	1

The real data is much larger, but I think this toy example should transfer. I'm not sure where to even start here; tidyverse solutions are preferred.

score 0 · Accepted Answer · answered Nov 19 '22 at 04:55

Following this answer:

dat <- crossprod(table(df))
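
To make the one-liner easier to follow, here is a minimal base R sketch of the same idea split into two steps, using the toy data from the question. The names `incidence`, `cooc`, and `word_order` are illustrative and not part of the original answer. It assumes each word appears at most once per id; if a word can repeat within an id, the counts are inflated unless duplicates are dropped first, for example with `unique(df)`.

```r
# Toy data from the question (words quoted so they are character strings)
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  text = c("but", "the", "and", "but", "a", "the")
)

# Step 1: id-by-word incidence table (rows = ids, columns = words)
incidence <- table(df)

# Step 2: crossprod(x) computes t(x) %*% x, which gives
# word-by-word co-occurrence counts summed over ids
cooc <- crossprod(incidence)

# Optional: reorder rows and columns to match the layout in the question
# (table() sorts the words alphabetically by default)
word_order <- c("but", "the", "and", "a")
cooc <- cooc[word_order, word_order]

print(cooc)
```

The diagonal holds the number of ids in which each word occurs, and the off-diagonal entries hold the number of ids shared by each pair of words.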

nlplearner