I'm working on an NLP task and I need to calculate the co-occurrence matrix over documents. The basic formulation is as below:
Here I have a matrix with shape (n, length), where each row represents a sentence composed of length words. So there are n sentences in all, each of the same length. Then, with a defined context size, e.g., window_size = 5, I want to calculate the co-occurrence matrix D, where the entry in the cth row and wth column is #(w,c), which means the number of times that a context word c appears in w's context.
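To make the definition concrete, here is a minimal NumPy sketch of the quantity I am after. It rests on assumptions that are not fixed above: words are already encoded as integer ids in the range 0 to vocab_size - 1, the context of a word is taken to be up to window_size words on each side of it within the same sentence, and the names cooccurrence_matrix and vocab_size are only illustrative.

```python
import numpy as np

def cooccurrence_matrix(sentences, vocab_size, window_size):
    """Count #(w, c) for an (n, length) array of integer word ids.

    Assumption: the context of a word is the window_size words on
    each side of it, within the same sentence (symmetric window).
    """
    sentences = np.asarray(sentences)
    n, length = sentences.shape
    D = np.zeros((vocab_size, vocab_size), dtype=np.int64)
    # For each offset k, pair every position i with position i + k.
    for k in range(1, min(window_size, length - 1) + 1):
        left = sentences[:, :-k].ravel()   # words at position i
        right = sentences[:, k:].ravel()   # words at position i + k
        # np.add.at accumulates correctly when index pairs repeat.
        np.add.at(D, (right, left), 1)  # right word is in left word's context
        np.add.at(D, (left, right), 1)  # left word is in right word's context
    return D

# Two sentences of length 3 over a vocabulary of 3 word ids.
D = cooccurrence_matrix([[0, 1, 2], [1, 0, 2]], vocab_size=3, window_size=1)
print(D)
```

This only loops over the window offsets rather than over every word, but it still builds a dense vocab_size by vocab_size matrix, so it is meant as an illustration of the definition rather than something that scales to a large vocabulary.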
An example can be found here: How to calculate the co-occurrence between two words in a window of text?
I know it can be calculated by stacking loops, but I want to know whether there exists a simpler way or a ready-made function. I have found some answers, but they do not work with a window sliding through the sentence. For example: word-word co-occurrence matrix
So could anyone tell me whether there is a function in Python that can deal with this problem concisely? I think this task is quite common in NLP.