
I have two sentences: "Skies are blue." and "Grass is green."

I would like to compute a simple word vector space embedding matrix, or a co-occurrence matrix; I am not sure what the proper terminology is. Here is what I want. The two sentences above contain 6 distinct words, so my matrix will be 6 by 6. Assume the words are ordered as follows, with the same ordering for rows and columns: 0 - Skies, 1 - are, 2 - blue, 3 - Grass, 4 - is, 5 - green. I would then like to count co-occurrences using a window size of 2 (meaning the 2 words before the current word and the 2 words after it).

  • Element with index [0,0] will have value 0, since Skies does not co-occur with Skies.
  • Element with index [0,1] will have value 1, since are occurs within the window of Skies only once.
  • Element with index [0,2] will have value 1, since blue occurs within the window of Skies only once.

And so on and so forth. Is there a scikit module for it? I looked at a related question, but it does not seem to answer mine.
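
To make the counting concrete, here is a minimal sketch in plain Python of what I am describing. It is not a scikit API: the function name `cooccurrence_matrix` is made up for illustration, and it assumes the sentences are already split into words and that the window never crosses a sentence boundary.

```python
def cooccurrence_matrix(sentences, window=2):
    """Count how often each pair of words appears within `window` positions.

    `sentences` is a list of token lists (tokenization is assumed done already).
    Returns (vocab, matrix), where vocab maps word -> row/column index.
    """
    # Index words in order of first appearance: Skies=0, are=1, ...
    vocab = {}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))

    size = len(vocab)
    matrix = [[0] * size for _ in range(size)]

    for sentence in sentences:
        for i, word in enumerate(sentence):
            # Look at up to `window` words on each side, staying inside the sentence.
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a position does not co-occur with itself
                    matrix[vocab[word]][vocab[sentence[j]]] += 1
    return vocab, matrix


sentences = [["Skies", "are", "blue"], ["Grass", "is", "green"]]
vocab, matrix = cooccurrence_matrix(sentences, window=2)
for row in matrix:
    print(row)
```

With the two sentences above, row 0 (Skies) comes out as `[0, 1, 1, 0, 0, 0]`, which matches the three elements listed earlier. What I am looking for is a library routine that does the same thing.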

Update: This matrix is the key object of the distributional hypothesis.

user1700890
