Generate matrix of context or simple word vector embedding in Python

Asked Aug 16 '17 at 22:04

Active Aug 17 '17 at 14:09

Viewed 336 times

I have two sentences: Skies are blue. Grass is green.

I would like to compute a simple word-vector embedding matrix, or a matrix of co-occurrences; I am not sure what the proper terminology is. Here is what I want. There are 6 distinct words in the two sentences above, so my matrix will be 6 by 6. Assume the words have the following ordering, which gives both the row and the column indices: 0 - Skies, 1 - are, 2 - blue, 3 - Grass, 4 - is, 5 - green. I would then like to count co-occurrences using a window of size 2 (meaning the 2 words before the current word and the 2 words after it).

Element with index [0,0] will have value 0, since Skies does not co-occur with Skies.
Element with index [0,1] will have value 1, since are occurs next to Skies only once.
Element with index [0,2] will have value 1, since blue occurs within the window of Skies only once.

And so on. Is there a scikit module for this? I looked at the following question, but it does not seem to answer my question.
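
To make the intended output concrete, here is a minimal sketch of the counting described above, written in plain Python with NumPy rather than any scikit module. It assumes whitespace tokenization, that the window does not cross sentence boundaries, and that words are indexed in order of first appearance. The function name `cooccurrence_matrix` is hypothetical and is not taken from any library.

```python
import numpy as np


def cooccurrence_matrix(sentences, window=2):
    """Count how often each word appears within `window` positions of another.

    Assumptions (not stated in the question): tokens are split on whitespace,
    and the window never reaches into a neighbouring sentence.
    """
    tokenized = [sentence.split() for sentence in sentences]

    # Assign each distinct word an index in order of first appearance.
    index = {}
    for tokens in tokenized:
        for token in tokens:
            index.setdefault(token, len(index))

    counts = np.zeros((len(index), len(index)), dtype=int)
    for tokens in tokenized:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:  # a position does not co-occur with itself
                    counts[index[word], index[tokens[j]]] += 1

    return list(index), counts


words, matrix = cooccurrence_matrix(["Skies are blue", "Grass is green"], window=2)
print(words)
print(matrix)
```

For the two sentences above this gives `matrix[0, 0] == 0`, `matrix[0, 1] == 1` and `matrix[0, 2] == 1`, and every entry that pairs a word from the first sentence with a word from the second is 0 under the sentence-boundary assumption.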

Update: This matrix is the key object of the distributional hypothesis.

edited Aug 17 '17 at 14:09

asked Aug 16 '17 at 22:04

user1700890

0 Answers