
I'm trying to compute the similarity between two words using word2vec. It works when I do it manually, but I have two big txt files and want to do it in a loop. I tried a couple of ways of looping without success, so I decided to ask the experts.

My code:

import gensim

model = gensim.models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)
with open('myfile1.txt', 'r') as f:
    data1 = f.readlines()

with open('myfile2.txt', 'r') as f:
    data2 = f.readlines()

data = zip(data1, data2)

with open('myoutput.txt', 'a') as f:
    for x in data: 
        output = model.similarity(x[1], x[0])  # reading each word form each files
        out = '{} : {} : {}\n'.format(x[0].strip(), x[1].strip(),output)  
        f.write(out)

My input 1 (text1)

street 
spain 
ice
man

My input 2 (text2)

florist
paris 
cold 
kid

I want this output (output.txt)

street florist 0.19991447551502498
spain paris 0.5380033328157873
ice cold 0.40968857572410483
man kid  0.42953233870042506
Lona gracia
  • please fix indentation & errors. – Jean-François Fabre Dec 25 '16 at 16:26
  • i have checked your code, it is working! what is the problem you are facing? are you getting any error? – Wasi Ahmad Dec 26 '16 at 06:41
  • I got this error : File "testing1.py", line 14, in output = model.similarity(x[1], x[0]) # reading each word form each files File "/anaconda2/lib/python2.7/site-packages/gensim-0.13.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 1598, in similarity return dot(matutils.unitvec(self[w1]), matutils.unitvec(self[w2])) File "anaconda2/lib/python2.7/site-packages/gensim-0.13.3-py2.7-linux-x86_64.egg/gensim/models/word2vec.py", line 1578, in __getitem__ return self.syn0[self.vocab[words].index] KeyError: 'street \n' – Lona gracia Dec 26 '16 at 15:19
  • Possible duplicate of http://stackoverflow.com/questions/533905/get-the-cartesian-product-of-a-series-of-lists-in-python – alvas Dec 27 '16 at 08:11

1 Answer

import gensim

model = gensim.models.Word2Vec.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

file1 = []
file2 = []

# Read both word lists, stripping trailing whitespace and newlines
with open('myfile1.txt', 'r') as f:
    for line in f:
        file1.append(line.rstrip())

with open('myfile2.txt', 'r') as f1:
    for line1 in f1:
        file2.append(line1.rstrip())

# Compare every word in file1 with every word in file2
with open('Output2.txt', 'w') as f:
    for i in file1:
        for g in file2:
            w = model.similarity(i, g)
            result = i + ',' + g + ',' + str(w)
            f.write(result)
            f.write('\n')

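You had a problem with the loop: the two loops should be nested together. Note that each line is stripped with rstrip() before it is passed to model.similarity, which avoids the KeyError: 'street \n' from your comment.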
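
The nested loops compare every word in the first file with every word in the second. If only the words on matching lines should be compared, as in the output the question asks for, a minimal sketch could look like the one below. The helper names pairwise_similarities and format_rows are hypothetical, and the similarity argument is assumed to be any callable that takes two words and raises KeyError for an unknown word, which is how model.similarity behaves in the traceback above.

```python
def pairwise_similarities(lines1, lines2, similarity):
    """Yield (word1, word2, score) for lines at the same position in both inputs.

    `similarity` is any callable taking two words and returning a number
    (for example model.similarity). Hypothetical helper, not part of gensim.
    """
    for raw1, raw2 in zip(lines1, lines2):
        # Remove the trailing spaces and newline that caused KeyError: 'street \n'
        w1, w2 = raw1.strip(), raw2.strip()
        if not w1 or not w2:
            continue  # skip blank lines
        try:
            score = similarity(w1, w2)
        except KeyError:
            score = None  # at least one word is not in the vocabulary
        yield w1, w2, score


def format_rows(rows):
    """Turn (word1, word2, score) tuples into 'word1 word2 score' strings."""
    return ['{} {} {}'.format(w1, w2, score) for w1, w2, score in rows]
```

With the model from the question, one would pass model.similarity as the third argument together with the two lists returned by readlines(), then write each string from format_rows to the output file followed by a newline. Pairs containing an out-of-vocabulary word get None instead of stopping the whole run.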

Fuji