I am trying to read a file (https://nlp.stanford.edu/projects/glove/) that contains semantic vector representations trained on 42B tokens. I use the following code in my local PC environment:
import numpy as np

def loadGloveModel(gloveFile):
    print("Loading Glove Model")
    f = open(gloveFile, 'r')
    model = {}
    print(f)
    for line in f:
        # each line is a word followed by its vector components, separated by spaces
        splitLine = line.split()
        word = splitLine[0]
        embedding = np.array([float(val) for val in splitLine[1:]])
        model[word] = embedding
    print("Done.", len(model), " words loaded!")
    return model

fileDir = './glove.42B.300d.txt'
glove_model = loadGloveModel(fileDir)
print(len(glove_model['shih-tzu']))
And it works. I use Jupyter Notebook and Python 3. I think this file is encoded in UTF-8.
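To double-check that assumption, this is the kind of quick test I can run (just a diagnostic sketch of mine, not part of the loader; the sample sizes are arbitrary):

# Read a small raw sample of the file and try to decode it as UTF-8.
with open('./glove.42B.300d.txt', 'rb') as f:
    raw = f.read(200)

raw.decode('utf-8')    # raises UnicodeDecodeError if the file is not valid UTF-8
print(repr(raw[:60]))  # repr() keeps the printed output ASCII-safe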
However, since the subsequent calculations are heavy, I want to run the code on our server. First I tried to run the same code (after converting the IPython notebook to a .py file), but I got this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 2: ordinal not in range(128)
Then I tried the following:
f = open(gloveFile, 'r', encoding='utf-8')
Again I received the same error:
File "Spectral_dictionary_based-onlythoseintheglove.py", line 22, in loadGloveModel print(splitLine) UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 2: ordinal not in range(128)
I wonder why the same code runs correctly in one environment but not in another. Any ideas on how to fix it?
Thanks in advance