
I am trying to read a GloVe file (https://nlp.stanford.edu/projects/glove/) that contains semantic vector representations (300-dimensional word vectors trained on 42B tokens). I use the following code in my local PC environment:

import numpy as np

def loadGloveModel(gloveFile):
    """Load GloVe vectors into a dict mapping each word to its embedding."""
    print("Loading Glove Model")
    model = {}
    # 'with' closes the file automatically once the loop finishes
    with open(gloveFile, 'r') as f:
        print(f)  # shows the file object, including the encoding open() picked
        for line in f:
            splitLine = line.split()
            word = splitLine[0]
            embedding = np.array([float(val) for val in splitLine[1:]])
            model[word] = embedding
    print("Done.", len(model), "words loaded!")
    return model

fileDir = './glove.42B.300d.txt'
glove_model = loadGloveModel(fileDir)
print(len(glove_model['shih-tzu']))

It works. I use Jupyter Notebook and Python 3. I believe the file is encoded in UTF-8.
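Whether the file really is UTF-8 can be checked directly instead of assumed. The sketch below uses a hypothetical helper, `find_invalid_utf8_line` (not part of any library), that reads the file in binary mode and strictly decodes each line. It returns the 1-based number of the first line that is not valid UTF-8, or `None` if the whole file decodes cleanly. Because no implicit decoding happens, the result does not depend on the machine's locale.

```python
from typing import Optional

def find_invalid_utf8_line(path: str) -> Optional[int]:
    """Return the 1-based number of the first non-UTF-8 line, or None.

    Hypothetical helper: works on raw bytes, so the locale plays no part.
    """
    with open(path, "rb") as f:  # binary mode: no implicit decoding
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")  # strict by default: raises on invalid bytes
            except UnicodeDecodeError:
                return lineno
    return None
```

Usage would be `print(find_invalid_utf8_line('./glove.42B.300d.txt'))`; on a multi-gigabyte file this takes a while, since every line is decoded once. Splitting on `b'\n'` is safe here because that byte never occurs inside a multi-byte UTF-8 sequence.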

However, since the subsequent calculations are heavy, I want to run the code on our server. First I tried running the same code (after converting the IPython notebook to a .py file), but I got this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 2: ordinal not in range(128)

Then I tried the following:

    f = open(gloveFile, 'r', encoding='utf-8')

Again I received the same error:

  File "Spectral_dictionary_based-onlythoseintheglove.py", line 22, in loadGloveModel
    print(splitLine)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 2: ordinal not in range(128)
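The message points at output rather than input: it is a `UnicodeEncodeError` from the `'ascii'` codec, meaning Python failed to turn the character `'\xa3'` (the pound sign) *into* ASCII bytes. That is what happens when `print()` writes to a stream whose encoding is ASCII. The sketch below reproduces the same exception without touching the GloVe file. It assumes the server's standard output was ASCII-encoded and stands in for it with an in-memory `io.TextIOWrapper`; the helper `try_print` is hypothetical.

```python
import io

# In-memory text stream that encodes to ASCII, standing in for an
# ASCII-configured sys.stdout (an assumption about the server).
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

def try_print(text, stream):
    """Print text to stream; return the error message, or None on success."""
    try:
        print(text, file=stream)
        stream.flush()  # make sure any buffered text is actually encoded
        return None
    except UnicodeEncodeError as exc:
        return str(exc)

print(try_print("shih-tzu", ascii_stream))  # plain ASCII: succeeds
print(try_print("\xa3", ascii_stream))      # pound sign: same error as above
```

Reading the file with `encoding='utf-8'` cannot prevent this, because the failure happens when the decoded text is encoded again for output.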

I wonder why the same code runs correctly in one environment but not in another. Any ideas how to fix it?
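One way to see why the two environments differ is to run the same diagnostic on both machines and compare the encodings Python picks up. The sketch uses only the standard library: `sys.stdout.encoding` is what `print()` uses, and `locale.getpreferredencoding(False)` is what `open()` falls back to when no `encoding` argument is given. A Jupyter notebook typically sends output to the browser as UTF-8, while a script run on a server under a C/POSIX locale may report ASCII on older Python versions; since Python 3.7, C-locale coercion (PEP 538) and UTF-8 mode (PEP 540) usually avoid this. The exact values depend on the server's configuration, and `encoding_report` is a hypothetical helper name.

```python
import locale
import sys

def encoding_report():
    """Collect the encodings this interpreter uses for output and for open()."""
    return {
        "python": sys.version.split()[0],
        # Used by print() when writing to standard output.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default for open() when no encoding= argument is passed.
        "open_default": locale.getpreferredencoding(False),
        # Whether UTF-8 mode (PEP 540, Python 3.7+) is active; None if unsupported.
        "utf8_mode": getattr(sys.flags, "utf8_mode", None),
    }

if __name__ == "__main__":
    for key, value in encoding_report().items():
        # str.format instead of f-strings so this also runs on older servers
        print("{}: {}".format(key, value))
```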

Thanks in advance

Kadaj13
  • What version of Python are you running on the server? – Kent Shikama Dec 06 '19 at 07:26
  • The error occurs on *output*, not on input…! – deceze Dec 06 '19 at 07:29
  • It is your terminal that is speaking ASCII! Have Python write to a file instead. – MisterMiyagi Dec 06 '19 at 07:31
  • What the others said, and: it must be the `print()` expression that causes the exception (probably the last one). The encoding of `sys.stdout` depends on the locale and other environment variables of the host machine. – lenz Dec 06 '19 at 07:33
  • Oh, it even tells you in the traceback: the offending line is `print(splitLine)`, a line that is missing from your example code. – lenz Dec 06 '19 at 07:36
  • Thank you very much. I realized the problem was with the output (`print`), and when I removed that line the problem was solved. Thanks a lot. – Kadaj13 Dec 06 '19 at 08:47
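Following the comments, the fix belongs on the output side: drop or change the `print()` that dumps raw tokens, write the text to a file with an explicit encoding, or make standard output UTF-8. The sketch below shows the first two options, assuming the goal is only to inspect some loaded words. The helper `dump_words`, the sample line, and the output path are hypothetical. For the third option, setting the environment variable `PYTHONIOENCODING=utf-8` before running the script changes the encoding of `sys.stdout`, and on Python 3.7+ `sys.stdout.reconfigure(encoding='utf-8')` does the same from inside the script.

```python
import os
import tempfile

def dump_words(words, path):
    """Write one word per line as UTF-8; the terminal's encoding plays no part."""
    with open(path, "w", encoding="utf-8") as out:
        for word in words:
            out.write(word + "\n")

# Debug printing that cannot raise UnicodeEncodeError: ascii() escapes
# non-ASCII characters, so the pound sign is shown as \xa3.
split_line = ["\xa3", "0.1", "0.2"]  # made-up example of a split line
print(ascii(split_line))

# Hypothetical output location in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "glove_words_sample.txt")
dump_words(["shih-tzu", "\xa3"], out_path)
```

With the real model this would be `dump_words(glove_model.keys(), 'words.txt')`. Replacing `print(splitLine)` with `print(ascii(splitLine))` keeps the debug output while guaranteeing it is pure ASCII.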
