Different behavior when opening a UTF-8 file in Python

Question

I am trying to read a GloVe file (https://nlp.stanford.edu/projects/glove/) which contains semantic vector representations trained on 42B tokens. I use the following code on my local PC:

import numpy as np
def loadGloveModel(gloveFile):
    print("Loading Glove Model")
    # No encoding argument: Python falls back to the locale's default encoding.
    f = open(gloveFile, 'r')
    model = {}
    print(f)
    for line in f:
        # Each line is a token followed by its vector components.
        splitLine = line.split()
        word = splitLine[0]
        embedding = np.array([float(val) for val in splitLine[1:]])
        model[word] = embedding
    f.close()
    print("Done.", len(model), " words loaded!")
    return model

fileDir = './glove.42B.300d.txt'    
glove_model = loadGloveModel(fileDir)
print(len(glove_model['shih-tzu']))

This works. I use Jupyter Notebook with Python 3, and I believe the file is encoded as UTF-8.

However, since the subsequent calculations are heavy, I want to run the code on our server. I first ran the same code there (after converting the IPython notebook to a .py file), but got this error:

UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 2: ordinal not in range(128)

Then I tried the following:

   f = open(gloveFile,'r', encoding = 'utf-8')

Again I received the same error:

  File "Spectral_dictionary_based-onlythoseintheglove.py", line 22, in loadGloveModel
    print(splitLine)
UnicodeEncodeError: 'ascii' codec can't encode character '\xa3' in position 2: ordinal not in range(128)

I wonder why the same code runs correctly in one environment but fails in another. Any ideas on how to fix it?

Thanks in advance

It is your terminal that is speaking ASCII! Have Python write to a file instead. — MisterMiyagi, Dec 06 '19 at 07:31
What the others said, and: it must be the `print()` expression that causes the exception (probably the last one). The encoding of `sys.stdout` depends on the locale and other environment variables of the host machine. — lenz, Dec 06 '19 at 07:33
Oh, it even tells you in the traceback: the offensive line is `print(splitLine)` – a line that is missing from your example code. — lenz, Dec 06 '19 at 07:36
Thank you very much. I have realized the problem was for the output (print) and when I removed the line the problem was solved. Thanks a lot. — Kadaj13, Dec 06 '19 at 08:47
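
The comments above locate the failure in `print()` rather than in `open()`. A minimal sketch of that behavior follows. It assumes the server's `sys.stdout` uses the ASCII codec (as the traceback suggests) and imitates it with an in-memory stream. `make_stream` is a hypothetical helper introduced only for this illustration and is not part of the original script.

```python
import io
import sys


def make_stream(encoding, errors="strict"):
    """Hypothetical helper: an in-memory text stream that stands in for a
    terminal using the given encoding. Returns (text_stream, raw_buffer)."""
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    return text, raw


token = "\xa3"  # the character named in the traceback (pound sign)

# 1. An ASCII stream (assumed to match the server's stdout) rejects the character.
ascii_out, _ = make_stream("ascii")
try:
    print(token, file=ascii_out)
    ascii_out.flush()
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# 2. A UTF-8 stream accepts the same character.
utf8_out, utf8_raw = make_stream("utf-8")
print(token, file=utf8_out)
utf8_out.flush()
utf8_bytes = utf8_raw.getvalue()

# 3. An ASCII stream with a lenient error handler escapes it instead of raising.
safe_out, safe_raw = make_stream("ascii", errors="backslashreplace")
print(token, file=safe_out)
safe_out.flush()
safe_bytes = safe_raw.getvalue()

# Show what the real stdout uses in the current environment.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))
print("ASCII stream raised UnicodeEncodeError:", ascii_failed)
```

If the server's stdout does turn out to be ASCII, there are three options besides removing the `print()` call: set the environment variable `PYTHONIOENCODING=utf-8` before starting the script, call `sys.stdout.reconfigure(encoding="utf-8")` (Python 3.7+), or write the output to a file opened with `encoding="utf-8"`. Which one fits depends on how the server's locale is configured.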
