Python version: 2.7

Windows version: Windows 7 64-bit

Language of the system: Russian

I have a problem for which I could not find a working solution online.

Here is my code:

 import textblob

 text = "I love people"

 # TextBlob is defined inside the textblob package, so the name must be qualified
 blob = textblob.TextBlob(text)
 print blob.sentiment

I get the following error, which is raised while nltk is being imported:

Traceback (most recent call last):
  File "C:\Users\Александр\Desktop\TextBlob.py", line 1, in <module>
    import textblob
  File "C:\Python27\lib\site-packages\textblob\__init__.py", line 9, in <module>
    from .blob import TextBlob, Word, Sentence, Blobber, WordList
  File "C:\Python27\lib\site-packages\textblob\blob.py", line 28, in <module>
    import nltk
  File "C:\Python27\lib\site-packages\nltk\__init__.py", line 128, in <module>
    from nltk.chunk import *
  File "C:\Python27\lib\site-packages\nltk\chunk\__init__.py", line 155, in <module>
    from nltk.data import load
  File "C:\Python27\lib\site-packages\nltk\data.py", line 77, in <module>
    if 'APPENGINE_RUNTIME' not in os.environ and os.path.expanduser('~/') != '~/':
  File "C:\Python27\lib\ntpath.py", line 311, in expanduser
    return userhome + path[i:]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc0 in position 9: ordinal not in range(128)

As far as I understand from the answers I found through Google and Stack Overflow, the problem lies in how ntpath.py handles non-ASCII (here, Russian) text.
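
The byte and position in the error message are consistent with the home directory being stored in the Russian Windows code page. The sketch below is a minimal illustration, assuming the path reaches Python 2 as cp1251 bytes (an assumption, since the traceback does not state the code page). It decodes those bytes as ASCII explicitly, which is what Python 2 does implicitly when a byte string is concatenated with a unicode string.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Assumption: the home directory is held as bytes in cp1251,
# the usual ANSI code page on a Russian Windows system.
home_unicode = u"C:\\Users\\Александр"
home_bytes = home_unicode.encode("cp1251")

failure = None
try:
    # Explicit version of the implicit ASCII decode done by Python 2
    home_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    # Record the offset and the value of the first byte that cannot be decoded
    offending = home_bytes[exc.start:exc.start + 1]
    failure = (exc.start, hex(ord(offending)))
    print(exc)

print(failure)  # (9, '0xc0'), the same byte and position as in the traceback
```

Position 9 is the first character after `C:\Users\`, that is, the first Cyrillic letter of the user name.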

I tried the following approaches, and none of them worked:

  1. Using sys.setdefaultencoding('utf8'), as suggested in How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte".

  2. Using sys.setdefaultencoding('Cp1252'). It eliminated the error, but the output of my program disappeared too.

  3. Using import io, as suggested in Python (nltk) - UnicodeDecodeError: 'ascii' codec can't decode byte.

  4. Using unicode().decode() in ntpath.py (I do not remember where I found this solution).

UPD: I have found a solution.

I inserted the following lines into ntpath.py:

reload(sys)
sys.setdefaultencoding('Cp1252')

Here is the top of the file after the change:

import os
import sys
import stat
import genericpath
import warnings

#another way
reload(sys)
sys.setdefaultencoding('Cp1252')

It works perfectly. If your system settings use another language, experiment with the encoding name and replace Cp1252 with the one that fits.

Alex
  • This has nothing to do with NLTK, I think. The problem is that your path contains non-ASCII characters, which isn't handled properly. If you are new to Python, why aren't you working with Python 3? You will have much less trouble of this kind. – lenz Nov 01 '16 at 12:39
  • @lenz, I have tried to work with version 3.5, but I had a lot of trouble compiling to exe files. 2.7 works pretty well for that. Can I somehow change my system settings to avoid this problem? – Alex Nov 01 '16 at 14:30
  • Yes you can: Your username is "Александр", so `userhome` is probably `r"C:\Users\Александр"`. Create a new user named Alexander (or Aleksandr, or Donald), so that folder paths only contain ascii characters. – alexis Nov 01 '16 at 14:39
  • @alexis, thanks a lot, but I found a better solution. Check my UPD above. – Alex Nov 01 '16 at 14:50
  • You should post your solution as an answer, not as an edit to the question. – lenz Nov 01 '16 at 15:21
  • Before you write your own answer: Understand that `sys.setdefaultencoding()` is a global setting. It has no business in `ntpath.py`-- and the problem can arise in other places too. Put the two lines at the top of your own program, and they will still do the job. – alexis Nov 01 '16 at 15:41
  • @alexis, unfortunately, I had already put these lines into my own code and it did not work. Hence, putting them into `ntpath.py` seems to be the only way. In what other places can this problem arise? – Alex Nov 01 '16 at 15:55
  • @lenz, done it. All right? – Alex Nov 01 '16 at 15:56
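
To check whether a path is affected in the way the comments describe, one option is to look for characters outside the ASCII range. This is only a rough sketch: the helper name `non_ascii_chars` and both sample paths are illustrative, not taken from any library.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def non_ascii_chars(path):
    """Return the characters of `path` whose code point is above 127."""
    return [ch for ch in path if ord(ch) > 127]


# Hypothetical sample paths, modelled on the ones discussed in the comments
cyrillic_home = u"C:\\Users\\Александр"
ascii_home = u"C:\\Users\\Alexander"

print(len(non_ascii_chars(cyrillic_home)))  # 9: every letter of the user name
print(non_ascii_chars(ascii_home))          # []: nothing for Python 2 to trip over
```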

1 Answer

I have found a solution.

I inserted the following lines into ntpath.py:

reload(sys)
sys.setdefaultencoding('Cp1252')

Here is the top of the file after the change:

import os
import sys
import stat
import genericpath
import warnings

#another way
reload(sys)
sys.setdefaultencoding('Cp1252')

It works perfectly. If your system settings use another language, experiment with the encoding name and replace Cp1252 with the one that fits.
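
One caveat about the choice of encoding name. The sketch below assumes the home directory is stored as cp1251 bytes (the usual ANSI code page on Russian Windows, which the post does not confirm). Under that assumption, Cp1252 silences the error without recovering the original characters, because it maps the same byte values to accented Latin letters, whereas cp1251 gives back the Cyrillic name.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

home_unicode = u"C:\\Users\\Александр"
# Assumption: this is the byte form the operating system hands to Python 2
home_bytes = home_unicode.encode("cp1251")

as_cp1252 = home_bytes.decode("cp1252")  # no exception, but wrong letters
as_cp1251 = home_bytes.decode("cp1251")  # round-trips to the original text

print(as_cp1252 == u"C:\\Users\\Àëåêñàíäð")  # True: decoded, but as mojibake
print(as_cp1251 == home_unicode)             # True: the real user name
```

So on a Russian system 'cp1251' may be the more faithful value to try in place of Cp1252.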

Alex