When I run nltk.corpus.gutenberg.fileids() with Python 2.7 (Anaconda, Windows), I get the following error:

File "C:\Anaconda\lib\ntpath.py", line 85, in join
    result_path = result_path + '\\'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 9:
ordinal not in range(128)

I don't get this error with Python 3.4. I may be wrong, but I suspect the path contains an accented character (there is an accent in my Windows username).
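To illustrate that suspicion, here is a minimal sketch (written for Python 3) of what an ASCII decode does with such a path. The byte string below is a hypothetical path, not my real one; it simply puts the byte 0xe9 (the letter é in the Windows cp1252 code page) at position 9, as in the error message. Python 2 performs this kind of ASCII decode implicitly when a byte string is combined with a unicode string.

```python
# Hypothetical byte-string path: the real username is not shown in the question.
# 0xe9 is "é" in the Windows cp1252 (and Latin-1) code page.
raw_path = b"C:\\Users\\\xe9"


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError that was raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


ascii_result = try_decode(raw_path, "ascii")     # fails on byte 0xe9
cp1252_result = try_decode(raw_path, "cp1252")   # decodes 0xe9 as "é"

print(ascii_result)
print(repr(cp1252_result))
```

If the ASCII decode fails at the same byte and position as in the traceback while cp1252 succeeds, that is consistent with an accented character in the path being the cause, though it does not prove where in nltk the decode happens.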

When I add print statements to ntpath.py, nothing is printed (I don't know why), so I can't debug this myself.

EDIT: Running import nltk alone is enough to trigger the error.

clemtoy

1 Answer

I'm guessing that nltk on Python 2 has problems with non-ASCII paths. Switching to Python 3 is probably the simplest fix, assuming most of your code already runs on it. It's hard to say for sure without the full traceback, but nltk would likely have to be patched to fix this on Python 2. Otherwise, you would need to avoid paths that contain non-ASCII characters, which means either keeping things out of your user directory or changing your username.
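If you go the route of avoiding non-ASCII paths, a quick check like the following can tell you whether a given directory is affected. This is only a sketch: is_ascii_path is a hypothetical helper name, not part of nltk or the standard library, and I have only reasoned through it for Python 3 (catching UnicodeError should also cover the implicit decode error Python 2 raises for byte strings).

```python
import os


def is_ascii_path(path):
    """Return True if the path contains only ASCII characters."""
    try:
        path.encode("ascii")
    except UnicodeError:
        # Covers UnicodeEncodeError (Python 3 str, Python 2 unicode) and
        # UnicodeDecodeError (Python 2 byte str with non-ASCII bytes).
        return False
    return True


# The home directory is where an accented Windows username would show up.
home = os.path.expanduser("~")
print("%s -> ASCII only: %s" % (home, is_ascii_path(home)))
```

If this reports False for your home directory, anything that builds paths under it on Python 2 is a candidate for the same error.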

asmeurer