I'm not able to execute the lines below. The error is:

"UnicodeDecodeError: 'ascii' codec can't decode byte 0xcb in position 0: ordinal not in range(128)"

File "D:\Py 64\ anaconda\lib\site-packages\nltk\tag\__init__.py", line 100, in pos_tag
    tagger = load(_POS_TAGGER)

File "D:\Py 64\ anaconda\lib\site-packages\nltk\data.py", line 779, in load
    resource_val = pickle.load(opened_resource, encoding='iso-8859-1')

My error is not just in `data.py`, but also in `__init__.py`.

Note: I have changed line 779 of data.py as mentioned here


import nltk
from nltk import word_tokenize

text = word_tokenize("They refuse to permit us to obtain the refuse permit")

nltk.pos_tag(text)
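For context, the message in the traceback is what Python 3's `pickle` module raises when it reads a byte string that was pickled Python 2 style and tries to decode it with the default ASCII encoding. The sketch below reproduces that behaviour with a tiny hand-written pickle. `PY2_STYLE_PICKLE` and `load_with` are illustrative names, not part of NLTK, and the payload stands in for the tagger model rather than loading it.

```python
import pickle

# Hand-written Python 2 style pickle: SHORT_BINSTRING opcode b"U", a length
# byte of 1, the single byte 0xcb, then the STOP opcode b".".
# Assumption: this stands in for a byte string pickled under Python 2;
# it is not the actual tagger model.
PY2_STYLE_PICKLE = b"U\x01\xcb."


def load_with(encoding=None):
    """Unpickle the sample, optionally overriding the string encoding."""
    if encoding is None:
        return pickle.loads(PY2_STYLE_PICKLE)  # default encoding is ASCII
    return pickle.loads(PY2_STYLE_PICKLE, encoding=encoding)


try:
    load_with()
except UnicodeDecodeError as exc:
    print("default (ASCII) failed:", exc)

# iso-8859-1 maps every byte to a code point, so decoding cannot fail.
print(repr(load_with("iso-8859-1")))

# encoding="bytes" leaves Python 2 byte strings as bytes objects.
print(repr(load_with("bytes")))
```

This only illustrates why an `encoding` argument changes the outcome. Whether it is the right setting for a given model file depends on how that file was pickled.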
  • possible duplicate of [NLTK 3 POS\_TAG throws UnicodeDecodeError](http://stackoverflow.com/questions/25590089/nltk-3-pos-tag-throws-unicodedecodeerror) – ham-sandwich Jun 26 '15 at 17:51
  • Your code runs without a UnicodeDecodeError using Python3.4, nltk 3.0.3, and the latest `maxent_treebank_pos_tagger` model. – unutbu Jun 26 '15 at 18:13
  • @HappyLeapSecond Can you tell me how I can install them or use them in my code? – Ashfaq Ahmed Jun 26 '15 at 18:28

1 Answer

I believe this problem is fixed by using nltk 3.0.3 and the latest maxent_treebank_pos_tagger model.

To install nltk, use

pip install -U nltk

Make sure the pip you are calling is for Python3.
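One way to avoid picking up the wrong pip, as a minimal sketch: run pip as a module of the interpreter you actually intend to use. `pip_install_command` is a hypothetical helper introduced here for illustration. It only builds the command and does not run it.

```python
import sys


def pip_install_command(package="nltk"):
    """Build a pip command tied to the interpreter running this script."""
    # sys.executable is the path of the current Python binary, so
    # "-m pip" installs into that interpreter's site-packages.
    return [sys.executable, "-m", "pip", "install", "-U", package]


# Show which Python this is, then the command to run in a shell.
print("Python version:", ".".join(str(part) for part in sys.version_info[:3]))
print(" ".join(pip_install_command()))
```

If the printed version is not a Python 3 release, the script was started with a different interpreter than the one you want nltk installed for.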

Once nltk is installed, open the Python3 interpreter and type:

>>> import nltk
>>> nltk.download()

and use the GUI to install maxent_treebank_pos_tagger. It's located under the models tab:

models > maxent_treebank_pos_tagger
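If the GUI is not convenient, `nltk.download` also accepts a package identifier directly. A minimal sketch, assuming nltk is installed and the machine can reach the NLTK data server. `download_tagger_model` is a hypothetical wrapper, not an NLTK function.

```python
def download_tagger_model(package="maxent_treebank_pos_tagger"):
    """Download one NLTK data package without opening the GUI."""
    # Imported lazily so that defining this helper does not require nltk.
    import nltk

    # Passing an identifier skips the interactive downloader window.
    return nltk.download(package)


# Usage (needs nltk and network access):
# download_tagger_model()
```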
unutbu
  • Hey, thanks for your answer, but I'm getting an error while calling "https" (from the pip link you gave) while loading pip for Python3. – Ashfaq Ahmed Jun 27 '15 at 08:49
  • Back up a second. What OS and what version of Python3 are you using? Do you know how it was installed? There might already be an associated `pip` installed on your system. – unutbu Jun 27 '15 at 08:56
  • Mine is Windows 7 64-bit. I have all the NLTK packages, since I'm using Anaconda. – Ashfaq Ahmed Jun 27 '15 at 09:12
  • I would try installing `maxent_treebank_pos_tagger` first with your current version of NLTK3. – unutbu Jun 27 '15 at 09:18
  • Do you know where your `nltk_data` directory is located? Check that `nltk_data/taggers/maxent_treebank_pos_tagger/PY3` exists. – unutbu Jun 27 '15 at 09:48
  • By the way, the latest version of `maxent_treebank_pos_tagger` uses `unicode` data not `iso-8859-1` encoded data. So if you have the `nltk_data/taggers/maxent_treebank_pos_tagger/PY3/english.pickle` file, then you should also revert the change you made to `data.py`. – unutbu Jun 27 '15 at 09:54
  • Yeah, that is the problem. I don't have the latest version of maxent_treebank_pos_tagger. How do I download /PY3 on Windows? – Ashfaq Ahmed Jun 27 '15 at 09:55
  • Did you run `nltk.download()`? – unutbu Jun 27 '15 at 09:55
  • Yeah, I did, but I don't see the latest version of maxent_treebank_pos_tagger. – Ashfaq Ahmed Jun 27 '15 at 09:58
  • That's strange. When I first ran it, the `maxent` line said "out of date". When I clicked the `Download` button, the line changed to "installed", and I found the `PY3/english.pickle` file was installed. – unutbu Jun 27 '15 at 10:02
  • Hey, I just checked and was able to locate the PY3/english.pickle file. But I'm unable to figure out why I'm getting an error :( – Ashfaq Ahmed Jun 27 '15 at 10:48
  • Check that the directory containing `nltk_data` is the first valid directory listed in `nltk.data.path`. See http://stackoverflow.com/q/3522372/190597. If you have more than one `nltk_data`, `nltk` might be finding the wrong directory first. – unutbu Jun 27 '15 at 11:54
  • Also, please update the question with the full traceback error message that you currently see. It must be different than what is currently posted, since that traceback mentions `encoding='iso-8859-1'`, which should now be deleted. – unutbu Jun 27 '15 at 11:55
  • Hey, the issue is sorted out. Thanks for your help. :) It works with encoding='iso-8859-1'. :) – Ashfaq Ahmed Jun 27 '15 at 13:39
  • I'm surprised that is the solution, but glad to hear you got it working. – unutbu Jun 27 '15 at 13:42
  • Actually, after adding encoding='iso-8859-1' I didn't restart, which is why I didn't get the output. Later I restarted my system and got the output :) – Ashfaq Ahmed Jun 27 '15 at 13:46
  • I think what this might mean is that you actually do need nltk 3.0.3 or newer for the `PY3/english.pickle` to be read. http://stackoverflow.com/a/25621643/190597 hints that this is true. – unutbu Jun 27 '15 at 13:57
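The directory checks suggested in the comments above can be sketched in a few lines. This is an illustration using only the standard library. `MODEL_RELPATH` and `find_tagger_pickles` are hypothetical names, and the relative path is the one quoted in the comments (`taggers/maxent_treebank_pos_tagger/PY3/english.pickle`).

```python
import os

# Location of the Python 3 model relative to an nltk_data directory,
# as quoted in the comments above.
MODEL_RELPATH = os.path.join(
    "taggers", "maxent_treebank_pos_tagger", "PY3", "english.pickle"
)


def find_tagger_pickles(search_paths):
    """Return (directory, has_py3_pickle) pairs in search order.

    Entries that are not existing directories are skipped.
    """
    results = []
    for directory in search_paths:
        if not os.path.isdir(directory):
            continue
        has_pickle = os.path.isfile(os.path.join(directory, MODEL_RELPATH))
        results.append((directory, has_pickle))
    return results


# Usage (needs nltk installed):
# import nltk
# for directory, has_pickle in find_tagger_pickles(nltk.data.path):
#     print(directory, "PY3 model found" if has_pickle else "no PY3 model")
```

If more than one directory is listed and an earlier one lacks the PY3 file, that may point to the stale `nltk_data` copy mentioned in the comments.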