I want to use the Stanford Parser from Python. I'm on Windows 7 with Python 2.7 and NLTK 3.0 installed, and I downloaded the Stanford Parser from the official site.

I first ran into the JAVAHOME environment variable problem, which I solved. Then I got this error message:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

and I can't find a solution for this problem.

I used the following code:

# -*- coding: utf-8 -*-

from nltk.parse import stanford

parser = stanford.StanfordParser(model_path='C:\Program Files (x86)\stanford-parser-full-2015-01-30\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz')

sent = 'my name is zim'
parser.parse(sent)

I've searched Stack Overflow for a solution but didn't find one.

ziMtyth
  • simple solution, use `python3` =) – alvas Apr 11 '15 at 20:09
  • Are you sure your code runs using `\n` in your path? – Padraic Cunningham Apr 11 '15 at 20:42
  • see this: http://stackoverflow.com/questions/28365626/how-to-output-nltk-chunks-to-file/28381060#28381060 – alvas Apr 11 '15 at 21:19
  • Guys, I'm so grateful for your help. I did what you suggested, @alvas (it was hard to do what the others suggested, but thank you all for your time): I downloaded Python 3.3.3 and NLTK 3.0.2. Now I'm getting this error: "raise OSError('Java command failed : ' + str(cmd)) OSError: Java command failed :...". I have no idea what causes it. Please help me get the Stanford Parser working; I really need it for my project. – ziMtyth Apr 12 '15 at 00:12
  • have you installed JRE? – alvas Apr 12 '15 at 06:27
  • Yes, I'm using jdk1.8.0_20 and jre1.8.0_20. I tried adding their paths to the Path environment variable, but it still doesn't work. Note that I also have a JAVAHOME variable, which contains the path of the JDK and not the JRE. My JDK and JRE are installed in C:\Program Files\Java. Any suggestions, @alvas? Sorry for taking up your time. – ziMtyth Apr 12 '15 at 16:28
  • A comprehensive answer came in on the same day, but you do not appear to have voted on it, accepted it or replied to it. May I ask why? – halfer Jan 23 '16 at 11:21

3 Answers

If the os.environ or export paths are set properly, as described in the question "Stanford Parser and NLTK", then the issue should come down to:

  • specifying the encoding in the NLTK API, and
  • the encoding of your input string

So the solution would be:

  • update NLTK to the latest stable version, i.e. `sudo pip install -U nltk` (on Windows, run `pip install -U nltk` without `sudo`)
  • use python3, or specify the encoding for your string

It is STRONGLY recommended that you use python3, especially when handling text inputs.

If all else fails, and you only have the old version of NLTK and must use Python 2.7, then:

import six
from nltk.parse import stanford

# Raw string (r"..."), so the \n in ...\nlp\... is not read as a newline
path_to_model = r"C:\Program Files (x86)\stanford-parser-full-2015-01-30\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz"

parser = stanford.StanfordParser(model_path=path_to_model, encoding='utf8')

sent = six.text_type('my name is zim')
parser.parse(sent)

See six docs @ http://pythonhosted.org//six/#six.text_type
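
The environment setup mentioned at the top of this answer can be sketched roughly as follows. This is a minimal sketch, not a verified configuration: it assumes the parser was unpacked into the directory named in the question and that both the parser jar and the models jar sit in that same directory. `STANFORD_PARSER` and `STANFORD_MODELS` are the environment variables NLTK checks when no explicit jar paths are passed.

```python
import os

# Assumption: this is where the parser archive was unpacked (taken from the
# question). Raw string, so every backslash is kept literally.
parser_dir = r'C:\Program Files (x86)\stanford-parser-full-2015-01-30'

# Assumption: the parser jar and the models jar are both in parser_dir.
os.environ['STANFORD_PARSER'] = parser_dir
os.environ['STANFORD_MODELS'] = parser_dir
```

Setting these before constructing `StanfordParser` lets NLTK look for the jars itself; the Java location is a separate setting (the JAVAHOME variable the question mentions).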

alvas

0xe9 isn't a valid ASCII byte, so your englishPCFG.ser.gz must not be ASCII encoded. You'll need to figure out what encoding it's using (probably UTF-8) and tell StanfordParser() about it with the encoding keyword argument.
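
The error message itself can be reproduced in isolation, which may help in seeing what it means. The sketch below decodes the single byte 0xe9 with the `ascii` codec and then with Latin-1; Latin-1 is only an assumption for illustration, since the question does not say which encoding produced the byte.

```python
data = b'\xe9'  # the byte named in the error message

try:
    data.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError as err:
    # ASCII only covers bytes 0x00-0x7f, so 0xe9 is rejected
    ascii_ok = False
    message = str(err)

# Assumption for illustration: in Latin-1, 0xe9 is the letter é
as_latin1 = data.decode('latin-1')

# In UTF-8 the same letter takes two bytes, so a lone 0xe9 is not valid UTF-8
as_utf8_bytes = u'\xe9'.encode('utf-8')

print(ascii_ok)
print(repr(as_utf8_bytes))
```

Any byte above 0x7f triggers the same failure when the codec in use is `ascii`, which is why the `encoding` argument matters.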

Erin Call
  • That's only part of the problem. The default encoding for NLTK 3.0's Stanford API is `ascii`; it has been changed to `utf8` in the latest version, see https://github.com/nltk/nltk/issues/877. The other part is how the OP read the string. Using python3 and the latest stable version of NLTK resolves the issue. – alvas Apr 11 '15 at 21:16

I found the cause of the error I encountered:

raise OSError('Java command failed : ' + str(cmd)) OSError: Java command failed :...

This error comes from the path in the following instruction being misinterpreted:

parser = stanford.StanfordParser(model_path='C:\Program Files (x86)\stanford-parser-full-2015-01-30\edu\stanford\nlp\models\lexparser\englishPCFG.ser.gz')

In an ordinary Python string literal, the \n in ...\nlp\... is read as a newline escape, so the path becomes ...<newline>lp\... and the file cannot be found.

I tried a simple solution: I renamed the nlp folder, and it worked!
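
Renaming the folder is not the only way around this. As a minimal sketch, the snippet below compares an ordinary string literal with a raw string and with a doubled backslash. The shortened path fragment is hypothetical and used only for illustration.

```python
# Hypothetical, shortened path fragment used only to illustrate the problem.
broken = 'stanford\nlp'     # ordinary literal: \n becomes one newline character
raw = r'stanford\nlp'       # raw literal: the backslash and the n are kept
escaped = 'stanford\\nlp'   # doubled backslash: same result as the raw literal

print(repr(broken))
print(repr(raw))
print(raw == escaped)
```

Prefixing the full model path with `r` (or doubling each backslash) should therefore keep the original folder name usable.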

ziMtyth