Why is the Stanford parser with NLTK not parsing a sentence correctly?

Question

I am using the Stanford parser with NLTK in Python, and I followed "Stanford Parser and NLTK" to set up the Stanford NLP libraries.

from nltk.parse.stanford import StanfordParser
from nltk.parse.stanford import StanfordDependencyParser

# Both parsers load the same English PCFG model.
parser = StanfordParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")
dep_parser = StanfordDependencyParser(model_path="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz")

one = "John sees Bill"

# Constituency parse: print each tree and open it in the GUI viewer.
parsed_Sentence = parser.raw_parse(one)
for line in parsed_Sentence:
    print(line)
    line.draw()

# Dependency parse: convert each dependency graph to a tree.
parsed_Sentence = [parse.tree() for parse in dep_parser.raw_parse(one)]
print(parsed_Sentence)

# GUI
for line in parsed_Sentence:
    print(line)
    line.draw()

I am getting incorrect parse and dependency trees, as shown in the example below: the parser treats 'sees' as a noun instead of a verb.

What should I do? It works correctly when I change the sentence (e.g. one = 'John see Bill'). The correct output for this sentence can be viewed here: correct output of parse tree

An example of the correct output is also shown below:

Please post the full code snippet so that others understand where `dep_parser` comes from =) – alvas, Jan 24 '16 at 03:23

score 7 · Accepted Answer · edited May 23 '17 at 12:25

Once again, no model is perfect (see Python NLTK pos_tag not returning the correct part-of-speech tag) ;P

You can try a "more accurate" parser, using the NeuralDependencyParser.

First set up the parser properly with the correct environment variables (see Stanford Parser and NLTK and https://gist.github.com/alvations/e1df0ba227e542955a8a), then:

>>> from nltk.internals import find_jars_within_path
>>> from nltk.parse.stanford import StanfordNeuralDependencyParser
>>> parser = StanfordNeuralDependencyParser(model_path="edu/stanford/nlp/models/parser/nndep/english_UD.gz")
>>> stanford_dir = parser._classpath[0].rpartition('/')[0]
>>> slf4j_jar = stanford_dir + '/slf4j-api.jar'
>>> parser._classpath = list(parser._classpath) + [slf4j_jar]
>>> parser.java_options = '-mx5000m'
>>> sent = "John sees Bill"
>>> [parse.tree() for parse in parser.raw_parse(sent)]
[Tree('sees', ['John', 'Bill'])]

Do note that the NeuralDependencyParser only produces dependency trees.

answered Jan 24 '16 at 03:18

alvas


I am using the model "englishPCFG.ser.gz" and you are using the model "english_UD.gz". How do we choose between these models so that we can pick the right one? – Nomiluks Jan 24 '16 at 12:20
There's no perfect model, and no right or wrong one either, just the one that best fits your data. So I would say: try all of them, then evaluate them based on the ultimate purpose of the parses. – alvas Jan 24 '16 at 14:18
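One simple way to compare models, assuming you have a few sentences from your own data with hand-annotated dependency heads, is the unlabeled attachment score: the fraction of tokens whose predicted head matches the gold head. The sketch below is only an illustration. The function name and the toy head lists are hypothetical and are not the output of any actual parser run.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head matches the gold head.

    Each argument is a list of sentences; each sentence is a list of head
    indices, one per token (tokens are numbered from 1, 0 means root).
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(1 for g, p in zip(gold, pred) if g == p)
        total += len(gold)
    # float() keeps the division exact on Python 2 as well as Python 3.
    return float(correct) / total if total else 0.0


# Hypothetical toy data for "John sees Bill" (made up for illustration).
gold = [[2, 0, 2]]      # "sees" is the root; "John" and "Bill" attach to it
model_a = [[2, 0, 2]]   # a made-up parse that agrees with the gold heads
model_b = [[3, 3, 0]]   # a made-up parse that treats "Bill" as the root

print(unlabeled_attachment_score(gold, model_a))
print(unlabeled_attachment_score(gold, model_b))
```

This is an intrinsic check only. If the parses feed a downstream task, measuring that task's quality with each model is closer to what the comment above suggests.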

Please follow the instructions till the end on https://gist.github.com/alvations/e1df0ba227e542955a8a – alvas Jan 24 '16 at 15:01
I am trying your code and it is taking very long to execute. Is it expected to take this long? – Nomiluks Jan 24 '16 at 18:36
Follow the last snippet on https://gist.github.com/alvations/e1df0ba227e542955a8a that says "Since the CoreNLP can take a while to load all the models before parsing, it's best to use `raw_parse_sents` instead of `raw_parse` when parsing more than one sentence" ;P – alvas Jan 24 '16 at 22:16
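The batching advice in the last comment can be sketched roughly as follows. The helper name `parse_sentences` is hypothetical, and `parser` is assumed to be an already configured NLTK Stanford dependency parser, such as the StanfordNeuralDependencyParser built in the answer. Passing all sentences to `raw_parse_sents` in one call means the Java process is started once for the whole batch instead of once per sentence.

```python
def parse_sentences(parser, sentences):
    """Parse many sentences in one parser call; return one tree per sentence.

    `parser` is assumed to be a configured NLTK Stanford dependency parser
    whose parses expose a .tree() method (as in the answer above).
    """
    trees = []
    # raw_parse_sents() takes a list of raw strings and yields, for each
    # sentence, an iterator over that sentence's parses.
    for parses in parser.raw_parse_sents(sentences):
        # Keep only the first parse of each sentence.
        first = next(iter(parses))
        trees.append(first.tree())
    return trees


# Usage, assuming `parser` was set up as in the answer:
# trees = parse_sentences(parser, ["John sees Bill", "Bill sees John"])
```

This helper is written for the dependency parsers only; the constituency StanfordParser already yields trees, so the `.tree()` call would not apply there.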
