I am using Python 3 and NLTK with the Stanford dependency parser to parse a list of sentences, and then I collect the node information from each parsed sentence. The following is my code; it runs under Python 3 in a virtualenv called .python:
```python
from nltk.parse.stanford import StanfordDependencyParser

parser = StanfordDependencyParser('stanford-parser-full-2015-12-09/stanford-parser.jar',
                                  'stanford-parser-full-2015-12-09/stanford-parser-3.6.0-models.jar')
# sentences is a list of raw sentence strings; collect the nodes of every dependency graph
graph_nodes = sum([[dep_graph.nodes for dep_graph in dep_graphs] for dep_graphs in parser.raw_parse_sents(sentences)], [])
```
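Because raw_parse_sents parses the whole batch at once, a single bad sentence aborts everything. For diagnosis, a minimal sketch of parsing sentences one at a time and collecting failures separately could look like this. The helper name parse_each is hypothetical (not part of NLTK), and the usage line assumes the raw_parse method that NLTK's Stanford parser wrappers provide.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def parse_each(sentences: Iterable[str],
               parse_one: Callable[[str], Iterable[T]]) -> Tuple[List[T], List[Tuple[str, Exception]]]:
    """Parse sentences one at a time (hypothetical helper, not NLTK API).

    Returns (results, failures), where failures pairs each sentence that
    raised an AssertionError with the exception itself.
    """
    results: List[T] = []
    failures: List[Tuple[str, Exception]] = []
    for sentence in sentences:
        try:
            # extend() consumes whatever parse_one returns, so an error
            # raised while parsing this sentence is caught right here
            results.extend(parse_one(sentence))
        except AssertionError as exc:
            failures.append((sentence, exc))
    return results, failures


# Possible usage with the parser above (assumed, not run here):
# graph_nodes, failures = parse_each(
#     sentences, lambda s: [g.nodes for g in parser.raw_parse(s)])
```

This is noticeably slower than one batch call, since as far as I know each call starts a new Java process, so it is best kept for tracking down problem sentences.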
I found that the Stanford dependency parser throws an AssertionError on some sentences. Here is the error I get:
```
    graph_nodes = sum([[dep_graph.nodes for dep_graph in dep_graphs] for dep_graphs in self.parser.raw_parse_sents(sentences)], []);
  File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/stanford.py", line 150, in raw_parse_sents
    return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
  File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/stanford.py", line 91, in _parse_trees_output
    res.append(iter([self._make_tree('\n'.join(cur_lines))]))
  File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/stanford.py", line 339, in _make_tree
    return DependencyGraph(result, top_relation_label='root')
  File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/dependencygraph.py", line 84, in __init__
    top_relation_label=top_relation_label,
  File "/Users/user/sent_code/.python/lib/python3.5/site-packages/nltk/parse/dependencygraph.py", line 328, in _parse
    assert cell_number == len(cells)
AssertionError
```
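The last frame is assert cell_number == len(cells) in DependencyGraph._parse. As far as I can tell from that module's source, the check requires every line of the parser's output to split into the same number of cells as the first line. A simplified re-creation of just that check follows; the function name split_rows is hypothetical and this is not NLTK's actual implementation.

```python
from typing import List, Optional


def split_rows(conll_text: str, cell_separator: Optional[str] = None) -> List[List[str]]:
    """Simplified, hypothetical re-creation of the cell-count check in
    DependencyGraph._parse: every non-empty line must split into as many
    cells as the first one."""
    rows: List[List[str]] = []
    cell_number: Optional[int] = None
    for line in conll_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines
        # With cell_separator=None, str.split() splits on runs of any whitespace
        cells = line.split(cell_separator)
        if cell_number is None:
            cell_number = len(cells)  # the first row fixes the expected width
        else:
            assert cell_number == len(cells)  # the assertion from the traceback
        rows.append(cells)
    return rows
```

With cell_separator=None, a single row containing one extra whitespace character becomes one cell wider than the others, which is enough to trip the assertion.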
I then tracked down the sentence that causes this error:
'for all of its insights into the dream world of teen life , and its electronic expression through cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 1/2-hour running time . \n'
I edited the sentence several times to see what triggers the assertion. When I remove the "/" (in "2 1/2-hour"), the sentence parses fine; when the "/" is present, the AssertionError is thrown.
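One plausible explanation, offered as an assumption rather than a confirmed diagnosis: Stanford's PTBTokenizer keeps a mixed fraction such as "2 1/2" as a single token and, with its default normalizeSpace behaviour, replaces the internal space with a non-breaking space (U+00A0). That would fit the "/" observation, since without the slash there is no fraction to merge. If NLTK then splits each output line with str.split() and no separator, Python 3 treats U+00A0 as whitespace, so that one row gains an extra cell. The sketch below only demonstrates the Python side with a made-up row (the column values are hypothetical placeholders); it does not call NLTK or the parser.

```python
NBSP = "\u00a0"  # non-breaking space

# Hypothetical tab-separated output row (values are placeholders) in which the
# tokenizer has kept the fraction "2 1/2" as one token joined by U+00A0.
row = "\t".join(["30", "2" + NBSP + "1/2", "_", "CD", "CD", "_", "32", "dep", "_", "_"])

# In Python 3, str.split() with no argument treats U+00A0 as whitespace,
# so the single token is broken into two cells.
whitespace_cells = row.split()

# Splitting on the tab character keeps the token intact.
tab_cells = row.split("\t")

print(len(whitespace_cells), len(tab_cells))  # 11 10
```

If this is the cause, splitting on "\t" instead of arbitrary whitespace avoids the extra cell. DependencyGraph's constructor accepts a cell_separator argument, but whether the Stanford wrapper passes one depends on the NLTK version, so it is worth checking the installed source.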
I wonder whether certain special symbols cause the problem. I went through NLTK's source code to see what triggers this assertion (searching for "assert" at http://www.nltk.org/_modules/nltk/parse/dependencygraph.html), but I couldn't figure out what causes the error.
Could anyone explain why this error is thrown and how to fix it?