
I am trying to build a tree (a nested dictionary) from the output of a dependency parser. The sentence is "I shot an elephant in my sleep". I can get the output described in this question: How do I do dependency parsing in NLTK?

nsubj(shot-2, I-1)
det(elephant-4, an-3)
dobj(shot-2, elephant-4)
prep(shot-2, in-5)
poss(sleep-7, my-6)
pobj(in-5, sleep-7)

To convert this list of tuples into a nested dictionary, I followed this question: How to convert python list of tuples into tree?

def build_tree(list_of_tuples):
    # Map each dependent word to ((relation, governor), children).
    all_nodes = {n[2]: ((n[0], n[1]), {}) for n in list_of_tuples}
    root = {}
    print(all_nodes)
    for rel, gov, dep in list_of_tuples:
        # Compare strings with != rather than identity ("is not").
        if gov != 'ROOT':
            all_nodes[gov][1][dep] = all_nodes[dep]
        else:
            root[dep] = all_nodes[dep]
    return root

This gives the output as follows:

{'shot': (('ROOT', 'ROOT'),
  {'I': (('nsubj', 'shot'), {}),
   'elephant': (('dobj', 'shot'), {'an': (('det', 'elephant'), {})}),
   'sleep': (('nmod', 'shot'),
    {'in': (('case', 'sleep'), {}), 'my': (('nmod:poss', 'sleep'), {})})})}

To find the root-to-leaf path, I followed this question: Return root to specific leaf from a nested dictionary tree

[Building the tree and finding the path are two separate problems.] The second objective is to find the root-to-leaf path, as done in Return root to specific leaf from a nested dictionary tree, but as a path of dependency relations. For instance, when I call recurse_category(categories, 'an'), where categories is the nested tree structure and 'an' is a word in the tree, I should get ROOT-nsubj-dobj (the dependency relations up to the root) as output.

VIVEK
  • Hint: `DependencyGraph` https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36 – alvas Sep 03 '18 at 16:14
  • @alvas It would be great if you could show how to implement my case, since I am getting lost. If you want me to change the way I am converting the tuples to a dictionary, then please show that rather than giving a GitHub link. – VIVEK Sep 04 '18 at 01:53
  • What is the desired output you're looking for? – alvas Sep 04 '18 at 07:32
  • @alvas I am looking to find the root-to-leaf path. As explained in the question (with the link given as well), if I pass 'an' then I should get 'Root-nsubj-dobj'. – VIVEK Sep 04 '18 at 16:10
  • I don't understand why the input is `an` and the expected output is `root-nsubj-dobj`. Could you elaborate? – alvas Sep 04 '18 at 17:31
  • @alvas I am trying to find the path to specific root. Now in this example, what I am trying to say is if we do recurse_category(our_tree, "an") [https://stackoverflow.com/questions/47302382/return-root-to-specific-leaf-from-a-nested-dictionary-tree] I should get `root-nsubj-dobj` which is the relationship of the word to the root in the sentence. – VIVEK Sep 04 '18 at 20:00
  • It's still unclear what you're trying to achieve, can you just add the full input sentence and the desired output to the question? It might be easier to explain than putting it in a short comment. – alvas Sep 04 '18 at 22:12

2 Answers


Firstly, if you're just using the pre-trained model for the Stanford CoreNLP dependency parser, you should use the CoreNLPDependencyParser from nltk.parse.corenlp and avoid using the old nltk.parse.stanford interface.

See Stanford Parser and NLTK

After downloading CoreNLP and starting the Java server in a terminal, run the following in Python:

>>> from nltk.parse.corenlp import CoreNLPDependencyParser
>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> sent = "I shot an elephant with a banana .".split()
>>> parses = list(dep_parser.parse(sent))
>>> type(parses[0])
<class 'nltk.parse.dependencygraph.DependencyGraph'>

Now we see that the parses are of type DependencyGraph from nltk.parse.dependencygraph https://github.com/nltk/nltk/blob/develop/nltk/parse/dependencygraph.py#L36

You can convert the DependencyGraph to an nltk.tree.Tree object by simply calling DependencyGraph.tree():

>>> parses[0].tree()
Tree('shot', ['I', Tree('elephant', ['an']), Tree('banana', ['with', 'a']), '.'])

>>> parses[0].tree().pretty_print()
          shot                  
  _________|____________         
 |   |  elephant      banana    
 |   |     |       _____|_____   
 I   .     an    with         a 

To convert it into the bracketed parse format:

>>> print(parses[0].tree())
(shot I (elephant an) (banana with a) .)

If you're looking for dependency triplets:

>>> [(governor, dep, dependent) for governor, dep, dependent in parses[0].triples()]
[(('shot', 'VBD'), 'nsubj', ('I', 'PRP')), (('shot', 'VBD'), 'dobj', ('elephant', 'NN')), (('elephant', 'NN'), 'det', ('an', 'DT')), (('shot', 'VBD'), 'nmod', ('banana', 'NN')), (('banana', 'NN'), 'case', ('with', 'IN')), (('banana', 'NN'), 'det', ('a', 'DT')), (('shot', 'VBD'), 'punct', ('.', '.'))]

>>> for governor, dep, dependent in parses[0].triples():
...     print(governor, dep, dependent)
... 
('shot', 'VBD') nsubj ('I', 'PRP')
('shot', 'VBD') dobj ('elephant', 'NN')
('elephant', 'NN') det ('an', 'DT')
('shot', 'VBD') nmod ('banana', 'NN')
('banana', 'NN') case ('with', 'IN')
('banana', 'NN') det ('a', 'DT')
('shot', 'VBD') punct ('.', '.')
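If what is needed is the chain of relation labels from the root down to one word, the triples above already contain enough information. The following is only a sketch, not part of NLTK: `relation_path` is a hypothetical helper name, and it assumes every word form occurs only once in the sentence, because the triples identify tokens by word rather than by position.

```python
# Triples copied from the output above: (governor, relation, dependent).
triples = [
    (('shot', 'VBD'), 'nsubj', ('I', 'PRP')),
    (('shot', 'VBD'), 'dobj', ('elephant', 'NN')),
    (('elephant', 'NN'), 'det', ('an', 'DT')),
    (('shot', 'VBD'), 'nmod', ('banana', 'NN')),
    (('banana', 'NN'), 'case', ('with', 'IN')),
    (('banana', 'NN'), 'det', ('a', 'DT')),
    (('shot', 'VBD'), 'punct', ('.', '.')),
]


def relation_path(triples, word):
    """Return the relation labels from ROOT down to `word`, joined by '-'.

    Hypothetical helper; assumes each word form is unique in the sentence.
    """
    # Map each dependent word to (relation, governor word).
    parent_of = {dep[0]: (rel, gov[0]) for gov, rel, dep in triples}
    governors = {gov[0] for gov, _, _ in triples}
    if word not in parent_of and word not in governors:
        raise KeyError(word)

    path = []
    # Walk upwards until we reach the word that has no governor (the root).
    while word in parent_of:
        rel, word = parent_of[word]
        path.append(rel)
    path.append('ROOT')
    return '-'.join(reversed(path))


print(relation_path(triples, 'an'))    # ROOT-dobj-det
print(relation_path(triples, 'shot'))  # ROOT
```

For `an` this gives `ROOT-dobj-det`, because `an` attaches to `elephant` via `det` and `elephant` attaches to `shot` via `dobj`.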

In CONLL format:

>>> print(parses[0].to_conll(style=10))
1   I   I   PRP PRP _   2   nsubj   _   _
2   shot    shoot   VBD VBD _   0   ROOT    _   _
3   an  a   DT  DT  _   4   det _   _
4   elephant    elephant    NN  NN  _   2   dobj    _   _
5   with    with    IN  IN  _   7   case    _   _
6   a   a   DT  DT  _   7   det _   _
7   banana  banana  NN  NN  _   2   nmod    _   _
8   .   .   .   .   _   2   punct   _   _
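The CoNLL rows carry a head index per token, so a relation path can also be computed by position, which avoids ambiguity when a word is repeated. This is a minimal sketch that parses the whitespace-separated rows shown above with plain Python; `relation_path_by_index` is a hypothetical name, and the column positions (token index first, head seventh, relation eighth) are taken from the 10-column output above.

```python
# The 10-column rows from the output above, whitespace-separated.
conll = """\
1 I I PRP PRP _ 2 nsubj _ _
2 shot shoot VBD VBD _ 0 ROOT _ _
3 an a DT DT _ 4 det _ _
4 elephant elephant NN NN _ 2 dobj _ _
5 with with IN IN _ 7 case _ _
6 a a DT DT _ 7 det _ _
7 banana banana NN NN _ 2 nmod _ _
8 . . . . _ 2 punct _ _
"""


def relation_path_by_index(conll_text, token_index):
    """Return the relation labels from ROOT down to the token at `token_index`.

    Hypothetical helper; token indices are 1-based, and head 0 marks the root.
    """
    rows = {}
    for line in conll_text.strip().splitlines():
        cols = line.split()
        # cols[0] = token index, cols[6] = head index, cols[7] = relation.
        rows[int(cols[0])] = (int(cols[6]), cols[7])

    path = []
    current = token_index
    # Follow head pointers until the artificial root (index 0) is reached.
    while current != 0:
        head, rel = rows[current]
        path.append(rel)
        current = head
    return '-'.join(reversed(path))


print(relation_path_by_index(conll, 3))  # 'an'   -> ROOT-dobj-det
print(relation_path_by_index(conll, 6))  # 'a'    -> ROOT-nmod-det
print(relation_path_by_index(conll, 2))  # 'shot' -> ROOT
```

Working by index means `an` (token 3) and `a` (token 6) stay distinct even though both are determiners with the same lemma.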
alvas
  • So, `nltk.parse.corenlp` doesn't work for some reason. It says `No module named corenlp`, but `nltk.parse.stanford` works for me. I have unzipped both stanford-corenlp-full-2018-02-27 and stanford-parser-full-2018-02-27. I have the models.jar and parser.jar files as mentioned in the link. I also tried `from nltk.parse import CoreNLPParser`, which didn't work either. Also, I am not able to find the englishPCFG file, but I have the lexparser shell script file. I downloaded the PCFG file from GitHub. It said `NLTK was unable to find the JAVA file Set the JAVAHOME environment variables` – VIVEK Sep 04 '18 at 20:05
  • Upgrade your NLTK `pip3 install -U nltk`. Also don't use the link to jar files in the python code, just start the server. See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk/51981566#51981566 – alvas Sep 04 '18 at 22:11
  • Thanks it worked! When we are doing `parses[0].tree` we are losing the dependency relationship between the words. I am trying to make a tree which will have the dependency relationship as well, as I want DEPENDENCY RELATIONSHIP path. For example, https://stackoverflow.com/questions/34395127/stanford-nlp-parse-tree-format the tree has POS tag I think, in our case it has to be dependency relationship and then we will find the path. – VIVEK Sep 06 '18 at 01:08
  • Show the desired output in the question, because dependency labels can't always be represented as a hierarchical path. Sometimes the dependencies need to be cyclic, sometimes they need to cross sub-branches, so the graph cannot simply be converted into a tree without losing information. It's still unclear what you're trying to achieve, and I don't think it's possible; keeping the graph structure would be more beneficial, unless all you need is some visualization. – alvas Sep 06 '18 at 01:13
  • Firstly thank you very much for showing interest. I have already posted the desired output. All I need is dependency relationship path which I have already shown in question Maybe you are correct that the dependency relationship can't be represented in tree structure but I don't even need that. I just need the root-to-leaf dependency relation path. – VIVEK Sep 06 '18 at 02:00
  • Maybe to achieve this I might need to firstly create my own data structure (which I tried, please see build_tree()) and then pass the output tree structure AND the word to another function (see https://stackoverflow.com/questions/47302382/return-root-to-specific-leaf-from-a-nested-dictionary-tree) which would return the root-to-leaf dependency relationship path. Hopefully now there is no confusion. I do not need any tree structure as output. All I need is root-to-leaf dependency relationship path – VIVEK Sep 06 '18 at 02:01
  • Correction in the above comment "then pass the nested dictionary AND the word to another function" – VIVEK Sep 06 '18 at 02:12
  • Are you looking for output for only a specific node, or for the whole sentence? Post it for the whole sentence, because I think if you work through it manually, you'll see that it may or may not be possible to get the desired output for the whole sentence. Crossing branches seems to be a problem. E.g. when a node has multiple edges, what do you do? – alvas Sep 06 '18 at 03:10
  • E.g. What is the output you expect for the word `shot` in the sentence? I.e. `root-to-leaf` might not be possible to represent in a flat structure. – alvas Sep 06 '18 at 03:10
  • I only want the output for a specific node (the user will select it). I do understand the concern about nodes with multiple edges, as shown in https://linguistics.stackexchange.com/questions/3640/non-projective-example In the case of a non-projective parse tree or nodes with multiple edges, it would be great if I could get all paths; if that is not possible, that is fine as well. But I definitely need the path when there is a single parent. For example, in `I saw the man who loves you`, I should get `ROOT-nsubj` for `I`. Also, for `shot` the output should be `ROOT`. I hope this explanation is clear. – VIVEK Sep 06 '18 at 21:33
  • @alvas is it possible to export the tree to PNG/SVG format? – Abu Shoeb Jan 23 '21 at 05:53

This converts the output to the nested dictionary form. I will keep you updated if I can find the path as well. Maybe this is helpful.

list_of_tuples = [('ROOT','ROOT', 'shot'),('nsubj','shot', 'I'),('det','elephant', 'an'),('dobj','shot', 'elephant'),('case','sleep', 'in'),('nmod:poss','sleep', 'my'),('nmod','shot', 'sleep')]

nodes={}

for i in list_of_tuples:
    rel,parent,child=i
    nodes[child]={'Name':child,'Relationship':rel}

forest=[]

for i in list_of_tuples:
    rel,parent,child=i
    node=nodes[child]

    if parent == 'ROOT':  # this should be the root node
        forest.append(node)
    else:
        parent = nodes[parent]
        if 'children' not in parent:
            parent['children'] = []
        children = parent['children']
        children.append(node)

print(forest)

The output is a list containing one nested dictionary:

[{'Name': 'shot', 'Relationship': 'ROOT', 'children': [{'Name': 'I', 'Relationship': 'nsubj'}, {'Name': 'elephant', 'Relationship': 'dobj', 'children': [{'Name': 'an', 'Relationship': 'det'}]}, {'Name': 'sleep', 'Relationship': 'nmod', 'children': [{'Name': 'in', 'Relationship': 'case'}, {'Name': 'my', 'Relationship': 'nmod:poss'}]}]}]

The following function can help you to find the root-to-leaf path:

def recurse_category(categories,to_find):
    for category in categories: 
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []
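As a usage sketch, here is how the function above might be called on the nested structure printed earlier. The function body is repeated unchanged so the snippet runs on its own; joining the labels with `-` is just one assumed way of formatting the result.

```python
# The nested structure printed above, reformatted for readability.
forest = [{
    'Name': 'shot', 'Relationship': 'ROOT',
    'children': [
        {'Name': 'I', 'Relationship': 'nsubj'},
        {'Name': 'elephant', 'Relationship': 'dobj',
         'children': [{'Name': 'an', 'Relationship': 'det'}]},
        {'Name': 'sleep', 'Relationship': 'nmod',
         'children': [{'Name': 'in', 'Relationship': 'case'},
                      {'Name': 'my', 'Relationship': 'nmod:poss'}]},
    ],
}]


def recurse_category(categories, to_find):
    # Depth-first search; returns (found, list of relations from the root).
    for category in categories:
        if category['Name'] == to_find:
            return True, [category['Relationship']]
        if 'children' in category:
            found, path = recurse_category(category['children'], to_find)
            if found:
                return True, [category['Relationship']] + path
    return False, []


found, path = recurse_category(forest, 'an')
print(found, '-'.join(path))   # True ROOT-dobj-det

found, path = recurse_category(forest, 'my')
print(found, '-'.join(path))   # True ROOT-nmod-nmod:poss

# A word that is not in the tree yields (False, []).
print(recurse_category(forest, 'banana'))  # (False, [])
```

Note that the search matches on the word itself, so if a word appears twice in the sentence only the first occurrence found is returned.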
amy