
I am currently working with spaCy and NLTK trees. I am new to both, so I am struggling with them. Here is the code that prints a spaCy dependency parse of a sentence and converts it to an NLTK tree.

import spacy
from nltk import Tree
en_nlp = spacy.load('en_core_web_sm')
# The remaining lines run inside a method of a class, hence `self`.
self.sentence = "My bestfriend is John."
sen = en_nlp(self.sentence)
self.print_pos(sen)
self.print_tree(sen)
sentences = list(sen.sents)
sentence = sentences[0]
# we assume the input is only one sentence
root_node = sentence.root
tree = self.to_nltk_tree(root_node)
#     self.traverse(tree)

# helper function
# https://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy
def to_nltk_tree(self, node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [self.to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_


def print_tree(self, sen):
    # pretty-print the dependency tree of each sentence
    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
        else:
            return node.orth_

    for sent in sen.sents:
        to_nltk_tree(sent.root).pretty_print()

def print_pos(self,sen):
    # print all attributes in tabular format
    for token in sen:
       print(f"{token.text:{8}} {token.dep_ + ' =>':{10}}   {token.head.text:{9}}  {spacy.explain(token.dep_)} ")

Here is the output:

My       poss =>      bestfriend  possession modifier 
bestfriend nsubj =>     is         nominal subject 
is       ROOT =>      is         None 
John     attr =>      is         attribute 
.        punct =>     is         punctuation 
      is           
  ____|______       
 |    |  bestfriend
 |    |      |      
John  .      My 

I want to find all paths from the root to each leaf node. For example, the desired output for the sentence above is ["John is", "My bestfriend is"]. How would I do that? Also, how can I filter out some leaves, such as the "." punct token?
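
A minimal sketch of one possible approach, not a definitive answer: walk the dependency tree recursively, skip children whose `dep_` label is in a filter list, and emit one string per remaining leaf. The function name `root_to_leaf_paths`, the `skip_deps` parameter, and the `FakeToken` class are hypothetical names made up for this illustration. `FakeToken` only mimics the `text`, `dep_`, and `children` attributes of a spaCy token so the example runs without a model; the same function should accept `sentence.root` from a real parse. Paths are written leaf-first to match the desired strings, and their order follows the order of `children`, so it may differ from the order in the list above.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy token (text, dep_, children only)."""
    text: str
    dep_: str
    children: List["FakeToken"] = field(default_factory=list)


def root_to_leaf_paths(node, skip_deps: Sequence[str] = ("punct",), path=()):
    """Return one string per leaf, written leaf-first (e.g. "My bestfriend is")."""
    path = path + (node.text,)
    # Drop filtered children before deciding whether this node is a leaf.
    kids = [c for c in node.children if c.dep_ not in skip_deps]
    if not kids:
        return [" ".join(reversed(path))]
    paths = []
    for child in kids:
        paths.extend(root_to_leaf_paths(child, skip_deps, path))
    return paths


# Hand-built copy of the parse shown above (assumed, not produced by spaCy here).
root = FakeToken("is", "ROOT", [
    FakeToken("bestfriend", "nsubj", [FakeToken("My", "poss")]),
    FakeToken("John", "attr"),
    FakeToken(".", "punct"),
])
paths = root_to_leaf_paths(root)
print(paths)
```

With a real parse, the call would be `root_to_leaf_paths(sentence.root)`. Passing `skip_deps=()` keeps the punctuation leaf, and adding other dependency labels to `skip_deps` filters those leaves as well.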
