I am currently working with spaCy and NLTK trees. I am new to both, so I am struggling with them. Here is the code that prints a spaCy dependency parse of a sentence and converts it to an NLTK tree.
import spacy
from nltk import Tree
en_nlp = spacy.load('en_core_web_sm')
self.sentence = "My bestfriend is John."
sen = en_nlp(self.sentence)
self.print_pos(sen)
self.print_tree(sen)
sentences = list(sen.sents)
sentence = sentences[0]
# we assume the input is only one sentence
root_node = sentence.root
tree = self.to_nltk_tree(root_node)
# self.traverse(tree)
# helper function
# https://stackoverflow.com/questions/36610179/how-to-get-the-dependency-tree-with-spacy
def to_nltk_tree(self, node):
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [self.to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_
def print_tree(self, sen):
    # pretty-print the tree of each sentence
    def to_nltk_tree(node):
        if node.n_lefts + node.n_rights > 0:
            return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
        else:
            return node.orth_

    for sent in sen.sents:
        to_nltk_tree(sent.root).pretty_print()
def print_pos(self, sen):
    # print all attributes in tabular format
    for token in sen:
        print(f"{token.text:{8}} {token.dep_ + ' =>':{10}} {token.head.text:{9}} {spacy.explain(token.dep_)} ")
Here is the output:
My       poss =>    bestfriend possession modifier
bestfriend nsubj =>   is        nominal subject
is       ROOT =>    is        None
John     attr =>    is        attribute
.        punct =>   is        punctuation
      is
  ____|______
 |    |  bestfriend
 |    |      |
John  .      My
I want to find all paths from the root to every leaf node. For example, the desired output for the sentence above would be ["John is", "My bestfriend is"]. How would I do that? Also, how can I filter out certain leaves, such as the "." punct token?
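To make the goal concrete, here is a minimal sketch of one possible approach: a recursive walk that carries the path so far and emits one string per leaf, written leaf first to match the desired output. The `Node` class is a hypothetical stand-in for a spaCy token (it only mimics `text`, `dep_` and `children`) so the snippet runs without a model, and `leaf_to_root_paths` and its `skip_deps` parameter are names invented here, not part of spaCy or NLTK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a spaCy Token: only text, dep_ and children."""
    text: str
    dep_: str
    children: List["Node"] = field(default_factory=list)


def leaf_to_root_paths(node, skip_deps=("punct",), prefix=()):
    """Return one string per leaf, listing the tokens from the leaf up to the root."""
    if node.dep_ in skip_deps:
        return []  # drop this token and everything below it
    path = (node.text,) + prefix  # prepend, so the leaf ends up first
    # Filter children up front; a node whose children are all filtered
    # is treated as a leaf itself.
    children = [c for c in node.children if c.dep_ not in skip_deps]
    if not children:
        return [" ".join(path)]
    paths = []
    for child in children:
        paths.extend(leaf_to_root_paths(child, skip_deps, path))
    return paths


# Hand-built copy of the parse shown above, children in sentence order.
root = Node("is", "ROOT", [
    Node("bestfriend", "nsubj", [Node("My", "poss")]),
    Node("John", "attr"),
    Node(".", "punct"),
])
paths = leaf_to_root_paths(root)
print(paths)
```

Since the function only reads `text`, `dep_` and `children`, it should also accept a real token, for example `leaf_to_root_paths(sentence.root)`. The paths come out in sentence order, so "My bestfriend is" precedes "John is"; sort or reverse the list if a different order is needed.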