I'm working on an NLP project and I want to filter out words depending on their position in the dependency tree.
To plot the tree I'm using the code from this post:
from nltk import Tree

def to_nltk_tree(node):
    # Inner nodes become nltk Trees; leaves are returned as plain strings
    if node.n_lefts + node.n_rights > 0:
        return Tree(node.orth_, [to_nltk_tree(child) for child in node.children])
    else:
        return node.orth_
For a sample sentence:
"A group of people around the world are suddenly linked mentally"
I got this tree:
From this tree what I want to get is a list of tuples with the word and its corresponding depth in the tree:
[(linked,1),(are,2),(suddenly,2),(mentally,2),(group,2),(A,3),(of,3),(people,4)....]
In this case, I'm not interested in words that have no children: [are,suddenly,mentally,A,the]. So far I have only managed to get the list of words that do have children, using this code:
def get_words(root,words):
    children = list(root.children)
    for child in children:
        # Keep only tokens that have children of their own, then recurse into them
        if list(child.children):
            words.append(child)
            get_words(child,words)
    return list(set(words))
[to_nltk_tree(sent.root).pretty_print() for sent in doc.sents]
s_root = list(doc.sents)[0].root
words = []
words.append(s_root)
words = get_words(s_root,words)
words
[around, linked, world, of, people, group]
From this, how can I get the desired tuples of each word and its depth?
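One direction that might work is to carry the depth along while walking the tree, counting the root as depth 1 and each child as its parent's depth plus 1. The following is only a rough sketch under assumptions, not a tested spaCy solution: `Node` is a hypothetical stand-in that mimics just the `.orth_` and `.children` attributes used above, `words_with_depth` is a made-up helper name, and the tree is built by hand to match the depths listed earlier (where the remaining words attach is my guess).

```python
from collections import deque


class Node:
    """Hypothetical stand-in for a spaCy token: only .orth_ and .children."""

    def __init__(self, orth_, children=()):
        self.orth_ = orth_
        self.children = list(children)


def words_with_depth(root, only_with_children=False):
    """Breadth-first walk returning (word, depth) pairs; the root has depth 1."""
    result = []
    queue = deque([(root, 1)])
    while queue:
        node, depth = queue.popleft()
        children = list(node.children)
        # Optionally skip leaves, i.e. words without children
        if children or not only_with_children:
            result.append((node.orth_, depth))
        queue.extend((child, depth + 1) for child in children)
    return result


# Hand-built tree (an assumption) that mirrors the depths listed in the question
tree = Node("linked", [
    Node("group", [
        Node("A"),
        Node("of", [
            Node("people", [
                Node("around", [Node("world", [Node("the")])]),
            ]),
        ]),
    ]),
    Node("are"),
    Node("suddenly"),
    Node("mentally"),
])

pairs = words_with_depth(tree)
filtered = words_with_depth(tree, only_with_children=True)
print(pairs)
print(filtered)
```

If this idea is sound, calling `words_with_depth(s_root, only_with_children=True)` on the real sentence root should presumably work the same way, since spaCy tokens expose `.orth_` and `.children`, but I have not verified that.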