I managed to write a class that builds a tree from a spaCy parse, and I would like the nodes to keep only the words, without the POS tag and dependency label. That is, I want `start` instead of `start_VB_ROOT`.
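Each label appears to follow a `word_TAG_dep` pattern, so the word can be recovered with a string split. The following is a minimal sketch. It assumes that neither the tag part nor the dependency part contains an underscore. `strip_tags` is a hypothetical helper name and is not part of spaCy or NLTK:

```python
def strip_tags(label):
    """Return only the word from a 'word_TAG_dep' label.

    Assumes the TAG and dep parts contain no underscore; splitting from the
    right (at most twice) keeps words that themselves contain '_' intact.
    """
    return label.rsplit("_", 2)[0]


print(strip_tags("start_VB_ROOT"))  # start
print(strip_tags("?_._punct"))      # ?
```

`rsplit` with `maxsplit=2` is used instead of `split("_")[0]` so that a token such as `New_York_NNP_pobj` would still yield `New_York`. A tag that itself contains an underscore would break this assumption.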
More generally, for the sentence "When did Beyonce start becoming popular?" the input is:
```python
[Tree('start_VB_ROOT', ['When_WRB_advmod', 'did_VBD_aux', 'Beyonce_NNP_nsubj', Tree('becoming_VBG_xcomp', ['popular_JJ_acomp']), '?_._punct'])]
```
With the class provided below, the expected output would be:
```
<class 'str'> When_WRB_advmod
son creation : When
<class 'str'> did_VBD_aux
son creation : did
<class 'str'> Beyonce_NNP_nsubj
son creation : Beyonce
<class 'nltk.tree.Tree'> (becoming_VBG_xcomp popular_JJ_acomp)
sub tree creation
son: becoming_VBG_xcomp
<class 'str'> popular_JJ_acomp
son creation popular
end of sub tree creation
<class 'str'> ?_._punct
son creation ?
```
Here is the class:
```python
from nltk import Tree


class WordTree:
    '''Tree for spaCy dependency parsing array'''
    def __init__(self, array, parent=None):
        """
        Construct a new 'WordTree' object.
        :param array: The array containing the dependency tree
        :param parent: The parent of the array, if it exists
        :return: returns nothing
        """
        self.parent = parent
        self.children = []
        self.data = array
        for element in array[0]:
            print(type(element), element)
            # we check if we got a subtree
            if isinstance(element, Tree):
                print("sub tree creation")
                self.children.append(element.label())
                print("son:", element.label())
                t = WordTree([element], element.label())  # should I verify if parent is empty ?
                print("end of sub tree creation")
            # else if we have a string we create a son
            elif isinstance(element, str):
                print("son creation", element)
                self.children.append(element)
            # in other case we have a problem
            else:
                print("issue?")
                break
```
At the moment, it gives the following output:
```
<class 'str'> When_WRB_advmod
son creation When_WRB_advmod
<class 'str'> did_VBD_aux
son creation did_VBD_aux
<class 'str'> Beyonce_NNP_nsubj
son creation Beyonce_NNP_nsubj
<class 'nltk.tree.Tree'> (becoming_VBG_xcomp popular_JJ_acomp)
sub tree creation
son: becoming_VBG_xcomp
<class 'str'> popular_JJ_acomp
son creation popular_JJ_acomp
end of sub tree creation
<class 'str'> ?_._punct
son creation ?_._punct
```
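The class above appends the raw labels and discards the recursive `WordTree` it builds for each subtree. One possible restructuring, offered as a sketch rather than a definitive fix, stores only the stripped word in every node and keeps the subtree nodes as children. The method names `from_tree` and `words` and the helper `strip_tags` are hypothetical. The code assumes `word_TAG_dep` labels and is duck-typed: `tree` only needs `.label()` and iteration over its children, which `nltk.Tree` provides.

```python
def strip_tags(label):
    # Hypothetical helper; assumes labels look like 'word_TAG_dep'.
    return label.rsplit("_", 2)[0]


class WordTree:
    """Word-only node: keeps the word, its parent node and its child nodes."""

    def __init__(self, word, parent=None):
        self.word = word
        self.parent = parent  # parent WordTree, or None for the root
        self.children = []

    @classmethod
    def from_tree(cls, tree, parent=None):
        """Build a WordTree from an nltk.Tree of 'word_TAG_dep' labels/leaves."""
        node = cls(strip_tags(tree.label()), parent)
        for element in tree:
            if isinstance(element, str):
                # a leaf: a childless node holding just the word
                node.children.append(cls(strip_tags(element), node))
            else:
                # a subtree: recurse and keep the resulting node as a child
                node.children.append(cls.from_tree(element, node))
        return node

    def words(self):
        """Return the words in pre-order (node first, then its children)."""
        result = [self.word]
        for child in self.children:
            result.extend(child.words())
        return result
```

With the input from the question stored in a list `parsed`, `WordTree.from_tree(parsed[0])` would give a root `start` whose children are `When`, `did`, `Beyonce`, `becoming` (with child `popular`) and `?`. Its `words()` would return `['start', 'When', 'did', 'Beyonce', 'becoming', 'popular', '?']`. Storing the parent node rather than its label string lets you walk back up the tree without a second lookup.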