For example I got this small tree (which is obviously a subtree only):
(VP (VBZ says) (SBAR (-NONE- *0*) (S-3 (-NONE- *T*))))
Trace trees are those trees leading to a leaf of the shape: *.*. I now want to remove all subtrees which are a trace tree. So for this example the result should look like this:
(VP (VBZ says))
So far I extracted all those leaves:
from nltk.tree import ParentedTree
import re
traceLeaves = []
line = "( (VP (VBZ says) (SBAR (-NONE- *0*) (S-3 (-NONE- *T*)))))"
currTree = ParentedTree.fromstring(line, remove_empty_top_bracketing = True)
for leaf in currTree.leaves():
if re.search('\*', leaf):
traceLeaves.append(leaf)
but I got no idea how to navigate up the tree until there exists a sibling which is no trace tree and remove the trace tree from the original tree. I'm completely stuck here since I only started working with nltk...
EDIT:
Here is one complete sentence I want to be able to process:
( (SINV (S-3 (S (NP-SBJ-1 (-NONE- *PRO*)) (VP (VBG Assuming) (NP (DT that) (NN post)) (PP-TMP (IN at) (NP (NP (DT the) (NN age)) (PP (IN of) (NP (CD 35))))))) (, ,) (NP-SBJ-1 (PRP he)) (VP (VBD managed) (PP-MNR (IN by) (NP (NN consensus))) (, ,) (SBAR-ADV (IN as) (S (NP-SBJ (-NONE- *PRO*)) (VP (VBZ is) (NP-PRD (DT the) (NN rule)) (PP-LOC (IN in) (NP (NNS universities)))))))) (, ,) (VP (VBZ says) (SBAR (-NONE- *0*) (S-3 (-NONE- *T*)))) (NP-SBJ (NP (NNP Warren) (NNP H.) (NNP Strother)) (, ,) (NP (NP (DT a) (NN university) (NN official)) (SBAR (WHNP-2 (WP who)) (S (NP-SBJ-2 (-NONE- *T*)) (VP (VBZ is) (VP (VBG researching) (NP (NP (DT a) (NN book)) (PP (IN on) (NP (NNP Mr.) (NNP Hahn)))))))))) (. .)) )