Generally, the head of a noun phrase (NP) is its rightmost noun. In the tree shown below, for example, "tree" is the head of the parent NP:
             ROOT
              |
              S
           ___|________________________
          NP                           |
       ___|_____________               |
      |                 PP             VP
      |             ____|____      ____|___
      NP           |         NP   |       PRT
   ___|_______     |         |    |        |
  DT  JJ  NN  NN   IN       NNP  VBD       RP
  |   |   |   |    |         |    |        |
 The old oak tree from     India fell     down
Out[40]: Tree('S', [Tree('NP', [Tree('NP', [Tree('DT', ['The']), Tree('JJ', ['old']), Tree('NN', ['oak']), Tree('NN', ['tree'])]), Tree('PP', [Tree('IN', ['from']), Tree('NP', [Tree('NNP', ['India'])])])]), Tree('VP', [Tree('VBD', ['fell']), Tree('PRT', [Tree('RP', ['down'])])])])
The following code, based on a Java implementation, uses a simplistic rule to find the head of the NP, but I need it to be based on the head-finding rules:
from nltk import Tree

parsestr = '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'

def traverse(t):
    try:
        t.label()
    except AttributeError:
        # t is a leaf (a plain string), so there is nothing to descend into
        return
    if t.label() == 'NP':
        print('NP:' + str(t.leaves()))
        # Simplistic rule: take the last leaf of the NP as its head
        print('NPhead:' + str(t.leaves()[-1]))
    # Recurse into the children whether or not this node is an NP
    for child in t:
        traverse(child)

tree = Tree.fromstring(parsestr)
traverse(tree)
The above code gives output:
NP:['The', 'old', 'oak', 'tree', 'from', 'India']
NPhead:India
NP:['The', 'old', 'oak', 'tree']
NPhead:tree
NP:['India']
NPhead:India
Although it now gives correct output for the given sentence, I need to add a condition so that only the rightmost noun is extracted as the head. Currently the code does not check whether the last leaf is a noun (NN) at all:
print('NPhead:' + str(t.leaves()[-1]))
So I am looking for something like the following in the NP-head condition of the code above:
t.leaves().getrightmostnoun()
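To make the idea concrete, here is a minimal sketch of what such a check could look like. `rightmost_noun` is a hypothetical helper name (NLTK has no `getrightmostnoun`); it relies only on `Tree.pos()`, which returns the `(word, tag)` pairs under a subtree, and on `Tree.subtrees()` with a filter. Treating every tag that starts with `NN` as a noun is an assumption about the Penn Treebank tagset.

```python
from nltk import Tree

def rightmost_noun(np):
    """Return the rightmost leaf under `np` whose POS tag starts with 'NN', or None."""
    # Tree.pos() gives (word, tag) pairs for every leaf below this subtree.
    for word, tag in reversed(np.pos()):
        if tag.startswith('NN'):  # assumed noun tags: NN, NNS, NNP, NNPS
            return word
    return None

tree = Tree.fromstring(
    '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) '
    '(PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))'
)

# Visit every NP subtree in the same top-down order as traverse().
np_heads = [(np.leaves(), rightmost_noun(np))
            for np in tree.subtrees(lambda t: t.label() == 'NP')]

for leaves, head in np_heads:
    print('NP:' + str(leaves))
    print('NPhead:' + str(head))
```

Note that this still reports India for the outermost NP, because it scans all leaves, including those inside the PP. That is exactly the kind of case a pure "rightmost noun" rule cannot handle.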
Michael Collins's dissertation (Appendix A) includes head-finding rules for the Penn Treebank, and under those rules the head is not necessarily the rightmost noun. The condition above should therefore cover such cases as well.
For the following example as given in one of the answers:
(NP (NP the person) that gave (NP the talk)) went home
The head noun of the subject is "person", but the last leaf node of the NP "the person that gave the talk" is "talk".
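For illustration, here is a rough sketch of what a rule-based version might look like. The rule table is the NP rule set as it is commonly summarised from Collins's Appendix A, written from memory, so it should be checked against the dissertation before being relied on. `NP_RULES`, `np_head_child` and `np_head_word` are hypothetical names, the fallback for non-NP nodes (take the last child) is a simplification, and the parse of "the person that gave the talk" is an assumed bracketing, not parser output.

```python
from nltk import Tree

# NP head rules, as commonly summarised from Collins (1999), Appendix A.
# Each entry: (scan direction over the NP's children, labels to look for).
NP_RULES = [
    ('right_to_left', {'NN', 'NNP', 'NNPS', 'NNS', 'NX', 'POS', 'JJR'}),
    ('left_to_right', {'NP'}),
    ('right_to_left', {'$', 'ADJP', 'PRN'}),
    ('right_to_left', {'CD'}),
    ('right_to_left', {'JJ', 'JJS', 'RB', 'QP'}),
]

def label_of(node):
    """Label of a subtree, or None for a leaf string."""
    return node.label() if isinstance(node, Tree) else None

def np_head_child(np):
    """Pick the head child of an NP node using the rule table above."""
    children = list(np)
    if label_of(children[-1]) == 'POS':
        return children[-1]
    for direction, labels in NP_RULES:
        ordered = reversed(children) if direction == 'right_to_left' else children
        for child in ordered:
            if label_of(child) in labels:
                return child
    return children[-1]  # default: last child

def np_head_word(np):
    """Follow head children down to a leaf and return that word."""
    node = np
    while isinstance(node, Tree):
        if node.label() == 'NP':
            node = np_head_child(node)
        else:
            node = node[-1]  # simplification: no rules for other categories here
    return node

oak = Tree.fromstring(
    '(NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India))))'
)
# Assumed bracketing for the relative-clause example.
person = Tree.fromstring(
    '(NP (NP (DT the) (NN person)) '
    '(SBAR (WHNP (WDT that)) (S (VP (VBD gave) (NP (DT the) (NN talk))))))'
)

print(np_head_word(oak))     # tree
print(np_head_word(person))  # person
```

The important difference from the rightmost-noun approach is that the rules look at the NP's immediate children rather than at all of its leaves, so the noun inside a trailing PP or relative clause is never considered.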