
I have a treebank with a syntactic parse tree for each sentence, as shown below:

(S (NP (DT The) (NN government)) (VP (VBZ charges) (SBAR (IN that) (S (PP (IN between) (NP (NNP July) (CD 1971)) (CC and) (NP (NNP July) (CD 1992))) (, ,) (NP (NNP Rostenkowski)) (VP (VBD placed) (NP (CD 14) (NNS people)) (PP (IN on) (NP (NP (PRP$ his) (JJ congressional) (NN payroll)) (SBAR (WHNP (WP who)) (S (VP (VBD performed) (NP (NP (JJ personal) (NNS services)) (PP (IN for) (NP (NP (PRP him)) (CC and) (NP (PRP$ his) (NN family))))))))))))))

I want to annotate the parse tree with lexical information like headwords for each node in the parse tree.

Can I do that using Stanford CoreNLP? Please point me in the right direction. I would prefer a solution that can be implemented in Java, as I am familiar with Java.

Thanks a lot!

kss
  • @David, to be honest, I do not know how to proceed. I know that the Stanford parser can build a parse tree and a dependency tree from a corpus of sentences, but I have no idea how to annotate an already available treebank with lexical information. Can you give me any direction? – kss Feb 28 '15 at 14:00

2 Answers


You can build this with the TreeTransformer interface. Use a HeadFinder (for English, the CollinsHeadFinder) to retrieve the head word or head constituent at each node.

You can see an example of this kind of work in the TreeAnnotator within the parser.
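To make the idea of head percolation concrete, here is a minimal, self-contained sketch in plain Java that does not use CoreNLP at all. It reads a bracketed tree, picks a head child at every node, and copies that child's head word upward. Everything in it is an assumption made for illustration: the `HeadAnnotator` class, its `Node` type, and above all the small `RULES` table, which is a made-up stand-in and not the Collins head rules. In a real setup the head choice would come from a HeadFinder such as the CollinsHeadFinder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: annotate each node of a bracketed parse tree with a head word. */
final class HeadAnnotator {

    /** A tree node: label, children (empty for a word), and the percolated head word. */
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        String head; // filled in by annotate()
        Node(String label) { this.label = label; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // HYPOTHETICAL, heavily simplified head rules: for each parent category,
    // the child labels to look for, in priority order. NOT the Collins rules.
    private static final Map<String, List<String>> RULES = Map.of(
        "S", List.of("VP", "S", "SBAR"),
        "SBAR", List.of("S", "SBAR", "IN"),
        "VP", List.of("VBD", "VBZ", "VBP", "VB", "VP"),
        "NP", List.of("NN", "NNS", "NNP", "PRP", "NP"),
        "PP", List.of("IN", "TO"),
        "WHNP", List.of("WP", "WDT"));

    /** Parses a Penn-Treebank-style bracketed string into a Node tree. */
    static Node parse(String s) {
        String[] toks = s.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return parse(toks, new int[] {0});
    }

    private static Node parse(String[] toks, int[] pos) {
        String tok = toks[pos[0]++];
        if (!tok.equals("(")) return new Node(tok);   // a bare word (leaf)
        Node node = new Node(toks[pos[0]++]);         // category label
        while (!toks[pos[0]].equals(")")) node.children.add(parse(toks, pos));
        pos[0]++;                                     // consume ")"
        return node;
    }

    /** Sets node.head bottom-up and returns the head word of n. */
    static String annotate(Node n) {
        if (n.isLeaf()) return n.head = n.label;
        for (Node c : n.children) annotate(c);
        // Fallback when no rule matches: take the rightmost child.
        Node headChild = n.children.get(n.children.size() - 1);
        search:
        for (String wanted : RULES.getOrDefault(n.label, List.of())) {
            for (Node c : n.children) {               // leftmost match wins
                if (!c.isLeaf() && c.label.equals(wanted)) { headChild = c; break search; }
            }
        }
        return n.head = headChild.head;
    }

    /** Renders the tree with each category written as LABEL[head]. */
    static String show(Node n) {
        if (n.isLeaf()) return n.label;
        StringBuilder sb = new StringBuilder("(" + n.label + "[" + n.head + "]");
        for (Node c : n.children) sb.append(' ').append(show(c));
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // A shortened fragment of the sentence from the question.
        Node tree = parse("(S (NP (NNP Rostenkowski)) (VP (VBD placed) (NP (CD 14) (NNS people))))");
        annotate(tree);
        System.out.println(show(tree));
    }
}
```

The same bottom-up pass should carry over to a whole treebank: read each bracketed line, annotate it, and write it back out. Only the rule table would need replacing with a linguistically sound head finder.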

Jon Gauthier

You are probably looking for a lemmatization tool. Stanford CoreNLP supports lemmatization; see "Lemmatization java".

How to include the lemmas in an existing treebank depends on what you want to do with it. Which tools will process the treebank afterwards, and what format do they expect?

char bugs