How to get a parse NLP Tree object from bracketed parse string with nltk or spacy?

Question

I have a sentence "You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre." and I can't achieve to get the NLP parse tree like the following example:

(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))

I want to replicate the solution to this question https://stackoverflow.com/a/39320379 but I have a string sentence instead of the NLP tree.

BTW, I am using python 3

see https://stackoverflow.com/questions/45520228/parse-nltk-chunk-string-to-form-tree — Amarpreet Singh, Mar 01 '23 at 16:44

score 4 · Answer 1 · answered Mar 19 '18 at 22:39

Use the Tree.fromstring() method:

>>> from nltk import Tree
>>> parse = Tree.fromstring('(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))')

>>> parse
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['You'])]), Tree('VP', [Tree('MD', ['could']), Tree('VP', [Tree('VB', ['say']), Tree('SBAR', [Tree('IN', ['that']), Tree('S', [Tree('NP', [Tree('PRP', ['they'])]), Tree('ADVP', [Tree('RB', ['regularly'])]), Tree('VP', [Tree('VB', ['catch']), Tree('NP', [Tree('NP', [Tree('DT', ['a']), Tree('NN', ['shower'])]), Tree(',', [',']), Tree('SBAR', [Tree('WHNP', [Tree('WDT', ['which'])]), Tree('S', [Tree('VP', [Tree('VBZ', ['adds']), Tree('PP', [Tree('TO', ['to']), Tree('NP', [Tree('NP', [Tree('PRP$', ['their']), Tree('NN', ['exhilaration'])]), Tree('CC', ['and']), Tree('NP', [Tree('FW', ['joie']), Tree('FW', ['de']), Tree('FW', ['vivre'])])])])])])])])])])])])]), Tree('.', ['.'])])])

>>> parse.pretty_print()
                                                       ROOT                                                             
                                                        |                                                                
                                                        S                                                               
  ______________________________________________________|_____________________________________________________________   
 |         VP                                                                                                         | 
 |     ____|___                                                                                                       |  
 |    |        VP                                                                                                     | 
 |    |     ___|____                                                                                                  |  
 |    |    |       SBAR                                                                                               | 
 |    |    |    ____|_______                                                                                          |  
 |    |    |   |            S                                                                                         | 
 |    |    |   |     _______|____________                                                                             |  
 |    |    |   |    |       |            VP                                                                           | 
 |    |    |   |    |       |        ____|______________                                                              |  
 |    |    |   |    |       |       |                   NP                                                            | 
 |    |    |   |    |       |       |         __________|__________                                                   |  
 |    |    |   |    |       |       |        |          |         SBAR                                                | 
 |    |    |   |    |       |       |        |          |      ____|____                                              |  
 |    |    |   |    |       |       |        |          |     |         S                                             | 
 |    |    |   |    |       |       |        |          |     |         |                                             |  
 |    |    |   |    |       |       |        |          |     |         VP                                            | 
 |    |    |   |    |       |       |        |          |     |     ____|____                                         |  
 |    |    |   |    |       |       |        |          |     |    |         PP                                       | 
 |    |    |   |    |       |       |        |          |     |    |     ____|_____________________                   |  
 |    |    |   |    |       |       |        |          |     |    |    |                          NP                 | 
 |    |    |   |    |       |       |        |          |     |    |    |          ________________|________          |  
 NP   |    |   |    NP     ADVP     |        NP         |    WHNP  |    |         NP               |        NP        | 
 |    |    |   |    |       |       |     ___|____      |     |    |    |     ____|_______         |    ____|____     |  
PRP   MD   VB  IN  PRP      RB      VB   DT       NN    ,    WDT  VBZ   TO  PRP$          NN       CC  FW   FW   FW   . 
 |    |    |   |    |       |       |    |        |     |     |    |    |    |            |        |   |    |    |    |  
You could say that they regularly catch  a      shower  ,   which adds  to their     exhilaration and joie  de vivre  .

This actually this does not resolve my issue, I need to know how to obtain the values that you specify as a parameter inside Tree.fromstring() method. I have a lot of string sentences, around of 70k. I can't manually specify the NLP Tree for each one. — xzegga, Mar 20 '18 at 02:36
I'm confused =) You mean you want to get parses from strings? Or do you want to parses the parsed string to an `nltk.Tree` object? — alvas, Mar 20 '18 at 06:17

score 2 · Answer 2 · answered Mar 20 '18 at 04:28

I am going to assume there is a good reason as to why you need the dependency parse tree in that format. Spacy does a great job by using a CNN (Convolutional Neural Network) to produce CFGs (Context-Free Grammars), is production ready, and is super-fast. You can do something like the below to see for yourself (and then read the docs in the prior link):

import spacy

nlp = spacy.load('en')

text = 'You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre.'

for token in nlp(text):
    print(token.dep_, end='\t')
    print(token.idx, end='\t')
    print(token.text, end='\t')
    print(token.tag_, end='\t')
    print(token.head.text, end='\t')
    print(token.head.tag_, end='\t')
    print(token.head.idx, end='\t')
    print(' '.join([w.text for w in token.subtree]), end='\t')
    print(' '.join([w.text for w in token.children]))

Now, you could make an algorithm to navigate this tree, and print accordingly (I couldn't find a quick example, sorry, but you can see the indexes and how to traverse the parse). Another thing you could do is to extract the CFG somehow, and then use NLTK to do the parsing and subsequent displaying in the format you desire. This is from the NLTK playbook (modified to work with Python 3):

import nltk
from nltk import CFG

grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  V -> "saw" | "ate"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "dog" | "cat" | "cookie" | "park"
  PP -> P NP
  P -> "in" | "on" | "by" | "with"
  """)

text = 'Mary saw Bob'

sent = text.split()
rd_parser = nltk.RecursiveDescentParser(grammar)
for p in rd_parser.parse(sent):
    print(p)
# (S (NP Mary) (VP (V saw) (NP Bob)))

However, you can see that you need to define the CFG (so if you tried your original text in place of the example's, you saw that it didn't understand the tokens not defined in the CFG).

It seems the easiest way to obtain your desired format is using Stanford's NLP parser. Taken from this SO question (and sorry, I haven't tested it):

parser = StanfordParser(model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
parsed = parser.raw_parse('Jack payed up to 5% more for each unit')
for line in parsed:
    print(line, end=' ') # This will print all in one line, as desired

I didn't test this because I don't have the time to install the Stanford Parser, which can be a bit of a cumbersome process (relative to installing Python modules), that is, assuming you are looking for a Python solution.

I hope this helps, and I'm sorry that it's not a direct answer.

The `StanfordParser` code won't really work with the latest version of NLTK since it's deprecated. I suggest using `nltk.parse.corenlp.CoreNLPParser`. — alvas, Mar 20 '18 at 06:18
See https://stackoverflow.com/questions/13883277/stanford-parser-and-nltk — alvas, Mar 20 '18 at 06:18
Yup, good point. However you get ```StanfordParser``` up and running, it appears that will output the desired format of the parse tree. — Eugene, Mar 22 '18 at 03:49

How to get a parse NLP Tree object from bracketed parse string with nltk or spacy?

2 Answers2