
Given the following sentence:

The old oak tree from India fell down.

How can I get the following parse tree representation of the sentence using python NLTK?

(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))

I need a complete example, which I couldn't find on the web!


Edit

I have gone through this book chapter to learn about parsing using NLTK, but the problem is that I need a grammar to parse sentences or phrases, which I do not have. I have found this Stack Overflow post, which also asks about a grammar for parsing, but there is no convincing answer there.
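
As far as I understand it, the book-chapter approach only works if you already have a grammar that covers the input. A toy CFG for just this one sentence would look something like the sketch below, which clearly does not scale to arbitrary sentences:

import nltk

# A hand-written toy CFG that covers only this one sentence -- this is the
# kind of grammar the NLTK book chapter assumes you already have.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> NP PP | DT JJ NN NN | NNP
PP -> IN NP
VP -> VBD PRT
PRT -> RP
DT -> 'The'
JJ -> 'old'
NN -> 'oak' | 'tree'
IN -> 'from'
NNP -> 'India'
VBD -> 'fell'
RP -> 'down'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("The old oak tree from India fell down".split()):
    print(tree)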

So, I am looking for a complete answer that can give me the parse tree given a sentence.

Wasi Ahmad
  • NLTK has several parsers (http://www.nltk.org/api/nltk.parse.html). IIRC, for preference I use the Stanford one. – Frames Catherine White Feb 19 '17 at 02:14
  • Can you give me some example code? I am really having a tough time cracking this. – Wasi Ahmad Feb 19 '17 at 02:15
  • I've got to run off; if no one posts anything in the next 12 hours I'll come back and post something. It has been a while, so I'd have to dig up some of my old code (and probably translate it from Julia to Python). – Frames Catherine White Feb 19 '17 at 02:29
  • @WasiAhmad, if you're actually "having a tough time", show your code so far and ask a question about the problem you encounter. – alexis Feb 19 '17 at 08:40

5 Answers


Here is an alternative solution using StanfordCoreNLP instead of nltk. There are a few libraries that build on top of StanfordCoreNLP; I personally use pycorenlp to parse the sentence.

First, you have to download the stanford-corenlp-full folder, which contains the *.jar files, and run the server from inside that folder (the default port is 9000).

export CLASSPATH="`find . -name '*.jar'`"
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer [port?] # run server

Then, in Python, you can run the following to parse the sentence.

from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')

text = "The old oak tree from India fell down."

output = nlp.annotate(text, properties={
  'annotators': 'parse',
  'outputFormat': 'json'
})

print(output['sentences'][0]['parse']) # bracketed parse tree of the first sentence
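
If you want to keep working with the result in NLTK, the bracketed string returned by the server can be loaded into an nltk.Tree (a small sketch, reusing the output variable from above):

from nltk.tree import Tree

# Load the bracketed parse string returned by CoreNLP into an NLTK tree
parse_string = output['sentences'][0]['parse']
tree = Tree.fromstring(parse_string)
tree.pretty_print()  # ASCII rendering of the constituency tree
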
titipata
  • Thanks, it worked for me. Just wondering if any simpler method is available to do this task! – Wasi Ahmad Mar 13 '17 at 07:56
  • Ah, I'm not sure if there is a faster solution in NLTK to get a parse tree. Using StanfordCoreNLP with the pycorenlp wrapper seems like one good way to do this task. Maybe there is a way to point NLTK at the CoreNLP server in order to parse the text (see the sketch after these comments)? I would love to know about alternative solutions too! – titipata Mar 13 '17 at 15:29
  • @titipat Starting the server gives the following error message: `Invalid maximum heap size: -Xmx4g. the specified size exceeds the maximum representable size. Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit.` – user1700890 Apr 09 '17 at 03:11
  • Hmm, that's weird. Can you check their page to see how to run the server? https://stanfordnlp.github.io/CoreNLP/corenlp-server.html – titipata Apr 09 '17 at 03:17
  • @titipat I managed to get it to work on Ubuntu. Do you know how to use the different parsers included in Stanford CoreNLP with pycorenlp? Or how to split a paragraph into sentences? – user1700890 Apr 09 '17 at 15:06
  • I had a solution to that actually; you can check it here: http://titipata.github.io/2016/11/09/sentence-split.html (in the Stanford CoreNLP section). – titipata Apr 09 '17 at 19:11
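
Regarding the question in the comments about pointing NLTK itself at the CoreNLP server: recent NLTK versions include a client in nltk.parse.corenlp that talks to the same server started above, so a sketch along these lines should also work (assuming the server is still running on port 9000):

from nltk.parse.corenlp import CoreNLPParser

# NLTK's own client for the CoreNLP server started above
parser = CoreNLPParser(url='http://localhost:9000')
tree = next(parser.raw_parse("The old oak tree from India fell down."))
print(tree)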

Older question, but you can use nltk together with the bllipparser. Here is a longer example from nltk. After some fiddling, I myself used the following:

To install (with nltk already installed):

sudo python3 -m nltk.downloader bllip_wsj_no_aux
pip3 install bllipparser

To use:

from nltk.data import find
from bllipparser import RerankingParser

# Locate the model downloaded via nltk.downloader and load the reranking parser
model_dir = find('models/bllip_wsj_no_aux').path
parser = RerankingParser.from_unified_model_dir(model_dir)

best = parser.parse("The old oak tree from India fell down.")

print(best.get_reranker_best())  # best parse according to the reranker
print(best.get_parser_best())    # best parse according to the parser

Output:

-80.435259246021 -23.831876011253 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down))) (. .)))
-79.703612178593 -24.505514522222 (S1 (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (ADVP (RB down))) (. .)))
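
If you only need the single best parse as a bracketed string (for example, to load it into an nltk.Tree), bllipparser also provides a simple_parse helper; a small sketch, reusing the parser object from above:

from nltk.tree import Tree

# simple_parse returns just the top-ranked parse as a bracketed string
parse_string = parser.simple_parse("The old oak tree from India fell down.")
tree = Tree.fromstring(parse_string)
print(tree)
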
vlz

To get a chunk-based parse tree using only the nltk library (no external parser), you can use the following code:

# Import required libraries
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize, RegexpParser

# Example text (the sentence from the question)
sample_text = "The old oak tree from India fell down."

# Tag each token in the sentence with its part of speech
tagged = pos_tag(word_tokenize(sample_text))

# Define a chunk grammar for the phrase types we want to extract
chunker = RegexpParser("""
                    NP: {<DT>?<JJ>*<NN>} # To extract Noun Phrases
                    P: {<IN>}            # To extract Prepositions
                    V: {<V.*>}           # To extract Verbs
                    PP: {<P> <NP>}       # To extract Prepositional Phrases
                    VP: {<V> <NP|PP>*}   # To extract Verb Phrases
                    """)

# Chunk the tagged sentence and print the resulting tree
output = chunker.parse(tagged)
print("After Extracting\n", output)
The output looks something like this:
 (S
  (NP The/DT old/JJ oak/NN)
  (NP tree/NN)
  (P from/IN)
  India/NNP
  (VP (V fell/VBD))
  down/RB
  ./.)
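
Since the result is a regular nltk.Tree, you can also walk it programmatically, for example to pull out just the noun-phrase chunks (a small sketch based on the output above):

# Iterate over the NP chunks in the chunked sentence
for subtree in output.subtrees(filter=lambda t: t.label() == 'NP'):
    print(' '.join(word for word, tag in subtree.leaves()))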

You can also draw this tree as a graph:

# To draw the parse tree
output.draw()

The output graph looks like this: [image of the drawn parse tree]

Darkstar Dream

An alternative solution to the OP's question is to use the Constituent-Treelib library, which can be installed via: pip install constituent-treelib

You only need to perform the following steps:

from constituent_treelib import ConstituentTree

# First, we have to provide a sentence that should be parsed
sentence = "The way to get started is to quit talking and begin doing."

# Then, we define the language that should be considered with respect to the underlying models 
language = ConstituentTree.Language.English

# You can also specify the desired model for the language ("Small" is selected by default)
spacy_model_size = ConstituentTree.SpacyModelSize.Medium

# Next, we must create the necessary NLP pipeline.
# If you wish, you can instruct the library to download and install the models automatically
nlp = ConstituentTree.create_pipeline(language, spacy_model_size) #, download_models=True

# Now, we can instantiate a ConstituentTree object and pass it the sentence and the NLP pipeline
tree = ConstituentTree(sentence, nlp)

# Finally, we can print the parsed tree
print(tree)

Result...

(S
  (NP
    (NP (DT The) (NN way))
    (SBAR (S (VP (TO to) (VP (VB get) (VP (VBN started)))))))
  (VP
    (VBZ is)
    (S
      (VP
        (TO to)
        (VP
          (VP (VB quit) (NP (VBG talking)))
          (CC and)
          (VP (VB begin) (S (VP (VBG doing))))))))
  (. .))
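
If you prefer to keep working with NLTK data structures, the bracketed output can be converted into an nltk.Tree as well (a sketch, assuming str(tree) yields the bracketed representation printed above):

from nltk.tree import Tree

# Convert the bracketed string produced by Constituent-Treelib into an NLTK tree
nltk_tree = Tree.fromstring(str(tree))
nltk_tree.pretty_print()  # ASCII rendering of the tree
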

You can also use the more advanced "Constituency Parsing with a Self-Attentive Encoder" (the Berkeley Neural Parser, benepar), which is available as a spaCy pipeline component:

import benepar, spacy

nlp = spacy.load('en_core_web_md')
nlp.add_pipe('benepar', config={'model': 'benepar_en3'})
doc = nlp('The time for action is now. It is never too late to do something.')
sent = list(doc.sents)[0]
print(sent._.parse_string)
# (S (NP (NP (DT The) (NN time)) (PP (IN for) (NP (NN action)))) (VP (VBZ is) (ADVP (RB now))) (. .))
print(sent._.labels)
# ('S',)
print(list(sent._.children)[0])
# The time for action
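
Since doc contains two sentences here, you can also iterate over doc.sents and walk the constituents of each one; a sketch using benepar's constituents and labels extensions (the NP filter is just an example):

# Print the parse of every sentence and list its noun-phrase constituents
for sent in doc.sents:
    print(sent._.parse_string)
    for constituent in sent._.constituents:  # all constituent spans in the sentence
        if 'NP' in constituent._.labels:
            print('NP:', constituent.text)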

More info: Berkeley Neural Parser

Amarpreet Singh