I'm trying to extract prepositional phrases from sentences using NLTK. Is there a way for me to do this automatically (e.g. feed a function a sentence and get back its prepositional phrases)?

The examples here seem to require that you start with a grammar before you can get a parse tree. Can I automatically get the grammar and use that to get the parse tree?

Obviously I could tag a sentence, pick out prepositions and the subsequent noun, but this is complicated when the prepositional complement is compound.

Tim Hopper
  • Maybe this post will help http://stackoverflow.com/questions/6115677/english-grammar-for-parsing-in-nltk – NLPer Jul 25 '13 at 18:13

2 Answers

What you really want to do is fully parse your sentence with a robust statistical parser (e.g. the Stanford parser) and then look for constituents labeled PP:

(ROOT
  (S
    (NP (NNP John))
    (VP (VBZ lives)
      (PP (IN in)
        (NP (DT a) (NN house)))
      (PP (IN by)
        (NP (DT the) (NN sea))))))

I'm not sure whether NLTK can do this kind of parsing, or how accurate it would be if it can, but it's not hard to call an external parser from Python and then process its output. Using a parser will save you a lot of time and effort (since the parser takes care of everything), and it is the only reliable way to do this job.
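As a rough illustration of the "process the output" step, here is a minimal sketch that reads a bracketed parse like the one above and collects the words under every PP node. It assumes the parser's output is available as a Penn Treebank-style bracketed string; the helper names (`parse_brackets`, `leaves`, `prepositional_phrases`) are made up for this example and are not part of NLTK or the Stanford parser.

```python
import re

def parse_brackets(text):
    """Parse a bracketed parse string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1                      # consume ")"
        return (label, children)

    return read()

def leaves(node):
    """Return the words under a node, in sentence order."""
    words = []
    for child in node[1]:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words

def prepositional_phrases(node):
    """Collect the text of every PP constituent, including nested ones."""
    label, children = node
    found = [" ".join(leaves(node))] if label == "PP" else []
    for child in children:
        if isinstance(child, tuple):
            found.extend(prepositional_phrases(child))
    return found

PARSE = """(ROOT
  (S
    (NP (NNP John))
    (VP (VBZ lives)
      (PP (IN in)
        (NP (DT a) (NN house)))
      (PP (IN by)
        (NP (DT the) (NN sea))))))"""

print(prepositional_phrases(parse_brackets(PARSE)))
# ['in a house', 'by the sea']
```

If NLTK is available, `nltk.Tree.fromstring` can read the same bracketed string, and the tree's `subtrees()` method accepts a filter function, so the hand-written reader above is only needed when you want to avoid the dependency.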

kitaird
dkar
  • Obviously a full parse is overkill, but it would get me to the end goal. I'll give it a shot. Looks like there is [at least one](http://projects.csail.mit.edu/spatial/Stanford_Parser) Python interface to the Stanford parser. – Tim Hopper Jul 26 '13 at 12:34
  • I wouldn't call it overkill, but a necessary complication. If you try to build a rule-based PP recognizer, you will end up spending a lot of time and effort for mediocre results. – dkar Jul 26 '13 at 13:28
I know the answer was already accepted, but a shallow parser will return the NLP chunks with minimal syntactic structure. This fairly linear result may be easier to work with. Here's an online demo of the CLiPS parser: http://www.clips.ua.ac.be/cgi-bin/webdemo/MBSP-instant-webdemo.cgi

Here's an example:

John gave the book to Mary

[Image: CLiPS shallow parser result]

The [PNP] is easy to extract.

Victor Stoddard
  • I tested this against multiple types of datasets, and it seems to perform better at extracting NPs and PNPs, especially for biomedical text. – kaulmonish Dec 11 '17 at 07:07