I am trying to chunk a sentence with NLTK, using regular expressions over POS tags. Two rules are defined to identify phrases, based on the tags of the words in the sentence.
Mainly, I want to capture a chunk of one or more verbs, followed by an optional determiner, and then one or more nouns at the end. This is the first rule in the grammar definition, but it is not being captured as a phrase chunk.
import nltk
## Defining the POS tagger
tagger = nltk.data.load(nltk.tag._POS_TAGGER)
## A Single sentence - input text value
textv="This has allowed the device to start, and I then see glitches which is not nice."
tagged_text = tagger.tag(textv.split())
## Defining Grammar rules for Phrases
actphgrammar = r"""
Ph: {<VB*>+<DT>?<NN*>+} # verbal phrase - one or more verbs followed by optional determiner, and one or more nouns at the end
{<RB*><VB*|JJ*|NN*\$>} # Adverbial phrase - Adverb followed by adjective / Noun or Verb
"""
### Parsing the defined grammar for phrases
actp = nltk.RegexpParser(actphgrammar)
actphrases = actp.parse(tagged_text)
The input to the chunker, tagged_text, is shown below.
tagged_text
Out[7]: [('This', 'DT'), ('has', 'VBZ'), ('allowed', 'VBN'), ('the', 'DT'), ('device', 'NN'), ('to', 'TO'), ('start,', 'NNP'), ('and', 'CC'), ('I', 'PRP'), ('then', 'RB'), ('see', 'VB'), ('glitches', 'NNS'), ('which', 'WDT'), ('is', 'VBZ'), ('not', 'RB'), ('nice.', 'NNP')]
In the final output, only the adverbial phrase ('then see'), which matches the second rule, is captured. I expected the verbal phrase ('allowed the device') to match the first rule and be captured as well, but it is not.
actphrases
Out[8]: Tree('S', [('This', 'DT'), ('has', 'VBZ'), ('allowed', 'VBN'), ('the', 'DT'), ('device', 'NN'), ('to', 'TO'), ('start,', 'NNP'), ('and', 'CC'), ('I', 'PRP'), Tree('Ph', [('then', 'RB'), ('see', 'VB')]), ('glitches', 'NNS'), ('which', 'WDT'), ('is', 'VBZ'), ('not', 'RB'), ('nice.', 'NNP')])
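To narrow this down, here is a minimal sketch that uses only Python's re module (no NLTK) to compare two spellings of a tag pattern against the tags listed above. It assumes the chunker reads each tag pattern as a regular expression over the tag, so that `*` is a quantifier on the preceding character rather than a shell-style wildcard. The helper name matching_tags is hypothetical and exists only for this illustration.

```python
import re

# Tags copied from the tagged_text output shown in this post.
tags = ['DT', 'VBZ', 'VBN', 'DT', 'NN', 'TO', 'NNP', 'CC', 'PRP',
        'RB', 'VB', 'NNS', 'WDT', 'VBZ', 'RB', 'NNP']


def matching_tags(tag_regex, tags):
    """Return the tags that tag_regex matches in full (hypothetical helper)."""
    # Anchor the pattern at the end so that partial matches are rejected.
    compiled = re.compile(r'(?:%s)\Z' % tag_regex)
    return [tag for tag in tags if compiled.match(tag)]


# Assumption: these plain regexes stand in for what goes between < and >
# in a chunk grammar rule.
for pattern in ['VB*', 'VB.*', 'NN*', 'NN.*']:
    print("%-5s %s" % (pattern, matching_tags(pattern, tags)))
```

Under that assumption, VB* covers only the bare VB tag in this sentence, while VB.* also covers VBZ and VBN; the same holds for NN* versus NN.*. This would be consistent with the second rule matching ('then', 'RB'), ('see', 'VB') while the first rule skips ('allowed', 'VBN').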
The NLTK version used is 2.0.5 (Python 2.7). Any help or suggestions would be greatly appreciated.
Thanks in advance,
Bala.