
I am new to NLTK.

This is the code I have used:

from nltk import RegexpParser, pos_tag, sent_tokenize

text = "The pizza was 66 and brilliant"
pattern = r"""
P: {<NN>+<VBD>+<CD>+}
"""
for sent in sent_tokenize(text):
  sentence = sent.split()
  PChunker = RegexpParser(pattern)
  output = PChunker.parse(pos_tag(sentence))
  print(output)

I am getting this output:

(S The/DT (P pizza/NN was/VBD 66/CD) and/CC brilliant/VB)

I need this output:

pizza was 66

How can I get this?

1 Answer

RegexpParser.parse returns an nltk Tree, which you can walk with Tree.subtrees(). Pass a filter to visit only the non-terminal node you are interested in (P in your case), then join the words of its leaves to get the plain text:

from nltk import RegexpParser, pos_tag, sent_tokenize

text = "The pizza was 66 and brilliant"
pattern = r"""
P: {<NN>+<VBD>+<CD>+}
"""
# Build the chunker once; it does not change between sentences.
PChunker = RegexpParser(pattern)
for sent in sent_tokenize(text):
    sentence = sent.split()
    output = PChunker.parse(pos_tag(sentence))
    print(output)
    # Visit only the subtrees labelled 'P'.
    for subtree in output.subtrees(filter=lambda t: t.label() == 'P'):
        print(subtree)
        # Each leaf is a (word, tag) pair; join the words to get the plain text.
        print(' '.join(word for word, tag in subtree.leaves()))
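If you need the chunk text in more than one place, the same idea can be wrapped in a small helper. The sketch below is only an illustration: extract_chunks is a hypothetical name, not part of NLTK, and the tokens are tagged by hand (reusing the tags from the output in the question) so that it runs without downloading a tagger model.

```python
from nltk import RegexpParser


def extract_chunks(tree, label):
    """Return the text of every subtree of `tree` that carries `label`.

    Hypothetical helper for illustration; not an NLTK function.
    """
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == label)
    ]


# Hand-tagged tokens (an assumption standing in for pos_tag output).
tagged = [("The", "DT"), ("pizza", "NN"), ("was", "VBD"),
          ("66", "CD"), ("and", "CC"), ("brilliant", "VB")]

chunker = RegexpParser("P: {<NN>+<VBD>+<CD>+}")
tree = chunker.parse(tagged)
print(extract_chunks(tree, "P"))  # prints ['pizza was 66']
```

Returning a list rather than printing keeps the helper usable when a sentence contains several P chunks, or none at all.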
Igor