How to get sentence after chunking in NLTK?

Question

I have a sentence as follows:

txt = "i am living in the West Bengal and my brother live in New York. My name is John Smith"

What I need is:

Get the chunks labeled GPE or LOCATION and join the words of each chunk with "_".
Get the chunks labeled PERSON and remove them.

The output I need:

preprocessed_txt = "i am living in the West_Bengal and my brother live in New_York. My name is "

I used the code from the question "NLTK Named Entity recognition to a Python list" to get the labels of the chunks:

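import nltk

# Process the text one sentence at a time
for sent in nltk.sent_tokenize(txt):
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        # Only named-entity subtrees have a label; plain tokens are (word, tag) tuples
        if hasattr(chunk, 'label'):
            print(chunk.label(), '_'.join(c[0] for c in chunk))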

This returned the following output:

LOCATION West_Bengal
GPE New_York
PERSON John_Smith

What should I do next to rebuild the sentence?

this gives the output as: LOCATION West_Bengal GPE New_York PERSON John_Smith — Sahil Kamboj, Feb 24 '21 at 08:24
You'd have to re-code, catching the tokens in a list then extract the names and replace them with original names in the list of tokens — theProcrastinator, Feb 24 '21 at 08:32
@YashvanderBamel... How to do this? and this is what my question is all about. — Sahil Kamboj, Feb 24 '21 at 08:38

score 1 · Accepted Answer · answered Feb 24 '21 at 10:09

This should be all you need:

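import nltk

# Assumption: tokens come from the question's txt
txt = "i am living in the West Bengal and my brother live in New York. My name is John Smith"
tokens = nltk.word_tokenize(txt)

new = []
for chunk in nltk.ne_chunk(nltk.pos_tag(tokens)):
    try:
        # Named-entity chunks are subtrees and have a label()
        if chunk.label().lower() == 'person':
            continue  # drop PERSON chunks entirely
        # Any other labeled chunk: join its words with "_"
        new.append('_'.join(c[0] for c in chunk))
    except AttributeError:
        # Plain (word, tag) tuples have no label(); keep the word as is
        new.append(chunk[0])

print(' '.join(new))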
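
Note that joining every token with a space leaves punctuation detached (for example "New_York ." instead of "New_York."), and the loop above joins every non-PERSON chunk with "_", not only GPE/LOCATION. A minimal sketch of one way to handle both follows. The names rebuild_sentence, preprocess, and NO_SPACE_BEFORE are hypothetical helpers, not part of NLTK, and the punctuation handling is deliberately naive. The labels NLTK assigns depend on the installed models, so the exact result for the question's text is not guaranteed.

```python
# Hypothetical helper set: tokens that should attach to the previous word.
NO_SPACE_BEFORE = {".", ",", "!", "?", ";", ":"}


def rebuild_sentence(chunks, join_labels=("GPE", "LOCATION"), drop_labels=("PERSON",)):
    """Rebuild a string from the output of nltk.ne_chunk().

    `chunks` is expected to mix (word, tag) tuples with labeled subtrees,
    which is what iterating over the result of nltk.ne_chunk() yields.
    """
    words = []
    for chunk in chunks:
        if hasattr(chunk, "label"):
            label = chunk.label()
            if label in drop_labels:
                continue  # remove the whole chunk
            if label in join_labels:
                words.append("_".join(word for word, _tag in chunk))
            else:
                # Other entity types are kept as separate words
                words.extend(word for word, _tag in chunk)
        else:
            words.append(chunk[0])

    # Naive detokenization: no space before common closing punctuation
    out = ""
    for word in words:
        if out and word not in NO_SPACE_BEFORE:
            out += " "
        out += word
    return out


def preprocess(text):
    """Chunk each sentence with NLTK and stitch the results back together."""
    import nltk  # assumes the tokenizer, tagger, and NE chunker data are downloaded

    sentences = []
    for sent in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sent))
        sentences.append(rebuild_sentence(nltk.ne_chunk(tagged)))
    return " ".join(sentences)
```

Calling preprocess(txt) on the question's text should then give a string close to the requested one, provided the chunker assigns the same labels as shown in the question. Unlike the requested output, this sketch does not keep a trailing space after "is".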
