I am using nltk.ne_chunk() like this:

import nltk

sent = "Azhar is asking what is weather in Chicago today? "
chunks = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent)), binary=True)
print(list(chunks))

And I am getting output like this:

[Tree('NE', [('Azhar', 'NNP')]), ('is', 'VBZ'), ('asking', 'VBG'), ('what', 'WP'), ('is', 
'VBZ'), ('weather', 'NN'), ('in', 'IN'), Tree('NE', [('Chicago', 'NNP')]), ('today', 'NN'), 
('?', '.')]

But I am expecting an output like this:

[Tree('PERSON', [('Azhar', 'NNP')]), ('is', 'VBZ'), ('asking', 'VBG'), ('what', 'WP'), ('is', 
'VBZ'), ('weather', 'NN'), ('in', 'IN'), Tree('GPE', [('Chicago', 'NNP')]), ('today', 'NN'), 
('?', '.')]

Can someone tell me what I am doing wrong here?

ChrisGPT was on strike
azhar

1 Answer

After installing the spaCy library and downloading the relevant model (en_core_web_sm), as explained here, you can extract the named entities directly:

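import spacy

# Load the small English pipeline (install it first: python -m spacy download en_core_web_sm)
NER = spacy.load("en_core_web_sm")
sent = "Azhar is asking what is weather in Chicago today? "
text1 = NER(sent)
# Each recognised entity exposes its text and its label
for word in text1.ents:
    print(word.text, word.label_)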

Output:

Azhar PERSON
Chicago GPE
today DATE

UPDATE

nltk.ne_chunk returns a nested nltk.tree.Tree object, so you have to traverse the tree to get to the named entities. tree2conlltags from nltk.chunk does that for you by flattening the tree into (word, POS tag, IOB tag) triples:

from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.chunk import tree2conlltags

sentence = "Azhar is asking what is weather in Chicago today?"
print(tree2conlltags(ne_chunk(pos_tag(word_tokenize(sentence)))))

Output in IOB format:

[('Azhar', 'NNP', 'B-GPE'), ('is', 'VBZ', 'O'), ('asking', 'VBG', 'O'), ('what', 'WP', 'O'), ('is', 'VBZ', 'O'), ('weather', 'NN', 'O'), ('in', 'IN', 'O'), ('Chicago', 'NNP', 'B-GPE'), ('today', 'NN', 'O'), ('?', '.', 'O')]
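
If you want (entity, label) pairs rather than per-token tags, the IOB triples can be grouped with a few lines of plain Python. This is only a minimal sketch: iob_to_entities is a name I made up for illustration (it is not part of NLTK), and it takes the simplifying assumption that an I- tag without a matching open entity is treated like O.

```python
def iob_to_entities(tagged):
    """Group (word, pos, iob) triples into (entity_text, label) pairs."""
    entities = []
    words, label = [], None
    for word, _pos, iob in tagged:
        if iob.startswith("B-"):
            # A new entity starts: flush the previous one, if any.
            if words:
                entities.append((" ".join(words), label))
            words, label = [word], iob[2:]
        elif iob.startswith("I-") and words and iob[2:] == label:
            # Continuation of the entity that is currently open.
            words.append(word)
        else:
            # "O" (or a stray I- tag): close whatever entity is open.
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:
        entities.append((" ".join(words), label))
    return entities


# The tree2conlltags output shown above, copied verbatim.
conll = [('Azhar', 'NNP', 'B-GPE'), ('is', 'VBZ', 'O'), ('asking', 'VBG', 'O'),
         ('what', 'WP', 'O'), ('is', 'VBZ', 'O'), ('weather', 'NN', 'O'),
         ('in', 'IN', 'O'), ('Chicago', 'NNP', 'B-GPE'), ('today', 'NN', 'O'),
         ('?', '.', 'O')]

print(iob_to_entities(conll))  # [('Azhar', 'GPE'), ('Chicago', 'GPE')]
```

Multi-word entities (a B- tag followed by I- tags with the same label) come out joined by a single space.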

more on this here!
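
As for why the call in the question only shows NE: it passes binary=True. With that flag ne_chunk marks every entity simply as NE; with the default (binary=False) the classifier adds category labels such as PERSON, ORGANIZATION and GPE. Below is a rough sketch of walking the tree yourself and comparing the two modes. extract_entities and demo are names I made up (not NLTK functions), and demo assumes the NLTK tokenizer, tagger and chunker resources have already been downloaded.

```python
def extract_entities(chunks):
    """Return (entity_text, label) pairs from the result of ne_chunk."""
    entities = []
    for node in chunks:
        # Entity subtrees expose label() and leaves();
        # ordinary tokens are plain (word, tag) tuples and are skipped.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities


def demo():
    # Imported here so extract_entities can be used without loading NLTK.
    from nltk import word_tokenize, pos_tag, ne_chunk

    sentence = "Azhar is asking what is weather in Chicago today?"
    tagged = pos_tag(word_tokenize(sentence))

    # binary=True: every entity gets the generic label NE.
    print(extract_entities(ne_chunk(tagged, binary=True)))

    # Default (binary=False): entities get typed labels.
    print(extract_entities(ne_chunk(tagged)))
```

Calling demo() prints both lists for the sentence from the question. The exact labels depend on the NLTK models you have installed, so they may not match what you expect for every name.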

meti
  • Thanks for your answer, I will use spaCy for my task. Could you also clarify why nltk is not returning these labels? I think it should, as I have seen articles and questions on Stack Overflow where it returns these tags. – azhar Sep 05 '21 at 06:35
  • Thanks for your reply, it is very helpful, and I find that spaCy does a better job of labelling the tokens. – azhar Sep 05 '21 at 14:35