Named Entity Recognition for NLTK in Python. Identifying the NE

Question

I need to classify words into their parts of speech, such as verb, noun, or adverb. I used the following functions:

nltk.word_tokenize()  # split a sentence into word tokens
nltk.pos_tag()        # tag each token with its part of speech
nltk.ne_chunk()       # identify named entities

The output of this is a tree. For example:

>>> import nltk
>>> sentence = "I am Jhon from America"
>>> sent1 = nltk.word_tokenize(sentence)
>>> sent2 = nltk.pos_tag(sent1)
>>> sent3 = nltk.ne_chunk(sent2, binary=True)
>>> sent3
Tree('S', [('I', 'PRP'), ('am', 'VBP'), Tree('NE', [('Jhon', 'NNP')]), ('from', 'IN'), Tree('NE', [('America', 'NNP')])])

When accessing an element of this tree, I did it as follows:

>>> sent3[0]
('I', 'PRP')
>>> sent3[0][0]
'I'
>>> sent3[0][1]
'PRP'

But when accessing a Named Entity:

>>> sent3[2]
Tree('NE', [('Jhon', 'NNP')])
>>> sent3[2][0]
('Jhon', 'NNP')
>>> sent3[2][1]    
Traceback (most recent call last):
  File "<pyshell#121>", line 1, in <module>
    sent3[2][1]
  File "C:\Python26\lib\site-packages\nltk\tree.py", line 139, in __getitem__
    return list.__getitem__(self, index)
IndexError: list index out of range

I got the above error.

What I want is to get 'NE' as the output, similar to the earlier 'PRP', so that I can identify which words are named entities. Is there any way of doing this with NLTK in Python? If so, please post the command. Or is there a function in the tree library to do this? I need the node value 'NE'.

bdk · Accepted Answer · 2011-04-19T13:05:20.073

14

This answer may be off base, in which case I'll delete it, since I don't have NLTK installed here to try it, but I think you can just do:

   >>> sent3[2].node
   'NE'

sent3[2][0] returns the first child of the tree, not the node itself. That subtree has only one child, which is why sent3[2][1] raises an IndexError.

Edit: I tried this when I got home, and it does indeed work.
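
To illustrate the difference between a node's label and its children, here is a minimal sketch. It assumes NLTK 3 or later, where `label()` is used in place of the `node` attribute, and it builds the tree from the question by hand instead of calling `ne_chunk`, so no model downloads are needed.

```python
from nltk.tree import Tree

# Hand-built copy of the tree printed in the question (an assumption made so
# the example runs without downloading any NLTK models).
sent3 = Tree('S', [
    ('I', 'PRP'),
    ('am', 'VBP'),
    Tree('NE', [('Jhon', 'NNP')]),
    ('from', 'IN'),
    Tree('NE', [('America', 'NNP')]),
])

subtree = sent3[2]
print(subtree.label())  # label of the node itself (.node in NLTK 2.x)
print(subtree[0])       # first child: a (word, tag) tuple
print(len(subtree))     # number of children; only index 0 is valid here
```

Indexing a tree always returns its children, while the label lives on the node object itself.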

edited Apr 19 '11 at 13:05

answered Apr 18 '11 at 20:58

Before looking at the node attribute, you'll want to check if isinstance(sent3[2], Tree) (after doing from nltk.tree import Tree). – Jacob Apr 19 '11 at 16:00
@Jacob Thanks mate, really helpful. The next problem I faced was how to tell whether an element is a tree or not, as I needed to iterate through the elements using a for loop. The **if isinstance(sent3[2], Tree)** check is what I had been looking for all this while. Thanks again. – Asl506 Apr 20 '11 at 15:22
In the current version (3.1), `node` is replaced by `label()`. – Vladimir Jan 21 '16 at 14:36
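
Putting the comments above together, one possible way to get an entity label next to every word (similar to the 'PRP' tag in the question) is sketched below. The helper names `node_label` and `tag_entities` and the 'O' placeholder for non-entity words are my own assumptions, not part of NLTK, and the tree is built by hand so the example runs without any model downloads.

```python
from nltk.tree import Tree

def node_label(tree):
    """Return the node label via label() (NLTK 3+) or .node (NLTK 2.x)."""
    if hasattr(tree, 'label'):
        return tree.label()
    return tree.node

def tag_entities(sentence_tree, outside='O'):
    """Flatten a chunked sentence into (word, pos_tag, entity_label) triples."""
    triples = []
    for element in sentence_tree:
        if isinstance(element, Tree):
            # Named-entity subtree: every leaf inherits the subtree's label.
            for word, tag in element.leaves():
                triples.append((word, tag, node_label(element)))
        else:
            # Plain (word, tag) tuple: not part of any entity.
            word, tag = element
            triples.append((word, tag, outside))
    return triples

# Hand-built copy of the tree from the question.
sent3 = Tree('S', [
    ('I', 'PRP'),
    ('am', 'VBP'),
    Tree('NE', [('Jhon', 'NNP')]),
    ('from', 'IN'),
    Tree('NE', [('America', 'NNP')]),
])

for triple in tag_entities(sent3):
    print(triple)
```

The `isinstance` check is what separates entity subtrees from ordinary tagged tokens, so the loop never calls `label()` on a tuple.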

score 4 · Answer 2 · answered Feb 15 '13 at 05:11

Below is my code:

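from nltk import ne_chunk  # postags is the output of nltk.pos_tag()
myNE = []
chunks = ne_chunk(postags, binary=True)
for c in chunks:
  # Only named-entity subtrees have a label (test for 'node' on NLTK 2.x)
  if hasattr(c, 'label'):
    myNE.append(' '.join(i[0] for i in c.leaves()))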
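
As a rough illustration of why the leaves are joined, here is a small sketch with a two-word entity. The sentence and the function name `entity_strings` are hypothetical, and the tree is written out by hand rather than produced by `ne_chunk`, so the real chunker's output for this sentence may differ.

```python
from nltk.tree import Tree

def entity_strings(sentence_tree):
    """Return one space-joined string per named-entity subtree."""
    names = []
    for element in sentence_tree:
        if isinstance(element, Tree):
            names.append(' '.join(word for word, tag in element.leaves()))
    return names

# Hand-built example (hypothetical sentence) containing a two-word entity.
chunks = Tree('S', [
    ('She', 'PRP'),
    ('moved', 'VBD'),
    ('to', 'TO'),
    Tree('NE', [('New', 'NNP'), ('York', 'NNP')]),
])

print(entity_strings(chunks))
```

Each subtree contributes a single string, so an entity spanning several tokens stays together.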

Raullen Chai

score 2 · Answer 3 · answered Aug 28 '17 at 19:11

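This will work, assuming chunked_sentences is an iterable of trees returned by ne_chunk:

for sent in chunked_sentences:
    for chunk in sent:
        # Only named-entity subtrees have a label() method
        if hasattr(chunk, "label"):
            print(chunk.label())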

sai harish

Pritpal Singh · Answer 4 · 2013-10-11T09:31:12.280

1

I agree with bdk

sent3[2].node

Output: 'NE'

I don't think there is a dedicated function in NLTK to do it. The above solution will work, but for reference you can check here.

For the looping problem, you can do:

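 from nltk.tree import Tree
 for i in range(len(sent3)):
     # isinstance avoids false matches on tokens whose text contains "NE"
     if isinstance(sent3[i], Tree):
          print(sent3[i].label())  # use .node on NLTK 2.x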

I have executed this in NLTK and it works fine.

edited Oct 11 '13 at 09:31

answered Oct 09 '13 at 11:18

sanju · Answer 5 · 2017-04-12T10:46:34.190

1

sent3[2].node is now outdated.

Use sent3[2].label() instead.

edited Apr 12 '17 at 10:46

answered Apr 11 '17 at 17:34

score 0 · Answer 6 · answered Dec 03 '21 at 04:31

You can treat the sentence as a tree and loop through it.

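import nltk

# text must be a list of POS-tagged tokens, e.g. the output of nltk.pos_tag()
entities = nltk.ne_chunk(text)
for elem in entities:
    if isinstance(elem, nltk.Tree):
        # Is an entity
        print('elem: ', elem.leaves(), elem.label())
    else:
        # Not an entity: elem is a plain (word, tag) tuple
        pass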
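
As an alternative to checking each element by hand, `Tree.subtrees()` accepts a filter function. The sketch below is one way to use it, assuming NLTK 3 or later and a hand-built copy of the tree from the question (so no chunker models are needed).

```python
from nltk.tree import Tree

# Hand-built copy of the tree from the question.
sent3 = Tree('S', [
    ('I', 'PRP'),
    ('am', 'VBP'),
    Tree('NE', [('Jhon', 'NNP')]),
    ('from', 'IN'),
    Tree('NE', [('America', 'NNP')]),
])

# subtrees() walks every node of the tree, including nested levels; the
# filter keeps only nodes labelled 'NE', so the root 'S' is skipped.
found = [
    (subtree.label(), subtree.leaves())
    for subtree in sent3.subtrees(filter=lambda t: t.label() == 'NE')
]

for label, leaves in found:
    print(label, leaves)
```

Because `subtrees()` only yields `Tree` objects, no `isinstance` check is needed in this variant.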
