I hope everyone is doing well.
I have been following SentDex's YouTube tutorial on NLTK, with the aim of writing a name recognition program. As the code below shows, I have managed to 'chunk' names. What I would like to do now is put all of the chunked names into a list so that I can easily select them. Is this possible? If not, is there another way of doing it?

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)
namedEnt=""
def process_content():
    try:
        for i in tokenized[5:]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            namedEnt = nltk.ne_chunk(tagged,binary=True)
            namedEnt.draw()

    except Exception as e:
        print(str(e))




process_content()
R watt
  • Is it the same question as https://stackoverflow.com/questions/31836058/nltk-named-entity-recognition-to-a-python-list ? – alvas Nov 13 '17 at 04:21

1 Answer

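To get the tags for each sentence, use Tree.pos() and filter the resulting list by the second element of each item; the label 'NE' marks a named entity. Each item has the form ((word, pos_tag), label), so x[0][0] is the word itself.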

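def process_content():
    # Collects one list of named-entity tokens per sentence
    names = []
    try:
        for i in tokenized[5:]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            namedEnt = nltk.ne_chunk(tagged,binary=True)
            # pos() returns ((word, pos_tag), label) pairs for every leaf
            tags = namedEnt.pos()
            # keep only the words whose enclosing chunk is labelled 'NE'
            names.append([x[0][0] for x in tags if x[1] == 'NE'])

    except Exception as e:
        print(str(e))
    return names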
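
Filtering Tree.pos() yields individual tokens, so a two-word name such as "George Bush" comes back as two separate entries. If whole names are wanted instead, one possible variation is to walk the 'NE' subtrees and join the words of each chunk. The sketch below is only an illustration: extract_names is a hypothetical helper name, and the example tree is built by hand (so it runs without downloading any NLTK models) on the assumption that it mirrors the shape nltk.ne_chunk(tagged, binary=True) returns.

```python
from nltk import Tree


def extract_names(chunked):
    """Return each 'NE' chunk of a chunked sentence as one joined string."""
    names = []
    # subtrees() walks every subtree; the filter keeps those labelled 'NE'
    for subtree in chunked.subtrees(filter=lambda t: t.label() == "NE"):
        # leaves() gives the (word, pos_tag) pairs inside the chunk
        names.append(" ".join(word for word, _tag in subtree.leaves()))
    return names


# Hand-built stand-in (an assumption) for the output of
# nltk.ne_chunk(tagged, binary=True) on one sentence
example = Tree("S", [
    Tree("NE", [("George", "NNP"), ("Bush", "NNP")]),
    ("visited", "VBD"),
    Tree("NE", [("Iraq", "NNP")]),
    (".", "."),
])

print(extract_names(example))
```

Inside process_content, calling names.extend(extract_names(namedEnt)) in place of the append line should give a single flat list of names rather than one list per sentence.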
Darkoob12