Trying to extract information using NLTK

Question

I want to extract information from a given news article. The articles are plain text like this:

O'Sullivan could run in Worlds

Sonia O'Sullivan has indicated that she would like to participate in next month's World Cross Country Championships in St Etienne.

Athletics Ireland have hinted that the 35-year-old Cobh runner may be included in the official line-up for the event in France on 19-20 March. Provincial teams were selected after last Saturday's Nationals in Santry and will be officially announced this week. O'Sullivan is at present preparing for the London marathon on 17 April. The participation of O'Sullivan, currently training at her base in Australia, would boost the Ireland team who won the bronze three years ago. The first three at Santry last Saturday, Jolene Byrne, Maria McCambridge and Fionnualla Britton, are automatic selections and will most likely form part of the long-course team. O'Sullivan will also take part in the Bupa Great Ireland Run on 9 April in Dublin.

I tried this code, which extracts relations from the ieer document 'NYT_19980315':

import re
import nltk

IN = re.compile(r'.*\bin\b(?!\b.+ing)')
for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
    for rel in nltk.sem.relextract.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.relextract.rtuple(rel))

With this code, the output is:

[ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']
[ORG: 'McGlashan &AMP; Sarrail'] 'firm in' [LOC: 'San Mateo']
[ORG: 'Freedom Forum'] 'in' [LOC: 'Arlington']
[ORG: 'Brookings Institution'] ', the research group in' [LOC: 'Washington']
[ORG: 'Idealab'] ', a self-described business incubator based in' [LOC: 'Los Angeles']
[ORG: 'Open Text'] ', based in' [LOC: 'Waterloo']
[ORG: 'WGBH'] 'in' [LOC: 'Boston']
[ORG: 'Bastille Opera'] 'in' [LOC: 'Paris']
[ORG: 'Omnicom'] 'in' [LOC: 'New York']
[ORG: 'DDB Needham'] 'in' [LOC: 'New York']
[ORG: 'Kaplan Thaler Group'] 'in' [LOC: 'New York']
[ORG: 'BBDO South'] 'in' [LOC: 'Atlanta']
[ORG: 'Georgia-Pacific'] 'in' [LOC: 'Atlanta']
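
For reference, the IN pattern is matched against the text between the two entities (the quoted middle part of each output line). The following is a minimal standard-library sketch of that filtering step only. The helper name `keeps` and the last filler string are hypothetical, added for illustration; the other fillers are copied from the output above.

```python
import re

# Same pattern as in the question: require the word "in", but reject it
# when an "-ing" form follows later in the filler text.
IN = re.compile(r'.*\bin\b(?!\b.+ing)')


def keeps(filler):
    """Return True if the filler text between two entities passes the pattern."""
    return IN.match(filler) is not None


fillers = [
    'in',
    'firm in',
    ', the research group in',
    ', based in',
    'success in supervising the transition of',  # hypothetical example
]

for filler in fillers:
    print(repr(filler), keeps(filler))
```

The first four fillers pass. The hypothetical last one is rejected because "supervising" follows the word "in", which is what the negative lookahead is there to exclude.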

The problem is that I want to pass my own text instead (the news article that the user enters, in this case 002.txt). I am not able to pass it as the doc argument of nltk.sem.extract_rels(); it fails with AttributeError: 'Tree' object has no attribute 'text'. Does anyone know how to do this? Is there a way to convert the .txt file into an ieer document?

Now, I am working with this code:

import re
import nltk
import os

with open('002.txt', 'r') as f:
    sample = f.read()


sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)

def extract_entity_names(t):
    entity_names = []

    if hasattr(t, 'label') and t.label():
        if t.label() == 'NE':
            entity_names.append(' '.join([child[0] for child in t]))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))

    return entity_names


entity_names = []
for tree in chunked_sentences:    
    entity_names.extend(extract_entity_names(tree))

# Print unique entity names
print(set(entity_names))

The output is:

{'London', 'Australia', 'Worlds Sonia', 'Santry', 'Bupa Great Ireland Run', 'Maria', 'France', 'Fionnualla Britton', 'Dublin', 'Jolene Byrne', 'Ireland'}

I then tried adding these lines, which call nltk.sem.extract_rels():

for rel in nltk.sem.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern = IN):
    print(nltk.sem.rtuple(rel))
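
One possible reading of the error: with corpus='ieer', extract_rels() appears to read doc.text and doc.headline, which the parsed IEER documents provide but a plain chunk Tree does not. The following is a minimal sketch, not a confirmed fix. It assumes that corpus='ace' accepts a chunk tree directly and that the entity labels must be full class names such as ORGANIZATION and LOCATION, so the 'NE' labels produced with binary=True would not match. The tree below is hand-built, hypothetical data standing in for one sentence of ne_chunk output, so no model downloads are needed.

```python
import re

from nltk.sem import relextract
from nltk.tree import Tree

IN = re.compile(r'.*\bin\b(?!\b.+ing)')

# Hand-built stand-in for one chunked sentence (hypothetical data).
# Leaves are (word, tag) tuples, as in the output of nltk.ne_chunk.
sentence = Tree('S', [
    Tree('ORGANIZATION', [('WHYY', 'NNP')]),
    ('in', 'IN'),
    Tree('LOCATION', [('Philadelphia', 'NNP')]),
    ('hired', 'VBD'),
    # A third entity is included on the assumption that extract_rels
    # also wants some right-hand context after the object entity.
    Tree('PERSON', [('Jane', 'NNP'), ('Doe', 'NNP')]),
    ('.', '.'),
])

# Assumption: corpus='ace' treats `sentence` as a chunk tree, so no
# .text or .headline attribute is required.
rels = relextract.extract_rels('ORG', 'LOC', sentence, corpus='ace', pattern=IN)
for rel in rels:
    print(relextract.rtuple(rel))
```

With real input, each tree from nltk.ne_chunk_sents(tagged_sentences) (without binary=True) could be passed in place of `sentence`. Note that ne_chunk tends to label places as GPE rather than LOCATION, so the object class may need to be 'GPE'.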

What do you mean send text? Where do you want to send the text? Do you want to transmit to another device, output to a file, or reformat text in some special way? It is very difficult to answer your question without seeing the data that produces your problem. Please read about how to ask a good question and try to post a [Minimal Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example "Minimal Reproducible Example") so we can better help you. — itprorh66, May 22 '21 at 13:58
@itprorh66 Hi, thanks for the help. I changed the post to try to be more specific. I am now using the function nltk.sem.relextract.extract_rels(), which extracts information from the ieer document 'NYT_19980315' included in the library. I want to use the function with the news article that the user enters. I hope I explained myself. Thanks again for the help ^^. — EnriqueMM, May 22 '21 at 20:42
Please update your question and post the entire traceback error message as code in your question. — itprorh66, May 23 '21 at 15:33
Does this help? [How to flatten the parse tree and store in a string for further string operations python nltk](https://stackoverflow.com/questions/28704060/how-to-flatten-the-parse-tree-and-store-in-a-string-for-further-string-operation) — itprorh66, May 23 '21 at 15:38
