
I am trying to perform entity analysis on text and put the results in a pandas DataFrame. Currently the results are not stored in a dictionary, nor in a DataFrame. The results are extracted with two functions.

df:

ID    title    cur_working    pos_arg            neg_arg                              date
132   leave    yes            good coffee        management, leadership and salary    13-04-2018
145   love it  yes            nice colleagues    long days                            14-04-2018
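To experiment without the original data, the sample frame can be rebuilt in a few lines. This is a sketch: the IDs are assumed to be strings because the printed output below shows keys like '132', and the dates are kept as plain strings.

```python
import pandas as pd

# Sample frame from the question; ID assumed to be str (output shows '132' keys)
df = pd.DataFrame({
    'ID': ['132', '145'],
    'title': ['leave', 'love it'],
    'cur_working': ['yes', 'yes'],
    'pos_arg': ['good coffee', 'nice colleagues'],
    'neg_arg': ['management, leadership and salary', 'long days'],
    'date': ['13-04-2018', '14-04-2018'],  # DD-MM-YYYY, not parsed
})
print(df)
```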

I have the following code:

import pandas as pd
import six
# google-cloud-language 1.x client API (enums/types modules)
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

#This code loops through the rows and calls the function entities_text()
def entity_analysis(df, col, idcol):
    temp_dict = {}
    for index, row in df.iterrows():
        id = (row[idcol])
        x = (row[col])
        entities = entities_text(x, id)
        #temp_dict.append(entities)
    #final = pd.DataFrame(columns = ['id', 'name', 'type', 'salience'])
    return print(entities)

def entities_text(text, id):
    """Detects entities in the text."""
    client = language.LanguageServiceClient()
    ent_df = {}
    if isinstance(text, six.binary_type):
        text = text.decode('utf-8')

    # Instantiates a plain text document.
    document = types.Document(
        content=text,
        type=enums.Document.Type.PLAIN_TEXT)

    # Detects entities in the document.
    entities = client.analyze_entities(document).entities

    # entity types from enums.Entity.Type
    entity_type = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
                   'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

    for entity in entities:
        ent_df[id] = ({
            'name': [entity.name],
            'type': [entity_type[entity.type]],
            'salience': [entity.salience]
        })
    # note: print() returns None, so the caller receives None here
    return print(ent_df)

# called after both functions are defined
result = entity_analysis(df, 'neg_arg', 'ID')

This code gives the following outcome:

{'132': {'name': ['management'], 'type': ['OTHER'], 'salience': [0.16079013049602509]}}
{'132': {'name': ['leadership'], 'type': ['OTHER'], 'salience': [0.05074194446206093]}}
{'132': {'name': ['salary'], 'type': ['OTHER'], 'salience': [0.27505040168762207]}}
{'145': {'name': ['days'], 'type': ['OTHER'], 'salience': [0.004272154998034239]}}

I have started to create temp_dict and a final DataFrame in entity_analysis() (both commented out above). This thread explained that appending to a DataFrame in a loop is not efficient, and I don't know how to populate the DataFrame efficiently. These threads are related to my question, but they explain how to populate a DataFrame from existing data. When I try to use temp_dict.update(entities) and return temp_dict, I get an error:

in entity_analysis
    temp_dict.update(entities)
TypeError: 'NoneType' object is not iterable
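The error is consistent with entities_text() ending in `return print(ent_df)`: print() writes to stdout and always returns None, so `entities` is None, and dict.update(None) raises this TypeError. A minimal reproduction that needs no Google client (the helper `returns_print` is hypothetical, written only to mirror that pattern):

```python
def returns_print(value):
    # Mirrors `return print(...)`: prints the value, then returns None
    return print(value)

entities = returns_print({'132': {'name': ['salary']}})
print(entities is None)  # True

temp_dict = {}
try:
    # dict.update(None): None is neither a mapping nor an iterable of pairs
    temp_dict.update(entities)
except TypeError as exc:
    message = str(exc)
    print(message)
```

Returning `ent_df` itself instead of `print(ent_df)` removes this particular error, although the loop still overwrites `ent_df[id]` once per entity.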

I want the output to be like this:

ID          name                  type                salience
132         management            OTHER               0.16079013049602509 
132         leadership            OTHER               0.05074194446206093 
132         salary                OTHER               0.27505040168762207 
145         days                  OTHER               0.004272154998034239 
    Have you just changed your question from "Currently the results **are not** stored in a dictionary" to "Currently the results **are** stored in a dictionary" ? That's a pretty big change to your original question, I think it's fair it should be rolled back since there's already an answer. – jpp Jun 26 '18 at 10:21

1 Answer


One solution is to build a list of lists from your entities iterable, then feed that list of lists into pd.DataFrame:

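import pandas as pd

# Inside entities_text(): collect one [ID, name, type, salience] row per entity
LoL = []

for entity in entities:
    LoL.append([id, entity.name, entity_type[entity.type], entity.salience])

# Build the DataFrame once, after the loop, instead of appending row by row
df = pd.DataFrame(LoL, columns=['ID', 'name', 'type', 'salience'])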
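To get a single frame covering every row of the input, the same list-of-lists idea can be lifted into entity_analysis(): each call returns its rows, one master list is extended, and pd.DataFrame is called once at the end. This is a sketch under assumptions: `fake_analyze`, `Entity` and `entities_rows` are hypothetical stand-ins so it runs without credentials, and the stub's naive splitting will not match the API's real entities or salience scores.

```python
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for the API's entity objects (name, type index, salience)
Entity = namedtuple('Entity', ['name', 'type', 'salience'])

ENTITY_TYPE = ('UNKNOWN', 'PERSON', 'LOCATION', 'ORGANIZATION',
               'EVENT', 'WORK_OF_ART', 'CONSUMER_GOOD', 'OTHER')

def fake_analyze(text):
    # Hypothetical stub for client.analyze_entities(document).entities:
    # splits on commas and ' and ', tags every piece as OTHER with dummy salience
    pieces = text.replace(' and ', ', ').split(', ')
    return [Entity(p.strip(), 7, 0.1) for p in pieces if p.strip()]

def entities_rows(text, id_):
    # Return one [ID, name, type, salience] list per entity instead of printing
    return [[id_, e.name, ENTITY_TYPE[e.type], e.salience]
            for e in fake_analyze(text)]

def entity_analysis(df, col, idcol):
    rows = []
    # zip over the two needed columns avoids building a Series per row (iterrows)
    for id_, text in zip(df[idcol], df[col]):
        rows.extend(entities_rows(text, id_))
    # Build the DataFrame once, after all rows are collected
    return pd.DataFrame(rows, columns=['ID', 'name', 'type', 'salience'])

df = pd.DataFrame({'ID': ['132', '145'],
                   'neg_arg': ['management, leadership and salary', 'long days']})
result = entity_analysis(df, 'neg_arg', 'ID')
print(result)
```

With the real client, `entities_rows` would call `client.analyze_entities(document).entities` in place of the stub; creating the client once outside the loop also avoids constructing it per row.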

If you also need the dictionary in the format you currently produce, then you can add your current logic to your for loop. However, first check whether you need to use two structures to store identical data.
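If a per-ID dictionary is still needed, one option is to derive it from the DataFrame on demand rather than maintaining two structures in the loop. A sketch using the values from the question's output; the nesting (one dict of lists per ID) is similar to, not identical with, the original per-entity dicts:

```python
import pandas as pd

df = pd.DataFrame(
    [['132', 'management', 'OTHER', 0.16079013049602509],
     ['132', 'leadership', 'OTHER', 0.05074194446206093],
     ['132', 'salary', 'OTHER', 0.27505040168762207],
     ['145', 'days', 'OTHER', 0.004272154998034239]],
    columns=['ID', 'name', 'type', 'salience'])

# One {column: [values]} dict per ID, built from the frame instead of stored twice
by_id = {
    id_: group.drop(columns='ID').to_dict('list')
    for id_, group in df.groupby('ID')
}
print(by_id['132']['name'])  # ['management', 'leadership', 'salary']
```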

Thanks for answering, I have found a solution. I have created multiple dictionaries, because I had to iterate over the text and the words. Apologies for the confusion. – Dennis Loos Jun 27 '18 at 04:39