Pandas Dataframe group by, column with a list

Question

I'm using Jupyter notebooks, and my current DataFrame looks like the following:

players_mentioned  |  tweet_text    |  polarity
______________________________________________
[Mane, Salah]      |  xyz           |    0.12
[Salah]            |  asd           |    0.06

How can I group all players individually and average their polarity?

Currently I have tried to use:

df.groupby(df['players_mentioned'].map(tuple))['polarity'].mean()

But this treats each combination of mentions as its own group, so players mentioned together are grouped separately from players mentioned alone. How best can I split the players up and then group them back together?

An expected output would contain:

 player  | polarity_average
____________________________
  Mane   |   0.12
  Salah  |   0.09

In other words, how do I group by each item in the lists in every row?

Your attempted code is not even a close approximation of what you're trying to do. Can you explain what you mean by "splitting them up"? It would be helpful to see your expected output. — cs95, Apr 01 '19 at 20:02
Added an expected output, @coldspeed. I understand my attempt is wrong; that is why I need some guidance. — , Apr 01 '19 at 20:07
Can you please run `result.loc[result['players_mentioned'].str.contains('Alderweireld'), 'players_mentioned'].tolist()` on the resultant df from my code, and tell me what the output is? — cs95, Apr 01 '19 at 22:18
I see the issue, there is a space at the start of the name, thanks — , Apr 02 '19 at 00:04

score 0 · Answer 1 · answered Apr 01 '19 at 20:13

If you are just looking to group by players_mentioned and get the average polarity score for each player, this should do it.

df.groupby('players_mentioned').polarity.agg('mean')

Michael

This does not produce OP's expected output. – cs95 Apr 01 '19 at 20:16

score 0 · Accepted Answer · answered Apr 01 '19 at 20:14

You can use the unnesting idiom from this answer.

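import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each row's index label once per element in its list.
    idx = df.index.repeat(df[explode[0]].str.len())
    # Flatten each list column into one long column.
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx

    # Join the remaining columns back on via the repeated index.
    return df1.join(df.drop(columns=explode), how='left')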

You can now call groupby on the unnested "players_mentioned" column.

(unnesting(df, ['players_mentioned'])
    .groupby('players_mentioned', as_index=False)['polarity'].mean())

  players_mentioned  polarity
0              Mane      0.12
1             Salah      0.09
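As an alternative to the helper above, here is a minimal sketch using the built-in `DataFrame.explode`, assuming pandas 0.25 or later. The sample frame is rebuilt from the question's table, and the `str.strip()` step is an added assumption to guard against names with stray leading or trailing spaces (the issue raised in the comments on the question).

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "players_mentioned": [["Mane", "Salah"], ["Salah"]],
    "tweet_text": ["xyz", "asd"],
    "polarity": [0.12, 0.06],
})

result = (
    df.explode("players_mentioned")  # one row per (tweet, player) pair
      # Assumption: normalise whitespace so " Salah" and "Salah" group together.
      .assign(players_mentioned=lambda d: d["players_mentioned"].str.strip())
      .groupby("players_mentioned", as_index=False)["polarity"]
      .mean()
)
print(result)
```

Selecting `polarity` before calling `mean()` keeps the text column out of the aggregation.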

Thank you so much! Worked perfectly :) — , Apr 01 '19 at 20:23
