0

I have a dataframe in which one column I have a list of dictionaries, and in this dictionaries I has the name of the of the columns I want to create and its value.

    id  stats                                                            opta_id
0  1307  [{name: 'speed', value: 5},{name: 'strength', value: 10}....]   p176278
1  2410  [{name: 'vision', value: 5}, {name: 'strength', value: 10}....] p118335
2   200  [{name: 'speed', value: 5},{name: 'vision', value: 10}....]     p92187
3  3314  [{name: 'speed', value: 5},{name: 'strength', value: 10}....]   p154976
4  9223  [{name: 'speed', value: 5},{name: 'strength', value: 10}....]   p446990

the list can have up to 80 elements and the length of it is different on each row.

How could flat this column in order to get something similar to this?

    id  stats.speed   stats.strength   stats.vision     .....              opta_id
0  1307  5              10                nan           .....              p176278
1  2410  nan            5                 10            .....              p118335
.
.
.

thank you!

  • Does this answer your question? [Splitting dictionary/list inside a Pandas Column into Separate Columns](https://stackoverflow.com/questions/38231591/splitting-dictionary-list-inside-a-pandas-column-into-separate-columns) – luigigi Mar 11 '20 at 15:43
  • no because using an apply series will give a column of each element of the list, creating columns of dictionaries – Capi Hidalgo Mar 11 '20 at 15:47

2 Answers2

1

Here I would first build a temporary dataframe from a list of dict created from the stats column, and then concat it with the remaining columns:

tmp = pd.DataFrame([{d['name']: d['value'] for d in row}
                    for row in df['stats']]).rename(
                        columns=lambda x: 'stats.' + x)

df = pd.concat([df['id'], tmp, df['opta_id']], axis=1)

With the shown data, it gives:

     id  stats.speed  stats.strength  stats.vision  opta_id
0  1307          5.0            10.0           NaN  p176278
1  2410          NaN            10.0           5.0  p118335
2   200          5.0             NaN          10.0   p92187
3  3314          5.0            10.0           NaN  p154976
4  9223          5.0            10.0           NaN  p446990
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
0

At the end I have found a solution to my problem. First I created at temporal dataframe with every row of the column

tmp = pd.concat([pd.DataFrame(x) for x in df['stats']], keys=df.index).reset_index(level=1, drop=True)

afterwards, I do pivot_table with the 'name' column and using as value the stat

pivot = pd.pivot_table(tmp, columns='name', index=df_stats.index.values, values='stat')