Converting array of arrays into flattened dataframe

Question

Got a pandas dataframe with the below structure

0    [{'review_id': 4873356, 'rating': '5.0'}, {'review_id': 4973356, 'rating': '4.0'}]
1    [{'review_id': 4635892, 'rating': '5.0'}, {'review_id': 4645839, 'rating': '3.0'}] 
....
....

I would like to flatten into a dataframe with the following columns review_id and rating

I was trying out pd.DataFrame(df1.values.flatten()) but looks like I'm getting something basic not right, need help!!!

This is the original json { "status": true, "total_reviews": 323, "data": [ { "review_id": 4873356, "rating": "5.0" }, { "review_id": 4854192, "rating": "5.0" }, { "review_id": 4848567, "rating": "5.0" } ] } — Somasundaram Sekar, Feb 11 '19 at 18:13

ALollz · Accepted Answer · 2019-02-11T18:16:00.077

3

You wind up getting an array of lists of dicts so need:

import pandas as pd
pd.DataFrame([x for y in df1.values for x in y])

  rating  review_id
0    5.0    4873356
1    4.0    4973356
2    5.0    4635892
3    3.0    4645839

Or if willing to use itertools:

from itertools import chain

pd.DataFrame(chain.from_iterable(df1.values.ravel()))

edited Feb 11 '19 at 18:16

answered Feb 11 '19 at 18:09

ALollz

57,915
7
66
89

score 2 · Answer 2 · answered Feb 11 '19 at 18:13

1st unnesting , then re build your dataframe (assuming you have columns name 0)

pd.DataFrame(unnesting(df,[0])[0].values.tolist())
Out[61]: 
  rating  review_id
0    5.0    4873356
1    4.0    4973356
2    5.0    4635892
3    3.0    4645839

def unnesting(df, explode):
    idx=df.index.repeat(df[explode[0]].str.len())
    df1=pd.concat([pd.DataFrame({x:np.concatenate(df[x].values)} )for x in explode],axis=1)
    df1.index=idx
    return df1.join(df.drop(explode,1),how='left')

Converting array of arrays into flattened dataframe

2 Answers2

Linked