2

I have a data frame that has as index a timestamp and a column that has list of dictionaries:

    index                   var_A

    2019-08-21 09:05:49    [{"Date1": "Aug 21, 2017 9:09:51 AM","Date2": "Aug 21, 2017 9:09:54 AM","Id": "d5e665e5","num_ins": 108,"num_del": 0, "time": 356} , {"Date1": "Aug 21, 2017 9:09:57 AM","Date2": "Aug 21, 2017 9:09:59 AM","Id": "d5e665e5","num_ins": 218,"num_del": 5, "time": 166}]
    2019-08-21 09:05:59    [{"Date1": "Aug 21, 2017 9:10:01 AM","Date2": "Aug 21, 2017 9:11:54 AM","Id": "d5e665e5","num_ins": 348,"num_del": 72, "time": 3356} , {"Date1": "Aug 21, 2017 9:19:57 AM","Date2": "Aug 21, 2017 9:19:59 AM","Id": "d5e665e5","num_ins": 69,"num_del": 5, "time": 125}, {"Date1": "Aug 21, 2017 9:20:01 AM","Date2": "Aug 21, 2017 9:21:54 AM","Id": "f9e775f9","num_ins": 470,"num_del": 0, "time": 290} ]
    2019-08-21 09:06:04    []

What I wish to achieve is a dataframe like:

    index              Date1                      Date2                    Id      num_ins       num_del    time
2019-08-21 09:05:49   Aug 21, 2017 9:09:51AM   Aug 21, 2017 9:09:54AM   d5e665e5      0           108        356
2019-08-21 09:05:49   Aug 21, 2017 9:09:57AM   Aug 21, 2017 9:09:59AM   d5e665e5      218           5        166
2019-08-21 09:05:59   Aug 21, 2017 9:10:01AM   Aug 21, 2017 9:11:54AM   d5e665e5      348          72       3356
2019-08-21 09:05:59   Aug 21, 2017 9:19:57AM   Aug 21, 2017 9:19:59AM   d5e665e5      69            5        125
2019-08-21 09:05:59   Aug 21, 2017 9:20:01AM   Aug 21, 2017 9:21:54AM   f9e775f9      470           0        290
2019-08-21 09:06:04     NAN                         NAN                    NAN        NAN         NAN        NAN
jessirocha
  • 457
  • 4
  • 15

2 Answers2

1

Loop by each value with enumerate, because duplicated inex values and create DataFrames, then create DataFrame for empty lists and last concat together:

import ast

out = {}
for i, (k, v) in enumerate(df['var_A'].items()):
    df = pd.DataFrame(v)
    if df.empty:
        out[(i, k)] = pd.DataFrame(index=[0], columns=['Id'])
    else:
        out[(i, k)] = df

df = pd.concat(out, sort=True).reset_index(level=[0,2], drop=True)
print (df)
                                       Date1                    Date2  \
2019-08-21 09:05:49  Aug 21, 2017 9:09:51 AM  Aug 21, 2017 9:09:54 AM   
2019-08-21 09:05:49  Aug 21, 2017 9:09:57 AM  Aug 21, 2017 9:09:59 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:10:01 AM  Aug 21, 2017 9:11:54 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:19:57 AM  Aug 21, 2017 9:19:59 AM   
2019-08-21 09:05:59  Aug 21, 2017 9:20:01 AM  Aug 21, 2017 9:21:54 AM   
2019-08-21 09:05:59                      NaN                      NaN   

                           Id  num_del  num_ins    time  
2019-08-21 09:05:49  d5e665e5      0.0    108.0   356.0  
2019-08-21 09:05:49  d5e665e5      5.0    218.0   166.0  
2019-08-21 09:05:59  d5e665e5     72.0    348.0  3356.0  
2019-08-21 09:05:59  d5e665e5      5.0     69.0   125.0  
2019-08-21 09:05:59  f9e775f9      0.0    470.0   290.0  
2019-08-21 09:05:59       NaN      NaN      NaN     NaN  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for your reply! I am still getting an error. I actually had a typo in my post. My original indexes are unique. I don't know if this could be affecting. But the error says: malformed node or string: – jessirocha Aug 22 '19 at 12:46
  • @jessirocha - What is `print (type(df.loc[0, 'var_A']))` ? – jezrael Aug 22 '19 at 12:47
  • 1
    @jessirocha - if it return list, then change `df = pd.DataFrame(ast.literal_eval(v))` to `df = pd.DataFrame(v)` – jezrael Aug 22 '19 at 12:50
  • I can't print it: cannot do index indexing on with these indexers [0] of – jessirocha Aug 22 '19 at 13:51
  • @jessirocha - And `print (type(df.loc[df.index[0], 'var_A']))` ? – jezrael Aug 22 '19 at 13:52
  • 1
    It works! Before it must have been a bug. It returned a list and I replaced by df = pd.DataFrame(v) as suggested. Thank you so much for your help! – jessirocha Aug 22 '19 at 14:02
1

You can use pandas functions stack and concat to do this.

  1. First use stack to unlist the list the column var_A
  2. Then use concat to unnest the dictionary and put it into seperate columns

You can use the following code to do the same. Assuming your dictionary to be df.

Unlist:

df = df.apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

enter image description here

Unnest:

df = pd.concat([df.drop('var_A', axis=1), df['var_A'].apply(pd.Series)], axis=1).drop(0,1)

enter image description here