5

I try to create a dataframe from nested dictionary in my pandas dataframe, but i can't make it work...

my dataframe :

    created_at                  selected
    2019-08-13T12:24:53+00:00   {"982813":false,"1786112":true,"3002218":false}
    2019-08-31T13:47:51+00:00   {"309279":true,"1903384":false}
        ...

And i would like to create a new df with the selected column data formatted as follows :

        created_at                  ID            Value
        2019-08-13T12:24:53+00:00   982813        false    
        2019-08-13T12:24:53+00:00   1786112       true
        2019-08-13T12:24:53+00:00   3002218       false
        2019-08-31T13:47:51+00:00   309279        true
        2019-08-31T13:47:51+00:00   1903384       false
        ...

I've been trying to use explode() and json_normalize() without success so i decided to go for pd.DataFrame.from_dict() and a for loop like as follow but i'm getting an error.

x = {}
for row in df.selected:
    pd.DataFrame.from_dict(row, orient='index')

But i'm getting the following error :

AttributeError: 'str' object has no attribute 'values'

i'm still a beginner in python so if someone as a idea/explanation i'm all ears.

nassim
  • 1,547
  • 1
  • 14
  • 26
MisterOo
  • 53
  • 1
  • 6
  • that's because each `row` is a string, not a `dict`. What's your expected output? – Quang Hoang Sep 20 '19 at 18:37
  • My goal is to get the dictionary out and create a new df. With the following code I just want to output the data to use the id as index and the True/False in column. like this : ```python for row in df.selected: x[row] = pd.DataFrame.from_dict(row, orient='index') final = pd.concat(x.values()) print(final) Index Value 982813 false 1786112 true 3002218 false 309279 true 1903384 false ... ``` – MisterOo Sep 20 '19 at 19:01

3 Answers3

0

Here's a micro example to show you the idea. Not advised though if your volume is large:

import pandas as pd

df = pd.DataFrame([[1, {'abc':11}], [2, {'def':22, 'ghi':33}]], columns=['id', 'dct'])

lst = []

for index, row in df.iterrows():
    for key, value in row['dct'].items():
        lst.append([row['id'], key, value])


new = pd.DataFrame(lst, columns=['id', 'string', 'value'])

print(new)
kjmerf
  • 4,275
  • 3
  • 21
  • 29
0

In your case you can using explode in pandas after 0.25.0

df.BB=df.BB.map(lambda x : list(x.items()))
s=df.explode('BB')
pd.concat([s,pd.DataFrame(s.BB.tolist(),index=s.index)],axis=1)
Out[93]: 
   CC      BB  0  1
0   1  (1, 2)  1  2
0   1  (2, 1)  2  1
1   2  (2, 2)  2  2
1   2  (8, 3)  8  3
1   2  (4, 5)  4  5

Data

df= pd.DataFrame({'CC':[1,2],'BB':[{1:2,2:1},{2:2,8:3,4:5}]})
BENY
  • 317,841
  • 20
  • 164
  • 234
0

You want to use .apply(pd.Series), stack() and then rename your columns:

df.set_index('created_at')['selected'].apply(pd.Series).stack().reset_index().rename(columns={'level_1':'ID',0:'Value'})

                  created_at       ID  Value
0  2019-08-13T12:24:53+00:00   982813  False
1  2019-08-13T12:24:53+00:00  1786112   True
2  2019-08-13T12:24:53+00:00  3002218  False
3  2019-08-31T13:47:51+00:00   309279   True
4  2019-08-31T13:47:51+00:00  1903384  False

By the way, for future reference, you can get answers much faster by providing code to replicate your starting point. Most of the time to do this I spent coming up with this:

df = pd.DataFrame({"created_at": ['2019-08-13T12:24:53+00:00', '2019-08-31T13:47:51+00:00'], "selected": [{"982813":False,"1786112":True,"3002218":False}, {"309279":True,"1903384":False}]})
user1717828
  • 7,122
  • 8
  • 34
  • 59