1

I have a pandas dataframe with multiple headers. I am wondering how to convert this into a list of nested directories. Each row in the Pandas dataframe will be a nested dictionary in the list.

Here is an example

#Creaet an example multiheader dataframe

col =['id','x, single room','x, double room','y, single room','y, double room' ]
df = pd.DataFrame([[1,2,3,4,5], [3,4,7,5,3]], columns=col)
a = df.columns.str.split(', ', expand=True).values
#swap values in NaN and replace NAN to ''
df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a])
df

result

        x   y
id  single room double room single room double room
0   1   2   3   4   5
1   3   4   7   5   3

This is the dataframe I want to convert to a list of nested dictionaries. So this is the desired result

[{'id': 1,
  'x': {'double room': 3, 'single room': 2},
  'y': {'double room': 5, 'single room': 4}},
 {'id': 3,
  'x': {'double room': 7, 'single room': 4},
  'y': {'double room': 3, 'single room': 5}}]

In the code below, I directly create this list.

firstDict = { 'id':1, 'x':{'single room':2, 'double room':3}, 'y':{'single room':4, 'double room':5} }
secondDict = { 'id':3, 'x':{'single room':4, 'double room':7}, 'y':{'single room':5, 'double room':3} }
dictList = []
dictList.append( firstDict )
dictList.append( secondDict )
dictList

[{'id': 1,
  'x': {'double room': 3, 'single room': 2},
  'y': {'double room': 5, 'single room': 4}},
 {'id': 3,
  'x': {'double room': 7, 'single room': 4},
  'y': {'double room': 3, 'single room': 5}}]

So in summary, how to I convert the dataframe df to what dictList is.

Edit:

This is a minimal example, the solution I am looking for should generalize to longer numbers of headers.

SantoshGupta7
  • 5,607
  • 14
  • 58
  • 116

5 Answers5

6

I don't think there is a straight forward way of doing this, that being said, you could use stack + to_dict and some post-processing afterwards:

# prepare the DataFrame
df = df.set_index(('', 'id')).stack(level=0)
df.index.names = ['id', None]

# convert to a dicts of dicts
d = {}
for (idi, key), values in df.to_dict('index').items():
    d.setdefault(idi, {}).update({key: values})

# convert d to list of dicts
result = [{'id': k, **values} for k, values in d.items()]

Output

[{'id': 1,
  'x': {'double room': 3, 'single room': 2},
  'y': {'double room': 5, 'single room': 4}},
 {'id': 3,
  'x': {'double room': 7, 'single room': 4},
  'y': {'double room': 3, 'single room': 5}}]
Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
  • I am wondering how to apply `df = df.set_index(('', 'id')).stack(level=0) ` in cases where the column to set index also has multiple headers. In my case, the top level column name is 'ID', and 2nd level column name is 'New'. I tried `df2 = l[51].set_index(('ID', 'New')).stack(level=0)` but got `KeyError: ('ID', 'New')` – SantoshGupta7 Oct 19 '19 at 21:06
  • 1
    That must be because there is no column named liked that, try to take a look at your column name – Dani Mesejo Oct 19 '19 at 21:12
2

Unsure of how long your number of headers can be, currently it is in a state of easily being coded by hand, as given below -

dct = []

for x in df.values:
  nd = {
          "id": x[0],
          "x": {
                  "single room": x[1],
                  "double room": x[2]
               },
          "y": {
                  "single room": x[3],
                  "double room": x[4]
               }
       }
  dct.append(nd)

Do let me know if there are a high number of headers and the code needs to handle them without explicit typing in.

xprilion
  • 569
  • 1
  • 5
  • 14
1

Something like this?

import pandas as pd
col =['id','x, single room','x, double room','y, single room','y, double room' ]
df = pd.DataFrame([[1,2,3,4,5], [3,4,7,5,3]], columns=col)
a = df.columns.str.split(', ', expand=True).values
#swap values in NaN and replace NAN to ''
df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a])
print(df)

dict_list = []
for index, row in df.iterrows():
    d = {}
#    _dict [row["id"]]
    print(type(row), row)#, row.select(1, axis = 0) )
    d["id"] = row[0]
    d["x"] = {'single room':row[1], 'double room':row[1]}
    d["y"] = {'single room':row[3], 'double room':row[4]}
    dict_list.append(d)

print(dict_list)

OUTPUT:

[{'id': 1, 
  'x': {'single room': 2, 'double room': 2}, 
  'y': {'single room': 4, 'double room': 5}
  }, 
{'id': 3, 
  'x': {'single room': 4, 'double room': 4}, 
  'y': {'single room': 5, 'double room': 3}
}
]
RightmireM
  • 2,381
  • 2
  • 24
  • 42
1

you could use either

l = []
d = None
for i, row in df.iterrows():
    for (i1,i2),v in row.iteritems():
        if i2 == 'id':
            d = {i2:v}
            l.append(d)
            continue
        try:
            d[i1][i2]=v
        except KeyError:
            d[i1] = {i2:v}

or if you're ok with a slight modification of your expected result:

from collections import defaultdict
l =[]
for i, row in df.iterrows():
    d = defaultdict(dict)
    for (i1,i2),v in row.iteritems():
        if i2 == 'id':
            d[i2][v]=v
        else:
            d[i1][i2]=v
    l.append(dict(d))

which outputs:

[{'id': {1: 1},
  'x': {'single room': 2, 'double room': 3},
  'y': {'single room': 4, 'double room': 5}},
 {'id': {3: 3},
  'x': {'single room': 4, 'double room': 7},
  'y': {'single room': 5, 'double room': 3}}]
DrD
  • 551
  • 2
  • 8
1

I like accepted solution, but here my two alternatives wihout stacking.

This solution is straight-forward but with more columns a lot of repetition and error-prone:

lst = [{'id': d[('', 'id')], 
        'x': {'single room': d[('x', 'single room')], 'double room': d[('x', 'double room')]},
        'y': {'single room': d[('y', 'single room')], 'double room': d[('y', 'double room')]},}
        for d in df.to_dict('records')
]

Let's try make it more scalable, from Arbitrarily nested dictionary from tuples you can get nest function:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

But for ('', id) we need slighlty different behavior:

def nest_m(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        if key == ('', 'id'):
            result['id'] = value
        else:
            target = result
            for k in key[:-1]:  
                target = target.setdefault(k, {})
            target[key[-1]] = value    
    return result

Final line:

lst = [nest_m(d) for d in df.to_dict('records')]  

Output:

[{'id': 1,
  'x': {'single room': 2, 'double room': 3},
  'y': {'single room': 4, 'double room': 5}},
 {'id': 3,
  'x': {'single room': 4, 'double room': 7},
  'y': {'single room': 5, 'double room': 3}}]
Quant Christo
  • 1,275
  • 9
  • 23