
I have namedtuples defined as follows:

In[37]: from collections import namedtuple
        import pandas as pd
        Point = namedtuple('Point', 'x y')

The nested dictionary has the following format:

In[38]: d
Out[38]: 
{1: {None: {1: Point(x=1.0, y=5.0), 2: Point(x=4.0, y=8.0)}},
2: {None: {1: Point(x=45324.0, y=24338.0), 2: Point(x=45.0, y=38.0)}}}

I am trying to create a pandas DataFrame from the dictionary d without writing explicit for loops.

I have succeeded in creating the dataframe from a subset of the dictionary by doing this:

In[40]: df = pd.DataFrame(list(d[1][None].values()))

In[41]: df

Out[41]: 
   x  y
0  1  5
1  4  8

But I want to be able to create the dataframe from the entire dictionary.

I want the dataframe to look like the following (I am using MultiIndex notation):

In[42]: df
Out[42]:
Subcase Step ID  x       y
1       None 1   1.0     5.0
             2   4.0     8.0
2       None 1   45324.0 24338.0
             2   45.0    38.0

The from_dict method of DataFrame only supports up to two levels of nesting, so I was not able to use it. I am also considering modifying the structure of the dictionary d to achieve my goal. It does not necessarily have to be a dictionary, either.

Thank you.

snowleopard
  • You say it doesn't have to be a dict - what's the source of the data in the dict? Or were you referring to transforming the dict into an intermediate structure before turning it into a dataframe? – Jeff Jul 08 '16 at 20:36
  • The data comes from a binary file. It is transformed into a dict for ease of access and fast querying, and it would ideally remain a dict. What I was trying to say is that I could alter the code that converts the binary file to a dict and use something that is friendlier to pandas. Transforming the dict afterwards seems inefficient. – snowleopard Jul 08 '16 at 20:42

2 Answers


There are already several answers to similar questions on SO (here, here, or here). These solutions can be adapted to this problem as well. However, none of them is general enough to be run on an arbitrary dict, so I decided to write something more universal.

This is a function that can be run on any nested dict. The dict has to have the same number of levels (depth) along every branch; otherwise the function will most probably raise an exception.

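import pandas as pd


def frame_from_dict(dic, depth=None, **kwargs):
    def get_dict_depth(dic):
        # Depth is measured along the first branch only;
        # all branches are assumed to be equally deep.
        if not isinstance(dic, dict):
            return 0
        for v in dic.values():
            return get_dict_depth(v) + 1

    if depth is None:
        depth = get_dict_depth(dic)

    if depth == 0:
        # Leaf (e.g. a namedtuple): becomes one column of the result.
        return pd.Series(dic)
    elif depth > 0:
        items = list(dic.items())
        try:
            # Sort keys and values together so they stay aligned.
            items.sort(key=lambda kv: kv[0])
        except TypeError:
            # unorderable key types: keep the dict's own order
            pass
        keys = [k for k, _ in items]
        vals = [frame_from_dict(v, depth - 1) for _, v in items]
        return pd.concat(vals, axis=1, keys=keys, **kwargs)

    raise ValueError("depth should be a nonnegative integer or None")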
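
Since the function relies on every branch being equally deep, it may help to check that before calling it. The following is only a sketch of such a check; `leaf_depths` and `has_uniform_depth` are hypothetical helper names, not part of pandas or of the function above, and anything that is not a dict is assumed to be a leaf.

```python
def leaf_depths(obj, depth=0):
    """Return the set of depths at which non-dict leaves occur."""
    if not isinstance(obj, dict):
        return {depth}
    depths = set()
    for value in obj.values():
        # Collect the leaf depths of every branch, not just the first one.
        depths |= leaf_depths(value, depth + 1)
    return depths


def has_uniform_depth(dic):
    """True if all leaves of the nested dict sit at the same depth."""
    # Note: an empty dict contributes no leaves, so {} is reported as False.
    return len(leaf_depths(dic)) == 1
```

If `has_uniform_depth(d)` is false, the dict would need to be padded or restructured before it is passed to frame_from_dict.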

For the sake of generality, I did not special-case the namedtuples from this question, so their field names are lost. The function can be tweaked to keep them if needed.

In this particular case, it can be applied as follows:

df = frame_from_dict(d, names=['Subcase', 'Step', 'ID']).T
df.columns = ['x', 'y']
df
Out[115]: 
                       x        y
Subcase Step ID                  
1       NaN  1       1.0      5.0
             2       4.0      8.0
2       NaN  1   45324.0  24338.0
             2      45.0     38.0
ptrj
  • Thank you for this, it worked like a charm. I was aware of this solution, but I was specifically trying to avoid for loops, since I can control how the dictionary is defined. I decided to flatten the keys into a tuple. See the solution below. – snowleopard Jul 15 '16 at 14:18
  • @snowleopard I see. Do you have a general method of flattening keys of a nested dictionary to tuples? I thought this was the crux of the problem. – ptrj Jul 15 '16 at 15:03
  • Yes, you are correct, but I am creating the dictionary from a binary file, so I can control how the dictionary is defined. – snowleopard Jul 15 '16 at 17:59
  • Ah, all right. Creating tuples directly is then a better approach. – ptrj Jul 15 '16 at 18:33

I decided to flatten the keys into a tuple (tested using pandas 0.18.1):

In [5]: from collections import namedtuple

In [6]: Point = namedtuple('Point', 'x y')

In [11]: from collections import OrderedDict

In [14]: d=OrderedDict()

In [15]: d[(1,None,1)]=Point(x=1.0, y=5.0)

In [16]: d[(1,None,2)]=Point(x=4.0, y=8.0)

In [17]: d[(2,None,1)]=Point(x=45324.0, y=24338.0)

In [18]: d[(2,None,2)]=Point(x=45.0, y=38.0)

Finally,

In [7]: import pandas as pd

In [8]: df = pd.DataFrame(list(d.values()), index=pd.MultiIndex.from_tuples(list(d.keys()), names=['Subcase', 'Step', 'ID']))

In [9]: df
Out[9]: 
                       x        y
Subcase Step ID                  
1       NaN  1       1.0      5.0
             2       4.0      8.0
2       NaN  1   45324.0  24338.0
             2      45.0     38.0
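
If the nested dict already exists and cannot be rebuilt at the source, the tuple keys could also be derived from it. The following is a minimal sketch, assuming that every non-dict value (including a namedtuple) is a leaf; `flatten_keys`, `nested`, and `flat` are hypothetical names used only for illustration. It does use a loop internally, so it is a convenience rather than a way of avoiding iteration.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y')


def flatten_keys(nested, prefix=()):
    """Yield (key_tuple, leaf) pairs from a nested dict."""
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Descend one level, carrying the keys seen so far.
            yield from flatten_keys(value, path)
        else:
            yield path, value


nested = {
    1: {None: {1: Point(x=1.0, y=5.0), 2: Point(x=4.0, y=8.0)}},
    2: {None: {1: Point(x=45324.0, y=24338.0), 2: Point(x=45.0, y=38.0)}},
}

# Same tuple keys as the hand-built d above, e.g. (1, None, 1).
flat = dict(flatten_keys(nested))
```

`flat` can then be used in place of d in the DataFrame constructor call shown above.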
snowleopard