15

I'd like to store JSON data in a Python Pandas DataFrame

my JSON data is a dict of dicts of dicts like this

d = {
  "col1": {
    "row1": {
      "data1": "0.87", 
      "data2": "Title col1", 
      "data3": "14.4878", 
      "data4": "Title row1"
    }, 
    "row2": {
      "data1": "15352.3", 
      "data2": "Title col1", 
      "data3": "14.9561", 
      "data4": "Title row2"
    }, 
    "row3": {
      "data1": "0", 
      "data2": "Title col1", 
      "data3": "16.8293", 
      "data4": "Title row3"
    }
  }, 
  "col2": {
    "row1": {
      "data1": "0.87", 
      "data2": "Title col2", 
      "data3": "24.4878", 
      "data4": "Title row1"
    }, 
    "row2": {
      "data1": "15352.3", 
      "data2": "Title col2", 
      "data3": "24.9561", 
      "data4": "Title row2"
    }, 
    "row3": {
      "data1": "0", 
      "data2": "Title col2", 
      "data3": "26.8293", 
      "data4": "Title row3"
    }
  }
}

I did this to put my data in a DataFrame

import pandas as pd
df=pd.DataFrame(d)

I get this

In [1]: df
Out[1]: 
                                                   col1                                               col2
row1  {'data4': 'Title col1', 'data1': '0.87', 'data3':  {'data4': 'Title col1', 'data1': '0.87', 'data3':
row2  {'data4': 'Title col2', 'data1': '15352.3', 'data  {'data4': 'Title col2', 'data1': '15352.3', 'data
row3  {'data4': 'Title col3', 'data1': '0', 'data3': '1  {'data4': 'Title col3', 'data1': '0', 'data3': '2

My problem is that my DataFrame contains dicts instead of values.

I wonder how I can manage multidimensionnal data (more than 2 dimensions... 3 dimensions here) with a Pandas DataFrame.

Each dict inside DataFrame have the same keys.

scls
  • 16,591
  • 10
  • 44
  • 55

1 Answers1

18
df = pd.Panel.from_dict(d).to_frame()

output:

                   col1        col2
major minor                        
data1 row1         0.87        0.87
      row2      15352.3     15352.3
      row3            0           0
data2 row1   Title col1  Title col2
      row2   Title col1  Title col2
      row3   Title col1  Title col2
data3 row1      14.4878     24.4878
      row2      14.9561     24.9561
      row3      16.8293     26.8293
data4 row1   Title row1  Title row1
      row2   Title row2  Title row2
      row3   Title row3  Title row3

If you don't want use Panel:

pd.concat(map(pd.DataFrame, d.itervalues()), keys=d.keys()).stack().unstack(0)
HYRY
  • 94,853
  • 25
  • 187
  • 187
  • Thanks a lot... I didn't know the concept of "Panel". But I think that it's limited to 3 dimensions (like in my case). I will be nice if you could confirm or invalidate this! – scls Mar 16 '13 at 22:38
  • Yes `Panel` is 3D, and there is a `Panel4D` for 4D data. more than 2 dimensions data are always use MultiIndex DataFrame. – HYRY Mar 16 '13 at 22:45
  • Thanks but I would like to have 'data1', ... 'data4' as index of "first level", and that's not the case here – scls Mar 16 '13 at 23:11
  • What is the mean: to have 'data1', ... 'data4' as index of "first level"? Can you give an example? – HYRY Mar 16 '13 at 23:13
  • I just need in fact numerical values (stored in data1 and data3). I need to make some calculus with data (such as calculate sum of data1 and mean of data3) – scls Mar 17 '13 at 07:18
  • The solution without Panel `df = pd.concat(map(pd.DataFrame, d.itervalues()), keys=d.keys()).stack().unstack(0)` if ok... I just need to transposate DataFrame with `df = df.T` and 'data1'...'data4' are "first level" index. So I can get a DataFrame which contains only data1 with `df['data1']` – scls Mar 17 '13 at 07:31
  • 1
    Solution with Panel gives: "Panel is deprecated and will be removed in a future version." Solution without Panel gives: "AttributeError: 'dict' object has no attribute 'itervalues'" – Melkor.cz Mar 30 '20 at 13:57