0

How to convert nested dictionary into a 2D table

data[0] is a collection of table rows

data[0][0] is a table row,

key:year is the column name,

key:values is the values in the column.

I want to recover the data[0] into tablular form in Pandas dataframe.

I found the json_normalize may help, but not knowing how to do it.

any suggestions ?

Expected format

Raw data in dictionary

(Pdb++) data[0]
    [{u'values': [{u'Actual': u'(0.2)'}, {u'Upper End of Range': u'-'}, {u'Upper End of Central Tendency': u'-'}, {u'Lower End of Central Tendency': u'-'}, {u'Lower End of Range': u'-'}], u'year': u'2009'}, {u'values': [{u'Actual': u'2.8'}, {u'Upper End of Range': u'-'}, {u'Upper End of Central Tendency': u'-'}, {u'Lower End of Central Tendency': u'-'}, {u'Lower End of Range': u'-'}], u'year': u'2010'}, {u'values': [{u'Actual': u'2.0'}, {u'Upper End of Range': u'-'}, {u'Upper End of Central Tendency': u'-'}, {u'Lower End of Central Tendency': u'-'}, {u'Lower End of Range': u'-'}], u'year': u'2011'}, {u'values': [{u'Actual': u'2.0'}, {u'Upper End of Range': u'-'}, {u'Upper End of Central Tendency': u'-'}, {u'Lower End of Central Tendency': u'-'}, {u'Lower End of Range': u'-'}], u'year': u'2012'}, {u'values': [{u'Actual': u'2.5'}, {u'Upper End of Range': u'-'}, {u'Upper End of Central Tendency': u'-'}, {u'Lower End of Central Tendency': u'-'}, {u'Lower End of Range': u'-'}], u'year': u'2013'}, {u'values': [{u'Actual': u'-'}, {u'Upper End of Range': u'3.0'}, {u'Upper End of Central Tendency': u'3.0'}, {u'Lower End of Central Tendency': u'2.8'}, {u'Lower End of Range': u'2.1'}], u'year': u'2014'}, {u'values': [{u'Actual': u'-'}, {u'Upper End of Range': u'3.5'}, {u'Upper End of Central Tendency': u'3.2'}, {u'Lower End of Central Tendency': u'3.0'}, {u'Lower End of Range': u'2.2'}], u'year': u'2015'}, {u'values': [{u'Actual': u'-'}, {u'Upper End of Range': u'3.4'}, {u'Upper End of Central Tendency': u'3.0'}, {u'Lower End of Central Tendency': u'2.5'}, {u'Lower End of Range': u'2.2'}], u'year': u'2016'}, {u'values': [{u'Actual': u'-'}, {u'Upper End of Range': u'2.4'}, {u'Upper End of Central Tendency': u'2.3'}, {u'Lower End of Central Tendency': u'2.2'}, {u'Lower End of Range': u'1.8'}], u'year': u'Longer Run'}]

(Pdb++) data[0][0]
{u'values': [{u'Actual': u'(0.2)'}, {u'Upper End of Range': u'-'}, {u'Upper End of Central Tendency': u'-'}, {u'Lower End of Central Tendency': u'-'}, {u'Lower End of Range': u'-'}], u'year': u'2009'}

Maybe to change the JSON schema will be a better solution ?

if so, what kind of NEW JSON schema design is better for the kind of table data. thanks

user3675188
  • 7,271
  • 11
  • 40
  • 76

2 Answers2

1
import pandas

# set up data structures
columns = [
    "year",
    "actual",
    "upper",
    "upper_central",
    "lower_central",
    "lower"
]
value_getter = {
    "year"         : lambda item: item['year'],
    "actual"       : lambda item: item['values'][0]['Actual'],
    "upper"        : lambda item: item['values'][1]['Upper End of Range'],
    "upper_central": lambda item: item['values'][2]['Upper End of Central Tendency'],
    "lower_central": lambda item: item['values'][3]['Lower End of Central Tendency'],
    "lower"        : lambda item: item['values'][4]['Lower End of Range']
}
mydata = {
    "year"         : [],
    "actual"       : [],
    "upper"        : [],
    "upper_central": [],
    "lower_central": [],
    "lower"        : []
}

# repackage data
for item in data[0]:
    for column in columns:
        mydata[column].append(value_getter[column](item))

# and stuff it into pandas
df = pandas.DataFrame(mydata, columns=columns)

then df.T gives

                   0     1     2     3     4     5     6     7           8
year            2009  2010  2011  2012  2013  2014  2015  2016  Longer Run
actual         (0.2)   2.8   2.0   2.0   2.5     -     -     -           -
upper              -     -     -     -     -   3.0   3.5   3.4         2.4
upper_central      -     -     -     -     -   3.0   3.2   3.0         2.3
lower_central      -     -     -     -     -   2.8   3.0   2.5         2.2
lower              -     -     -     -     -   2.1   2.2   2.2         1.8
Hugh Bothwell
  • 55,315
  • 8
  • 84
  • 99
  • Hi, do you think to change the json schema will be a better solution ? if so, what kind of json schema design is better for the kind of table data. thanks~ – user3675188 Feb 05 '15 at 05:04
  • I tried your solution , it works ~thank you so much. and I change the JSON scheme in this format http://stackoverflow.com/questions/28340557/how-to-convert-the-dictionary-into-2d-table-form-in-pandas how to do it ? thanks – user3675188 Feb 05 '15 at 09:40
0

For efficiency you should initialize the data frame but if your data set is small, and if you dont know all the possible strings that occur in the inner-most dictionaries, there is no need to do so.

 import pandas as pd
 df=pd.DataFrame
 for dict1 in data[0]:
     for dict2 in dict1['values']:
         for key,val in zip(dict2.keys(),dict2.values()):
                df.loc[key,dict1['year']]=val
df

enter image description here

derchambers
  • 904
  • 13
  • 19