1

I have a dict with finance data that I want to convert to pd.DataFrame.

The data looks like:

{u'candles': [{u'complete': True,
   u'mid': {u'c': u'1.19228',
    u'h': u'1.19784',
    u'l': u'1.18972',
    u'o': u'1.19581'},
   u'time': u'2018-05-06T21:00:00.000000000Z',
   u'volume': 119139},
  {u'complete': False,
   u'mid': {u'c': u'1.18706',
    u'h': u'1.19388',
    u'l': u'1.18614',
    u'o': u'1.19239'},
   u'time': u'2018-05-07T21:00:00.000000000Z',
   u'volume': 83259}],
 u'granularity': u'D',
 u'instrument': u'EUR_USD'}

It's a bit tricky since I want to have a dataframe that has these fields:

c h l o time volume
1.19228 1.19784 1.18972 1.19581 2018-05-06T21:00:00.000000000Z  119139
1.18706 1.19388 1.18614 1.19239 2018-05-07T21:00:00.000000000Z  83259

I have tried various combos like:

pd.DataFrame(dict['candles'])
 pd.DataFrame([dict['candles']])

but it does not seem possible to convert to desired format without transforming the dict

jpp
  • 159,742
  • 34
  • 281
  • 339
dared
  • 15
  • 6
  • 1
    [pd.read_json](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html)/ [json_normalize](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html) and see all the duplicate questions on "Import JSON into pandas". Please close thie question as duplicate – smci May 08 '18 at 12:31
  • There's also [pd.from_dict](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.from_dict.html) but that's probably not what you want – smci May 08 '18 at 12:33

1 Answers1

1
data = input_data['candles']
df = pd.DataFrame(data=data, columns=['mid', 'time', 'volume'])

Now you have mid as a column with nested fields, not what you want.

                                                 mid                            time  volume
0  {'c': '1.19228', 'h': '1.19784', 'l': '1.18972...  2018-05-06T21:00:00.000000000Z  119139
1  {'c': '1.18706', 'h': '1.19388', 'l': '1.18614...  2018-05-07T21:00:00.000000000Z   83259

You can use .apply(pd.Series) to flatten the nested mid structure

unnested = df['mid'].apply(pd.Series)
df = df.join(unnested).drop(columns='mid')

results:

                             time  volume        c        h        l        o
0  2018-05-06T21:00:00.000000000Z  119139  1.19228  1.19784  1.18972  1.19581
1  2018-05-07T21:00:00.000000000Z   83259  1.18706  1.19388  1.18614  1.19239

EDIT: As pointed out by @smci, json_normalize would also do the trick

from pandas.io.json import json_normalize
json_normalize(data)

but column names are hierarchical

   complete    mid.c    mid.h    mid.l    mid.o  \
0      True  1.19228  1.19784  1.18972  1.19581
1     False  1.18706  1.19388  1.18614  1.19239

                             time  volume
0  2018-05-06T21:00:00.000000000Z  119139
1  2018-05-07T21:00:00.000000000Z   83259
Susensio
  • 820
  • 10
  • 19