1

This is my first time accessing an API / working with json data so if anyone can point me towards a good resource for understanding how to work with it I'd really appreciate it.

Specifically though, I have json data in this form:

{"result": { "code": "OK", "msg": "" },"report_name":"DAILY","columns":["ad","ad_impressions","cpm_cost_per_ad","cost"],"data":[{"row":["CP_CARS10_LR_774470","966","6.002019","5.797950"]}],"total":["966","6.002019","5.797950"],"row_count":1}

I understand this structure but I don't know how to get it into a DataFrame properly.

bpr
  • 483
  • 1
  • 11
  • 23
  • Could you post the whole JSON please. – ajsp Jul 24 '15 at 17:53
  • 1
    You might be better off parsing out the column names and data and creating a dataframe yourself, rather than use the pd.read_json class. So parse out json['columns'] and json['data'][0]['row'] for columns and rows respectively, then construct your data frame as you would normally. – ajsp Jul 24 '15 at 18:29

2 Answers2

1

Looking at the structure of your json, presumably you will have several rows for your data and in my opinion it will make more sense to build the dataframe yourself.

This code uses columns and data to build a dataframe:

In [12]:

import json
import pandas as pd
​
with open('... path to your json file ...') as fp:
    for line in fp:
        obj = json.loads(line)
        columns = obj['columns']
        data = obj['data']
        l = []
        for d in data:
            l += [d['row']]
        df = pd.DataFrame(l, index=None, columns=columns)

df
Out[12]:
ad  ad_impressions  cpm_cost_per_ad cost
0   CP_CARS10_LR_774470 966 6.002019    5.797950

As for the rest of the data, in your json, I guess you could e.g. use the totals for checking your dataframe,

In [14]:

sums = df.sum(axis=0)
obj['total']
for i in range(0,3):
    if (obj['total'][i] != sums[i+1]):
        print "error in total"

In [15]:

if obj['row_count'] != len(df.index):
    print "error in row count"

As for the rest of the data in the json, it is difficult for me to know if anything else should be done.

Hope it helps.

lrnzcig
  • 3,868
  • 4
  • 36
  • 50
0

Check pandas documentation. Specifically,

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html

Pandas supports reading json to dataframe

nitin
  • 7,234
  • 11
  • 39
  • 53
  • I get an error with pd.read_json(response) stating "arrays must all be the same length". The problem is that there is a final row, "Total" which has excludes ad names, I don't see any documentation on how to handle this from the above link – bpr Jul 24 '15 at 18:08