IIUC - what about the following approach.
It iterates over all campaignResults
, iterates in each one over all results
and writes out campaignID
, metadata.id
and (as an example) metrics.impressions
and metrics.clicks
per row:
import json
sep = '\t'
with open(jsonfile) as jsonin, open('j2c.csv', 'w') as f:
j = json.load(jsonin)
f.write(f'campaignId{sep}metadata.id{sep}metrics.impressions{sep}metrics.clicks\n')
for cR in range(j['totalCampaigns']):
for r in range(j['campaignResults'][cR]['totalResults']):
f.write(j['campaignResults'][cR]['campaignId']+ sep)
f.write(j['campaignResults'][cR]['results'][r]['metadata']['id']+ sep)
f.write(str(j['campaignResults'][cR]['results'][r]['metrics']['impressions']) + sep)
f.write(str(j['campaignResults'][cR]['results'][r]['metrics']['clicks']) + '\n')
result:
# campaignId metadata.id metrics.impressions metrics.clicks
# 0081da282b2dbe8140508074366cac91ba 2019-05-20 176430.0 59.0
# 0081da282b2dbe8140508074366cac91ba 2019-05-19 169031.0 59.0
# 0081da282b2dbe8140508074366cac91ba 2019-05-18 108777.0 62.0
# 0081da282b2dbe8140508074366cac91ba 2019-05-17 272088.0 60.0
# 0081da282b2dbe8140508074366cac91ba 2019-05-16 198100.0 62.0
# ...
# 00c03d801da285767a093d0b4d5188fb34 2018-01-10 0.0 0.0
# 00c03d801da285767a093d0b4d5188fb34 2018-01-09 0.0 0.0
# 00c03d801da285767a093d0b4d5188fb34 2018-01-08 0.0 0.0
# 00c03d801da285767a093d0b4d5188fb34 2018-01-07 0.0 0.0
# 00c03d801da285767a093d0b4d5188fb34 2018-01-06 0.0 0.0
I still do not really understand exactly what data you want to extract - only values which have a date-like pattern or only specific dates?
Besides that, I do not really get the structure of your json file, so I tried to create a tree print out which may help to get a clearer view and to formulate the question more precisely:
with open(file) as f:
j = json.load(f)
def getStructure(dct, ind=''):
indsym = '.\t'
for k, v in dct.items():
if type(v) is list:
print(f'{ind}{k}[{len(v)}]')
getStructure(v[0], ind + indsym)
elif type(v) is dict:
print(f'{ind}{k}')
getStructure(v, ind + indsym)
else:
print(f'{ind}{k}')
getStructure(j)
result:
# campaignResults[2]
# . campaignId
# . results[500]
# . . metadata
# . . . id
# . . . fromDate
# . . . toDate
# . . . lastCappingTime
# . . metrics
# . . . impressions
# . . . clicks
# . . . conversions
# . . . spend
# . . . ecpc
# . . . ctr
# . . . conversionRate
# . . . cpa
# . . . totalValue
# . . . averageValue
# . . . conversionMetrics[6]
# . . . . name
# . . . . conversions
# . . . . conversionRate
# . . . . cpa
# . . . . totalValue
# . . . . averageValue
# . totalResults
# totalCampaigns
a little problem here: I think that there are not always the same keys in similar list elements:
j['campaignResults'][0]['results'][0]['metadata'].keys()
# dict_keys(['id', 'fromDate', 'toDate', 'lastCappingTime'])
j['campaignResults'][1]['results'][0]['metadata'].keys()
# dict_keys(['id', 'fromDate', 'toDate'])
So note that the above getStructure
function only looks into the first element of a list to get the structure of that one.