-1

Trying to convert a nested json to csv

json data

{"LOG_28MAY":[{"pk":"22","venue_name":"manchester","venue_code":"03839",
"fields":{"codename":"L01","name":"Can add log entry","content_type":"8","DAILY_LIST":["LOG_ID:12309","HOST_ID:1293123"]}},{"pk":"23","venue_name":"Birmingham","fields":{"codename":"Edit Log entry","content_type":"9","DAILY_LIST":["LOG_ID:230912309","HOST_ID:2494569","LOG_LOCATION_ID:20190627"]}}]}
bkyada
  • 332
  • 1
  • 9
Srini Sydney
  • 564
  • 8
  • 17
  • Please [edit] your question to also show the output that you expect to get in the `output.csv` file for that JSON data. – Martin Evans Jun 06 '19 at 17:58

1 Answers1

1
import json
import csv

def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}


with open('json.txt') as f_input:
    json_data = json.load(f_input)['LOG_28MAY']

# First parse all entries to get the complete fieldname list
fieldnames = set()

for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)

Output:

fields.DAILY_LIST.HOST_ID,fields.DAILY_LIST.LOG_ID,fields.DAILY_LIST.LOG_LOCATION_ID,fields.codename,fields.content_type,fields.name,pk,venue_code,venue_name
1293123,12309,,L01,8,Can add log entry,22,03839,manchester
2494569,230912309,20190627,Edit Log entry,9,,23,,Birmingham
marsnebulasoup
  • 2,530
  • 2
  • 16
  • 37
Srini Sydney
  • 564
  • 8
  • 17
  • @MartinEvans gave helpful directions in this post https://stackoverflow.com/questions/41180960/convert-nested-json-to-csv-file-in-python/41237287?noredirect=1#comment99491628_41237287 – Srini Sydney Jun 04 '19 at 21:31
  • with the root tag the elements of the array are not retrieved. i.e elements of DAILY LIST are not fully populated. 1 missing from first row and 2 missing from second row. output below – Srini Sydney Jun 04 '19 at 21:35
  • DAILY_LIST,codename,content_type,name,pk,venue_code,venue_name HOST_ID:1293123,L01,8,Can add log entry,22,03839,manchester LOG_LOCATION_ID:20190627,Edit Log entry,9,,23,,Birmingham – Srini Sydney Jun 04 '19 at 21:35
  • if you look closely DAILY_LIST's JSON Objects format is incorrect. – bkyada Jun 04 '19 at 22:53