
I am trying to convert a JSON file to CSV format using Python. I am using the `json.loads()` function and then `json_normalize()` to flatten the objects. I was wondering if there is a better way of doing this.

This is the input file; one row from it:

{"ID": "02","Date": "2019-08-01","Total": 400,"QTY": 12,"Item": [{"NM": "0000000001","CD": "item_CD1","SRL": "25","Disc": [{"CD": "discount_CD1","Amount": 2}],"TxLns": {"TX": [{"TXNM": "000001-001","TXCD": "TX_CD1"}]}},{"NM": "0000000002","CD": "item_CD2","SRL": "26","Disc": [{"CD": "discount_CD2","Amount": 4}],"TxLns": {"TX": [{"TXNM": "000002-001","TXCD": "TX_CD2"}]}},{"NM": "0000000003","CD": "item_CD3","SRL": "27"}],"Cust": {"CustID": 10,"Email": "01@abc.com"},"Address": [{"FirstName": "firstname","LastName": "lastname","Address": "address"}]}
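For context, `json_normalize` on the top level of such a record flattens nested dicts into dotted columns (e.g. `Cust.CustID`) while leaving list-valued fields like `Item` intact as single object columns — which is why the code below normalizes `Item`, `Disc`, etc. separately. A small check, using a trimmed copy of the record above:

```python
import pandas as pd

# Trimmed copy of the sample row (one item, no Disc/TxLns)
record = {
    "ID": "02", "Date": "2019-08-01", "Total": 400, "QTY": 12,
    "Item": [{"NM": "0000000001", "CD": "item_CD1", "SRL": "25"}],
    "Cust": {"CustID": 10, "Email": "01@abc.com"},
    "Address": [{"FirstName": "firstname", "LastName": "lastname", "Address": "address"}],
}

flat = pd.json_normalize(record)
# Nested dict -> dotted columns; lists stay as-is
print(sorted(flat.columns))
```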

Code

import json
import pandas as pd  # json_normalize is pd.json_normalize in pandas >= 1.0

frames = []  # collect per-line results and concatenate once at the end
with open("sample.json") as f:
    for line in f:
        json_obj = json.loads(line)
        ID = json_obj['ID']
        Item = json_obj['Item']
        dataMain = pd.json_normalize(json_obj)
        # drop embedded lists; they are flattened separately below
        dataMain = dataMain.drop(['Item', 'Address'], axis=1)
        #dataMain.to_csv("main.csv",index=False)
        dataItem = pd.json_normalize(json_obj, 'Item', ['ID'])
        dataItem = dataItem.drop(['Disc', 'TxLns.TX'], axis=1)
        #dataItem.to_csv("Item.csv",index=False)
        disc_rows = []
        tx_rows = []
        for rt in Item:
            NM = rt['NM']
            rt['ID'] = ID
            if 'Disc' in rt:
                disc_rows.append(pd.json_normalize(rt, 'Disc', ['NM', 'ID']))
            if 'TxLns' in rt:
                tx = rt['TxLns']
                tx['NM'] = NM
                tx['ID'] = ID
                if 'TX' in tx:
                    tx_rows.append(pd.json_normalize(tx, 'TX', ['NM', 'ID']))
        # keep the merge keys even when no item had a Disc/TX entry
        dataDisc = pd.concat(disc_rows, sort=False) if disc_rows else pd.DataFrame(columns=['NM', 'ID'])
        dataTx = pd.concat(tx_rows, sort=False) if tx_rows else pd.DataFrame(columns=['NM', 'ID'])
        dataDIS = pd.merge(dataItem, dataDisc, on=['NM', 'ID'], how='left')
        dataTX = pd.merge(dataDIS, dataTx, on=['NM', 'ID'], how='left')
        dataAddress = pd.json_normalize(json_obj, 'Address', ['ID'])
        data_IT = pd.merge(dataMain, dataTX, on=['ID'])
        data_merge = pd.merge(data_IT, dataAddress, on=['ID'])
        frames.append(data_merge)

data_final = pd.concat(frames, sort=False)
data_final = data_final.drop_duplicates(keep='first')
data_final.to_csv("data_merged.csv", index=False)

This is the output:

ID,Date,Total,QTY,Cust.CustID,Cust.Email,NM,CD_x,SRL,CD_y,Amount,TXNM,TXCD,FirstName,LastName,Address
02,2019-08-01,400,12,10,01@abc.com,0000000001,item_CD1,25,discount_CD1,2.0,000001-001,TX_CD1,firstname,lastname,address
02,2019-08-01,400,12,10,01@abc.com,0000000002,item_CD2,26,discount_CD2,4.0,000002-001,TX_CD2,firstname,lastname,address
02,2019-08-01,400,12,10,01@abc.com,0000000003,item_CD3,27,,,,,firstname,lastname,address

The code is working fine for now. By "better" I mean:

Is it efficient in terms of time and space complexity? If this code has to process around 10K records in a file, is this an optimized solution?
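For scale, one alternative worth considering is skipping the per-row DataFrames and merges entirely: build one flat dict per (item, address) pair with plain Python and stream rows out via `csv.DictWriter`. A sketch, assuming (as in the sample row) each item has at most one `Disc` entry and one `TX` entry, shown here on a trimmed in-memory record rather than the file:

```python
import csv
import io

# Trimmed version of the sample row: one item with Disc/TX, one without
record = {
    "ID": "02", "Date": "2019-08-01", "Total": 400, "QTY": 12,
    "Item": [
        {"NM": "0000000001", "CD": "item_CD1", "SRL": "25",
         "Disc": [{"CD": "discount_CD1", "Amount": 2}],
         "TxLns": {"TX": [{"TXNM": "000001-001", "TXCD": "TX_CD1"}]}},
        {"NM": "0000000003", "CD": "item_CD3", "SRL": "27"},
    ],
    "Cust": {"CustID": 10, "Email": "01@abc.com"},
    "Address": [{"FirstName": "firstname", "LastName": "lastname", "Address": "address"}],
}

FIELDS = ["ID", "Date", "Total", "QTY", "Cust.CustID", "Cust.Email",
          "NM", "CD_x", "SRL", "CD_y", "Amount", "TXNM", "TXCD",
          "FirstName", "LastName", "Address"]

def flatten(obj):
    """Yield one flat dict per (item, address) pair, mirroring the merged output."""
    base = {"ID": obj["ID"], "Date": obj["Date"], "Total": obj["Total"],
            "QTY": obj["QTY"], "Cust.CustID": obj["Cust"]["CustID"],
            "Cust.Email": obj["Cust"]["Email"]}
    for item in obj["Item"]:
        row = dict(base, NM=item["NM"], CD_x=item["CD"], SRL=item["SRL"])
        disc = item.get("Disc", [{}])[0]               # assumes at most one discount
        row["CD_y"], row["Amount"] = disc.get("CD", ""), disc.get("Amount", "")
        tx = item.get("TxLns", {}).get("TX", [{}])[0]  # assumes at most one tax line
        row["TXNM"], row["TXCD"] = tx.get("TXNM", ""), tx.get("TXCD", "")
        for addr in obj["Address"]:
            yield dict(row, FirstName=addr["FirstName"],
                       LastName=addr["LastName"], Address=addr["Address"])

buf = io.StringIO()  # stands in for the output file
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(flatten(record))
print(buf.getvalue())
```

This keeps one row in memory at a time instead of accumulating and merging DataFrames, so memory stays flat regardless of file size; for 10K records of this shape either approach should be fast enough, but the merge-per-line version does noticeably more work per record.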

Comments
  • Possible duplicate of [How can I convert JSON to CSV?](https://stackoverflow.com/questions/1871524/how-can-i-convert-json-to-csv) – shuberman Aug 11 '19 at 10:34
  • Please specify your issue properly. When you say `better` , do you mean less lines of code or you want it to be efficient in terms of time and space complexity? – shuberman Aug 11 '19 at 10:35
  • Perhaps it is better to post this question on [Code Review](https://codereview.stackexchange.com/). Code Review is about improving existing, working code. – Theo Aug 11 '19 at 10:46

0 Answers