5

I've a requirement to convert CSV Data File to Nested JSON in Python for an application. My below current Python code is working fine for 1 Customer/Accounts document but somehow not able to create json dump for all the customers in the CSV file.

I am providing the Python code below which will give you some insight on what I am trying to achieve. Please let me know if any existing resolutions on this.

Sample Python CODE:

import pandas as pd
from itertools import groupby 
from collections import OrderedDict
import json    

df = pd.read_csv('cust.csv', dtype={
        "ClientID" : str,
        "ClientName" : str,
        "AcctID" : str,
        "AcctNbr" : str,
        "AcctTyp" : str
    })

results = []

for (ClientID, ClientName), bag in df.groupby(["ClientID", "ClientName"]):
contents_df = bag.drop(["ClientID", "ClientName"], axis=1)
subset = [OrderedDict(row) for i,row in contents_df.iterrows()]
results.append(OrderedDict([("ClientID", ClientID),("ClientName", ClientName),("subset", subset)]))

print json.dumps(results[0], indent=4)

with open('ExpectedJsonFile.json', 'w') as outfile:
outfile.write(json.dumps(results[0], indent=4))

Sample INPUT CSV:

ClientID,ClientName,AcctID,AcctNbr,AcctTyp
----------------------------------------------------------
00001,John George,812001,812001095,DDA
00001,John George,813002,813002096,SAV
00001,John George,814003,814003097,AFS
00024,Richard Polado,512987,512987085,ML
00024,Richard Polado,512983,512983086,IL
00345,John Cruze,1230,123001567,SAV
00345,John Cruze,5145,514502096,CD
00345,John Cruze,7890,7890033527,SGD

Desired Output JSON:

{  
   "clientId":00001,
   "ClientName":"John George",
   "subset":[  
      {  
         "AcctID":812001,
         "AcctNbr":"812001095",
         "AcctTyp":"DDA",
      },
      {  
         "AcctID":813002,
         "AcctNbr":"813002096",
         "AcctTyp":"SAV",
      },
      {  
         "AcctID":814003,
         "AcctNbr":"814003097",
         "AcctTyp":"AFS",
      }
   ]
},
{  
   "clientId":00024,
   "ClientName":"Richard Polado",
   "subset":[  
      {  
         "AcctID":512987,
         "AcctNbr":"512987085",
         "AcctTyp":"ML",
      },
      {  
         "AcctID":512983,
         "AcctNbr":"512983086",
         "AcctTyp":"IL",
      }
   ]
}

and these documents should continue creating for other thousands of clients.

halfer
  • 19,824
  • 17
  • 99
  • 186
ShivKumar
  • 53
  • 1
  • 3
  • What do you mean by "somehow not able to" - what problem are you having specifically? What does it not do? – halfer Jun 29 '18 at 06:40
  • It only creates one document for the first ClientID only. { "clientId":00001, "ClientName":"John George", "subset":[ { "AcctID":812001, "AcctNbr":"812001095", "AcctTyp":"DDA", }, { "AcctID":813002, "AcctNbr":"813002096", "AcctTyp":"SAV", }, { "AcctID":814003, "AcctNbr":"814003097", "AcctTyp":"AFS", } ] } – ShivKumar Jun 29 '18 at 06:43
  • Sounds like a debugging problem, then. Write some code to detect if it gets to the file writing part on the second pass of the loop. Does it not get there? Or, maybe your file writing fails because it is using the same filename? – halfer Jun 29 '18 at 06:44
  • I'm not familiar with Python, but don't loops need indentation? You do not seem to have any in this case. – halfer Jun 29 '18 at 06:45
  • @ShivKumar Does my solution work? – ababuji Jun 29 '18 at 07:25
  • @Abhishek, Thanks so much, Perfectly bringing the data as desired, can't expect better than this ... You are the Best.... Let me now try to import this json to mongoDB, I hope it works !! – ShivKumar Jun 29 '18 at 07:57
  • @Abhishek, I am able to import it into MongoDB as well. Thanks again !! – ShivKumar Jun 29 '18 at 09:23

2 Answers2

2

The solution groups by every 'ClientID','ClientName' pair

Your DataFrame

df = pd.DataFrame([['00001','John George','812001','812001095','DDA'],
['00001','John George','813002','813002096','SAV'],
['00001','John George','814003','814003097','AFS'],
['00024','Richard Polado','512987','512987085','ML'],
['00024','Richard Polado','512983','512983086','IL'],
['00345','John Cruze','1230','123001567','SAV'],
['00345','John Cruze','5145','514502096','CD'],
['00345','John Cruze','7890','7890033527','SGD']])

df.columns = ['ClientID','ClientName','AcctID','AcctNbr','AcctTyp'] 

and now

finalList = []
finalDict = {}
grouped = df.groupby(['ClientID', 'ClientName'])
for key, value in grouped:


    dictionary = {}

    j = grouped.get_group(key).reset_index(drop=True)
    dictionary['ClientID'] = j.at[0, 'ClientID']
    dictionary['ClientName'] = j.at[0, 'ClientName']


    dictList = []
    anotherDict = {}
    for i in j.index:

        anotherDict['AcctID'] = j.at[i, 'AcctID']
        anotherDict['AcctNbr'] = j.at[i, 'AcctNbr']
        anotherDict['AcctTyp'] = j.at[i, 'AcctTyp']

        dictList.append(anotherDict)

    dictionary['subset'] = dictList


    finalList.append(dictionary)

import json
json.dumps(finalList)

gives:

'[
   {"ClientID": "00001", 
    "ClientName": "John George", 
    "subset": 
            [{"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}, 

             {"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}, 

             {"AcctID": "814003", 
              "AcctNbr": "814003097", 
              "AcctTyp": "AFS"}]

   }, 

  {
   "ClientID": "00024", 
   "ClientName": "Richard Polado", 
   "subset": 
            [{"AcctID": "512983", 
              "AcctNbr": "512983086", 
              "AcctTyp": "IL"}, 

             {"AcctID": "512983", 
              "AcctNbr": "512983086", 
              "AcctTyp": "IL"}]
   }, 

  {
   "ClientID": "00345", 
   "ClientName": "John Cruze", 
   "subset": 
            [{"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}, 

             {"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}, 

             {"AcctID": "7890", 
              "AcctNbr": "7890033527", 
              "AcctTyp": "SGD"}]
   }

]'

Is this how you seem to want it?

ababuji
  • 1,683
  • 2
  • 14
  • 39
  • Hi Abhishek, Just a correction, I looked now with more data and actually its not working as expected ... when more records inner dictionary always brings first value – ShivKumar Jul 03 '18 at 16:56
  • Hi @imperialgendarme, it works for me... But can you help me I had a bit of doubt? What if I want another nest inside the subset with another name?? How can I insert my info inside the subset? Thanks in advance for your help!!! – Disha Purohit Jul 07 '20 at 06:20
  • [Comment by Mubeen Ghafoor](https://stackoverflow.com/a/65501128/2745495): The `another_dict = { }` should be inside the `for` loop. – Gino Mempin Dec 30 '20 at 04:00
2

Use dictList.append(anotherDict.copy()) otherwise you will get the same dict object in the list.

More details in this question: Create List of Dictionary Python

chb
  • 1,727
  • 7
  • 25
  • 47
Prasad Sawant
  • 205
  • 1
  • 15