-3

Sample JSON for testing: I loaded this into a dataframe using pandas: There are many keys but I only need Id and Events so I loaded this into a separate dataframe.

I need to iterate for each id(ex:abcd) and extract the events for that id. There are multiple id's

I need this to be flattened into a single table or data frame with the below columns so that I can load it into an RDBMS

Some keys don't have values ex: events[origin] etc.

Appreciate your help.

Columns:

id description time origin_name origin_code remarks destination_name destination_code invoice code

JSON:

[
   {
      "id":"abcd",
      "events":[
         {
            "description":"Liquidated",
            "time":"2020-12-18T12:00:00Z",
            "origin":{
               
            },
            "destination":{
               "name":" ",
               "code":"SAV"
            },
            "remarks":"L 99",
            "invoice":"pqr",
            "code":"LIQ"
         },
         {
            "description":"Customs Monthly Stmt Paid",
            "time":"2020-02-26T12:00:00Z",
            "origin":{
               "name":"abc, GA",
               "code":" "
            },
            "destination":{
               "name":"SAVANNAH, GA",
               "code":"SAV"
            },
            "remarks":"PM Stmt No 1720P0207Q",
            "invoice":"pqr",
            "code":"MSP"
         }
         ]
  },
  {
      "id":"xyz",
      "events":[
         {
            "description":"Liquidated",
            "time":"2020-12-18T12:00:00Z",
            "origin":{
               "name":"abc, GA",
               "code":" "
            },
            "destination":{
               
            },
            "remarks":"L 99",
            "invoice":" ",
            "code":"LIQ"
         },
         {
            "description":"Customs Monthly Stmt Paid",
            "time":"2020-02-26T12:00:00Z",
            "origin":{
               
            },
            "destination":{
               "name":"SAVANNAH, GA",
               "code":"SAV"
            },
            "remarks":"PM Stmt No 1720P0207Q",
            "invoice":" ",
            "code":"MSP"
         }
         ]
}
]

     

1 Answers1

0

One possible solution is to loop through events key then special handle origin and destination key:

import json
import numpy as np
import pandas as pd

with open('test.json') as f:
    data = json.load(f)

res = []

for d in data:
    for item in d['events']:
        temp_d = {}
        temp_d['id'] = d['id']

        for k, v in item.items():
            if k == 'origin' or k == 'destination':
                temp_d[f'{k}_name'] = v.get('name', np.nan)
                temp_d[f'{k}_code'] = v.get('code', np.nan)
            else:
                temp_d[k] = v

        res.append(temp_d)

df = pd.DataFrame(res)
print(df)

     id                description                  time origin_name origin_code destination_name destination_code                remarks invoice code
0  abcd                 Liquidated  2020-12-18T12:00:00Z     abc, GA                                           SAV                   L 99     pqr  LIQ
1  abcd  Customs Monthly Stmt Paid  2020-02-26T12:00:00Z         NaN         NaN     SAVANNAH, GA              SAV  PM Stmt No 1720P0207Q     pqr  MSP
2   xyz                 Liquidated  2020-12-18T12:00:00Z     abc, GA                          abc                                    L 99          LIQ
3   xyz  Customs Monthly Stmt Paid  2020-02-26T12:00:00Z         NaN         NaN     SAVANNAH, GA              SAV  PM Stmt No 1720P0207Q          MSP
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • Thanks but for some json files i don't have origin_name and origin_code ex: origin{ } – viraj patel Apr 25 '21 at 04:17
  • @virajpatel Example data? – Ynjxsjmh Apr 25 '21 at 04:18
  • Modified the sample json. Some keys have origin name and origin code same with destination_name and destination_code but for some keys these are empty ie they only have origin :{},destination:{}. I've modified the source json file – viraj patel Apr 25 '21 at 04:20
  • @virajpatel I read your change log. You still have key `origin` and `destination` right? Have you really read and run my code? – Ynjxsjmh Apr 25 '21 at 04:22
  • Yes i ran the code and it is failing with the below error: [SQL Server]Invalid column name 'origin_name'. (207) (SQLExecute); [42S22] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'origin_code'. (207); [42S22] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'destination_name'. (207); [42S22] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'destination_code'. (207); [42S22] [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. (8180)") – viraj patel Apr 25 '21 at 04:27
  • @virajpatel What is `SQLExecute`? How do you run my code? `data` in my code should be a dict parsed from your json. – Ynjxsjmh Apr 25 '21 at 04:31
  • Thanks It worked. Appreciate your help. – viraj patel Apr 25 '21 at 04:38
  • @virajpatel The idea is usage of `dict.get(key)`, if dict doesn't have `key`, it will return a default value. See https://stackoverflow.com/questions/11041405. – Ynjxsjmh Apr 25 '21 at 04:41
  • Thanks @Ynjxsjmh. I did that. Appreciate your help. – viraj patel Apr 25 '21 at 13:32