0

structured nested JSON file that I need to use as a data frame(or CSV) to extract insight from the data. Below is the sample of 1 part of the JSON.. i have more then 1million records with different details n feature.. what would be the right way to Parse this as a structure Table using Python

    { 
    "CRD" : {
        "FG" : "ZVX", 
        "ZPN" : "04W05BA2A", 
        "MATCH" : "exact", 
        "COUNT" : 4, 
        "SUMMARY" : {
            "ID" : "33772", 
            "PATHID" : "10417"
        }, 
        "DETAILS" : {
            "PARADATA" : {
                "FEATURES" : [
                    {
                        "FEATURENAME" : "Laptop Value", 
                        "FEATUREVALUE" : "0.9 F", 
                        "FEATUREUNIT" : "", 
                        "FEATUREID" : "22", 
                        "FEATUREVALUEDETAILS" : {
                            "VALUE" : "0.8", 
                            "SIGN" : "", 
                            "UNIT" : "F", 
                            "MULTIPLIER" : "p", 
                            "MULTIPLIERVALUE" : "9.0E-12"
                        }
                    }, 
                    {
                        "FEATURENAME" : "Product weight", 
                        "FEATUREVALUE" : "", 
                        "FEATUREUNIT" : "mm", 
                        "FEATUREID" : "1372"
                    }, 
                    {
                        "FEATURENAME" : "Variable", 
                        "FEATUREVALUE" : "Fixed", 
                        "FEATUREUNIT" : "", 
                        "FEATUREID" : "138", 
                        "FEATUREVALUEDETAILS" : {
                            "VALUE" : "Fixed", 
                            "SIGN" : "", 
                            "UNIT" : "", 
                            "MULTIPLIER" : "", 
                            "MULTIPLIERVALUE" : "1.0"
                        }
                    }
                ]
            }
        }
    }
}
Shiv948
  • 469
  • 5
  • 13

1 Answers1

1

you can use json_normalize:

import json
your_json=json.loads(your_json) #convert string to dict

df = pd.json_normalize(your_json).explode('CRD.DETAILS.PARADATA.FEATURES').reset_index(drop=True)
df = df.join(pd.json_normalize(df.pop('CRD.DETAILS.PARADATA.FEATURES'))).drop_duplicates()
'''
|    | CRD.FG   | CRD.ZPN   | CRD.MATCH   |   CRD.COUNT |   CRD.SUMMARY.ID |   CRD.SUMMARY.PATHID | FEATURENAME    | FEATUREVALUE   | FEATUREUNIT   |   FEATUREID | FEATUREVALUEDETAILS.VALUE   | FEATUREVALUEDETAILS.SIGN   | FEATUREVALUEDETAILS.UNIT   | FEATUREVALUEDETAILS.MULTIPLIER   |   FEATUREVALUEDETAILS.MULTIPLIERVALUE |
|---:|:---------|:----------|:------------|------------:|-----------------:|---------------------:|:---------------|:---------------|:--------------|------------:|:----------------------------|:---------------------------|:---------------------------|:---------------------------------|--------------------------------------:|
|  0 | ZVX      | 04W05BA2A | exact       |           4 |            33772 |                10417 | Laptop Value   | 0.9 F          |               |          22 | 0.8                         |                            | F                          | p                                |                                 9e-12 |
|  1 | ZVX      | 04W05BA2A | exact       |           4 |            33772 |                10417 | Product weight |                | mm            |        1372 | nan                         | nan                        | nan                        | nan                              |                               nan     |
|  2 | ZVX      | 04W05BA2A | exact       |           4 |            33772 |                10417 | Variable       | Fixed          |               |         138 | Fixed                       |                            |                            |                                  |                                 1     |
'''
Bushmaster
  • 4,196
  • 3
  • 8
  • 28
  • I am getting error if i try to run it # check to see if a simple recursive function is possible to NotImplementedError: – Shiv948 Nov 22 '22 at 06:54
  • probably your json's type is string. I edited my answer. – Bushmaster Nov 22 '22 at 07:00
  • Still getting decodeError , am i missing anything here? raise JSONDecodeError("Expecting value", s, err.value) from None json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) – Shiv948 Nov 22 '22 at 07:07
  • 1
    i will check this one: https://stackoverflow.com/a/58647394/10362801 – Shiv948 Nov 22 '22 at 07:10
  • The above link solved my issue – Shiv948 Nov 22 '22 at 07:12
  • when i run on the complete Dump of 1M records - (that is multiple dictionary) it says json.decoder.JSONDecodeError: Extra data: line 328 column 1 (char 13876) – Shiv948 Nov 22 '22 at 07:22
  • are you getting this error under json_normalize or json.loads ? – Bushmaster Nov 22 '22 at 07:25
  • error at : json.loads – Shiv948 Nov 22 '22 at 07:28
  • to fix 'json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)' i used: with open(json_file_path, 'r') as j: your_json = json.loads(j.read()) – Shiv948 Nov 22 '22 at 07:30
  • i have no idea about this error. This question may help: https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data – Bushmaster Nov 22 '22 at 07:35
  • i am able to run it for a single dictionary {} which I mentioned in my question, but when run it for stack of dictionary it fails – Shiv948 Nov 22 '22 at 08:23