0

I'm trying to parse a json into a dataframe. And I one focus on the first key on the json (validations). The structure of the json is pretty standard, as the example below:

{
"validations": [
    {
        "id": "1111111-2222-3333-4444-555555555555",
        "created_at": "2020-02-19T14:35:58-03:00",
        "finished_at": "2020-02-19T14:36:01-03:00",
        "processing_status": "concluded",
        "receivable_id": "VAL-AAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
        "external_reference": "FFFFFFFF-GGGG-HHHH-IIII-JJJJJJJJJJJJ",
        "batch_id": "e2fb8d34-8c53-4910-b7a4-602ab6845855",
        "portfolio": {
            "id": "57a3e56a-347b-449c-8f1a-253baba90e7a",
            "nome": "COMPANY_NAME"
      }
    }],

"pages": {
    "per_page": 10,
    "page": 1
  }
}

I'm using the following code:

import json   as json
import pandas as pd
import os



print(os.getcwd()) ## point out the directory you're working on this cell


filename = r"file_path\file_name.json"

f = open(filename)
data1 = json.loads(f.read())

df = pd.json_normalize(data1)

data1.keys() 

## => returns: dict_keys(['validacoes', 'paginacao'])


res = dict((k, data1[k]) for k in ['validacoes'] 
                                    if k in data1) 

res.keys() 
## => returns dict_keys(['validacoes'])


df = pd.DataFrame(res, columns=['id', 'data_criacao', 'data_finalizacao', 'status_do_processamento', 'recebivel_id','referencia_externa', 'lote_id', 'veiculo'])  


df.head() 

## returns=> a dataframe with no values on the columns, as if they were empty from the json
| id | created_at | finished_at | processing_status | receivable_id | external_reference | batch_id | external_reference | portfolio |

So, I already checked the original file on a text editor and, yes, the json is properly mapped with values.

And the format is standardized throughout the file.

Any thoughts as to why the data from the json is being lost on the process?

Jimmar
  • 4,194
  • 2
  • 28
  • 43
FCdSP
  • 1
  • 2
    Does this answer your question? [JSON to pandas DataFrame](https://stackoverflow.com/questions/21104592/json-to-pandas-dataframe) – Jimmar Feb 26 '21 at 01:09
  • Do this after read the json : ```df = pd.DataFrame(data1["validations"]) ``` your data frame is ready! – Inputvector Feb 26 '21 at 01:19

1 Answers1

0

You need to flatten your JSON.

This post should help you out: Python flatten multilevel/nested JSON

Also, pandas has a simple json_normalize method you could use.

Cyrus
  • 613
  • 1
  • 6
  • 22