How can I parse a json to a pandas dataframe without losing values?

Question

I'm trying to parse a json into a dataframe. And I one focus on the first key on the json (validations). The structure of the json is pretty standard, as the example below:

{
"validations": [
    {
        "id": "1111111-2222-3333-4444-555555555555",
        "created_at": "2020-02-19T14:35:58-03:00",
        "finished_at": "2020-02-19T14:36:01-03:00",
        "processing_status": "concluded",
        "receivable_id": "VAL-AAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
        "external_reference": "FFFFFFFF-GGGG-HHHH-IIII-JJJJJJJJJJJJ",
        "batch_id": "e2fb8d34-8c53-4910-b7a4-602ab6845855",
        "portfolio": {
            "id": "57a3e56a-347b-449c-8f1a-253baba90e7a",
            "nome": "COMPANY_NAME"
      }
    }],

"pages": {
    "per_page": 10,
    "page": 1
  }
}

I'm using the following code:

import json   as json
import pandas as pd
import os



print(os.getcwd()) ## point out the directory you're working on this cell


filename = r"file_path\file_name.json"

f = open(filename)
data1 = json.loads(f.read())

df = pd.json_normalize(data1)

data1.keys() 

## => returns: dict_keys(['validacoes', 'paginacao'])


res = dict((k, data1[k]) for k in ['validacoes'] 
                                    if k in data1) 

res.keys() 
## => returns dict_keys(['validacoes'])


df = pd.DataFrame(res, columns=['id', 'data_criacao', 'data_finalizacao', 'status_do_processamento', 'recebivel_id','referencia_externa', 'lote_id', 'veiculo'])  


df.head() 

## returns=> a dataframe with no values on the columns, as if they were empty from the json

| id | created_at | finished_at | processing_status | receivable_id | external_reference | batch_id | external_reference | portfolio |

So, I already checked the original file on a text editor and, yes, the json is properly mapped with values.

And the format is standardized throughout the file.

Any thoughts as to why the data from the json is being lost on the process?

Does this answer your question? [JSON to pandas DataFrame](https://stackoverflow.com/questions/21104592/json-to-pandas-dataframe) — Jimmar, Feb 26 '21 at 01:09
Do this after read the json : ```df = pd.DataFrame(data1["validations"]) ``` your data frame is ready! — Inputvector, Feb 26 '21 at 01:19

score 0 · Answer 1 · answered Feb 26 '21 at 01:09

0

You need to flatten your JSON.

This post should help you out: Python flatten multilevel/nested JSON

Also, pandas has a simple json_normalize method you could use.

answered Feb 26 '21 at 01:09

Cyrus

613
1
6
22

How can I parse a json to a pandas dataframe without losing values?

1 Answers1