I'm trying to parse a json into a dataframe. And I one focus on the first key on the json (validations). The structure of the json is pretty standard, as the example below:
{
"validations": [
{
"id": "1111111-2222-3333-4444-555555555555",
"created_at": "2020-02-19T14:35:58-03:00",
"finished_at": "2020-02-19T14:36:01-03:00",
"processing_status": "concluded",
"receivable_id": "VAL-AAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE",
"external_reference": "FFFFFFFF-GGGG-HHHH-IIII-JJJJJJJJJJJJ",
"batch_id": "e2fb8d34-8c53-4910-b7a4-602ab6845855",
"portfolio": {
"id": "57a3e56a-347b-449c-8f1a-253baba90e7a",
"nome": "COMPANY_NAME"
}
}],
"pages": {
"per_page": 10,
"page": 1
}
}
I'm using the following code:
import json as json
import pandas as pd
import os
print(os.getcwd()) ## point out the directory you're working on this cell
filename = r"file_path\file_name.json"
f = open(filename)
data1 = json.loads(f.read())
df = pd.json_normalize(data1)
data1.keys()
## => returns: dict_keys(['validacoes', 'paginacao'])
res = dict((k, data1[k]) for k in ['validacoes']
if k in data1)
res.keys()
## => returns dict_keys(['validacoes'])
df = pd.DataFrame(res, columns=['id', 'data_criacao', 'data_finalizacao', 'status_do_processamento', 'recebivel_id','referencia_externa', 'lote_id', 'veiculo'])
df.head()
## returns=> a dataframe with no values on the columns, as if they were empty from the json
| id | created_at | finished_at | processing_status | receivable_id | external_reference | batch_id | external_reference | portfolio |
So, I already checked the original file on a text editor and, yes, the json is properly mapped with values.
And the format is standardized throughout the file.
Any thoughts as to why the data from the json is being lost on the process?