I have the 4 level nested JSON file below, that I would like to normalize to a one level nesting:
Input file is like this:
{
"@index": "40",
"row": [
{
"column": [
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "663.12",
"@width": "250.01",
"@height": "12.00",
"#text": "text 1"
}
}
]
},
{
"column": [
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "8.0",
"@x": "121.10",
"@y": "675.36",
"@width": "348.98",
"@height": "8.04",
"#text": "text 2"
}
},
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "473.30",
"@y": "676.92",
"@width": "42.47",
"@height": "12.00",
"#text": "text 3"
}
}
]
},
{
"column": [
{
"text": {
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "690.72",
"@width": "433.61",
"@height": "12.00",
"#text": "text 4"
}
}
]
}
]
}
Desired output is like this:
{
"@index": "40",
"row": [
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "663.12",
"@width": "250.01",
"@height": "12.00",
"#text": "Text 1"
},
{
"@fontName": "Times New Roman",
"@fontSize": "8.0",
"@x": "121.10",
"@y": "675.36",
"@width": "348.98",
"@height": "8.04",
"#text": "Text 2"
},
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "473.30",
"@y": "676.92",
"@width": "42.47",
"@height": "12.00",
"#text": "Text 3"
},
{
"@fontName": "Times New Roman",
"@fontSize": "12.0",
"@x": "85.10",
"@y": "690.72",
"@width": "433.61",
"@height": "12.00",
"#text": "Text 4"
}
]
}
The code I have so far is this using pandas is below, but I don´t know how to continue to normalize to one level.
import json
import pandas as pd
from pandas.io.json import json_normalize #package for flattening json in pandas df
#load json object
with open('D:\Files\JSON\4Level.json') as f:
d = json.load(f)
nycphil = json_normalize(d['row'])
print (nycphil.head(4))
This is the current output tabulated, where shows that column
is a nested element:
column
0 [{'text': {'@fontName': 'Times New Roman', '@f...
1 [{'text': {'@fontName': 'Times New Roman', '@f...
2 [{'text': {'@fontName': 'Times New Roman', '@f...
The print with one level nesting would be:
text.#text text.@fontName text.@fontSize ... text.@width text.@x text.@y
0 Text 1 Times New Roman 12.0 ... 250.01 85.10 663.12
1 Text 2 Times New Roman 8.0 ... 348.98 121.10 675.36
2 Text 3 Times New Roman 12.0 ... 42.47 473.30 676.92
3 Text 4 Times New Roman 12.0 ... 433.61 85.10 690.72
The Input/Output comparison is like this:
Maybe someone could help me with this. Thanks for any help.
UPDATE
In order to make a small sample in first sample input I showed, I removed some elements that seems to be are needed in your scripts to work. So now I show exactly the same structure as real file but with this input your scripts don't work. I think they need a little tweak but I've been trying and I don't know how to change them to get the same output with this new input. Maybe you can help me and sorry for not show the correct input from beginning.
{
"document":{
"page":[
{
"@index":"0",
"image":{
"@data":"ABC",
"@format":"png",
"@height":"620.00",
"@type":"base64encoded",
"@width":"450.00",
"@x":"85.00",
"@y":"85.00"
}
},
{
"@index":"1",
"row":[
{
"column":[
{
"text":""
},
{
"text":{
"#text":"Text1",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"71.04",
"@x":"121.10",
"@y":"83.42"
}
}
]
},
{
"column":[
{
"text":""
},
{
"text":{
"#text":"Text2",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"101.07",
"@x":"121.10",
"@y":"124.82"
}
}
]
}
]
},
{
"@index":"2",
"row":[
{
"column":{
"text":{
"#text":"Text3",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"363.44",
"@x":"85.10",
"@y":"69.62"
}
}
},
{
"column":{
"text":{
"#text":"Text4",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"382.36",
"@x":"85.10",
"@y":"83.42"
}
}
},
{
"column":{
"text":{
"#text":"Text5",
"@fontName":"Arial",
"@fontSize":"12.0",
"@height":"12.00",
"@width":"435.05",
"@x":"85.10",
"@y":"97.22"
}
}
}
]
},
{
"@index":"3"
}
]
}
}