My input JSON file is:
{
  "Name": "Test",
  "Mobile": 12345678,
  "Boolean": true,
  "Pets": ["Dog", "cat"],
  "Address": {
    "Permanent address": "USA",
    "current Address": "AU"
  }
}
The requirement is to convert the multi-level JSON above to a DataFrame using PySpark.
I tried the following code:
path_to_input = "/FileStore/tables/sample_json_file2-6c20f.json"
df = spark.read.json(sc.wholeTextFiles(path_to_input).values())
df.show()
I got the following output:
+---------+-------+--------+----+----------+
|  Address|Boolean|  Mobile|Name|      Pets|
+---------+-------+--------+----+----------+
|[USA, AU]|   true|12345678|Test|[Dog, cat]|
+---------+-------+--------+----+----------+
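In the Address and Pets columns I'm getting both values in a single column. The nested values shouldn't be shown together like an array. Instead, I'd like a separate column for each nested field, for example Address_Permanent address with the value USA and Address_current Address with the value AU.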
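
To make the target shape concrete, here is a minimal plain-Python sketch (not PySpark) of the flattening I have in mind. The helper name flatten and the "_" separator are my own choices for illustration. Splitting Pets into Pets_0 and Pets_1 is only an assumption, since I'm not sure how array columns are best handled.

```python
import json

# The same document as in the input file above.
raw = """
{
  "Name": "Test",
  "Mobile": 12345678,
  "Boolean": true,
  "Pets": ["Dog", "cat"],
  "Address": {
    "Permanent address": "USA",
    "current Address": "AU"
  }
}
"""


def flatten(record, prefix=""):
    """Flatten nested dicts (and lists) into one level, joining keys with '_'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse, prefixing child keys with the parent name.
            flat.update(flatten(value, prefix=f"{name}_"))
        elif isinstance(value, list):
            # Assumption: one column per array element, named by its index.
            for i, item in enumerate(value):
                flat.update(flatten({str(i): item}, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


flat_record = flatten(json.loads(raw))
for column, value in flat_record.items():
    print(f"{column}: {value}")
```

The keys of flat_record are the column names I would like to see in the resulting DataFrame, with one value per column.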