I have a DataFrame and am building a nested JSON object from it to represent hierarchical data. I am stuck at the point where the nested JSON sub-column is added: it comes out as a string, not as JSON.
**Code:**
from pyspark.sql.functions import struct, to_json
# sample data
df = spark.createDataFrame([('1234567','123 Main St','10SjtT','idk@gmail.com','ecom','direct')], ['cust_id','address','store_id','email','sales_channel','category'])
I want to represent this DataFrame in the following format:
{
"store_id": "10SjtT",
"category": "direct",
"sales_channel": "ecom",
"email": "idk@gmail.com",
"c_email": {
"category": "direct",
"email": "idk@gmail.com"
}
}
I tried adding the nested column, but my sample code adds the nested JSON as a string with escaped quotation marks:
{
"store_id":"10SjtT","category":"direct","sales_channel":"ecom"
,"c_email":"{\"category\":\"direct\",\"email\":\"idk@gmail.com\"}"
}
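The escaped quotes look like what happens when the inner object is serialized to a string first and then serialized again as part of the outer object. Below is a minimal plain-Python sketch of that effect, using the standard `json` module rather than Spark; the `row` dict is a hypothetical stand-in for one row of the DataFrame, not part of my actual code.

```python
import json

# Hypothetical stand-in for one row of the DataFrame.
row = {
    "store_id": "10SjtT",
    "category": "direct",
    "sales_channel": "ecom",
    "email": "idk@gmail.com",
}
inner = {"category": row["category"], "email": row["email"]}

# Serializing the inner object first turns it into a plain string,
# so the outer serialization has to escape its quotes.
double_encoded = json.dumps({**row, "c_email": json.dumps(inner)})

# Nesting the object itself and serializing only once keeps it as JSON.
nested = json.dumps({**row, "c_email": inner})

print(double_encoded)
print(nested)
```

If the same thing applies in Spark, the inner `to_json` call would be what turns `c_email` into a string column before the outer `to_json` runs.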
**Code used to build this:**
dff = df.select("cust_id","address",to_json(struct("store_id","category","sales_channel","email",to_json(struct( "category" ,"email")).alias("c_email"))).alias("metadata"))
dff.select("metadata").show(10,False)
Please let me know if anyone has faced a similar issue and was able to build nested JSON that keeps its JSON structure all the way through.
Thanks in advance.
Manoj.