Building nested JSON from a DataFrame using PySpark

Question

I have a DataFrame and want to build a nested JSON object from it to represent hierarchical data. I am stuck: the nested JSON sub-column is added, but it comes out as a string rather than as JSON.

**Code:**

from pyspark.sql.functions import *

#sample data
df=spark.createDataFrame([('1234567','123 Main St','10SjtT','idk@gmail.com','ecom','direct')],['cust_id','address','store_id','email','sales_channel','category'])

I want to represent this DataFrame in the following format:

{
    "store_id": "10SjtT",
    "category": "direct",
    "sales_channel": "ecom",
    "email": "idk@gmail.com",
    "c_email": {
        "category": "direct",
        "email": "idk@gmail.com"
    }
}

I tried adding the column, but my sample code adds the nested JSON as a quoted string:

{
"store_id":"10SjtT","category":"direct","sales_channel":"ecom" 
,"c_email":"{\"category\":\"direct\",\"email\":\"idk@gmail.com\"}"

}

**Code used to build this:**

dff = df.select("cust_id","address",to_json(struct("store_id","category","sales_channel","email",to_json(struct( "category" ,"email")).alias("c_email"))).alias("metadata")) 

dff.select("metadata").show(10,False)

Please let me know if anyone has faced a similar issue and was able to build nested JSON while keeping the nested part as JSON.

Thanks in advance.

Manoj.

score 0 · Answer 1 · answered Sep 02 '23 at 03:46

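Remove the inner to_json() call. to_json() returns a string column, so the outer to_json() serializes that value as an escaped string. Pass the nested struct() directly and the outer to_json() will serialize it as a nested object.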

from pyspark.sql.functions import struct, to_json

#sample data
df=spark.createDataFrame([('1234567','123 Main St','10SjtT','idk@gmail.com','ecom','direct')],['cust_id','address','store_id','email','sales_channel','category'])

dff = df.select("cust_id","address",to_json(struct("store_id","category","sales_channel","email",struct( "category" ,"email").alias("c_email"))).alias("metadata"))

dff.select("metadata").show(10,False)

Output:

+------------------------------------------------------------------------------------------------------------------------------------------------+
|metadata                                                                                                                                        |
+------------------------------------------------------------------------------------------------------------------------------------------------+
|{"store_id":"10SjtT","category":"direct","sales_channel":"ecom","email":"idk@gmail.com","c_email":{"category":"direct","email":"idk@gmail.com"}}|
+------------------------------------------------------------------------------------------------------------------------------------------------+
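The same double-encoding effect can be reproduced without Spark, which may make the cause easier to see. The sketch below is only an analogy using Python's standard json module, not Spark code: json.dumps stands in for to_json, a dict stands in for struct, and the variable names (row, double_encoded, nested) are made up for illustration.

```python
import json

# Plain dict standing in for one row of the DataFrame (illustrative only).
row = {
    "store_id": "10SjtT",
    "category": "direct",
    "sales_channel": "ecom",
    "email": "idk@gmail.com",
}

# Inner value serialized first: it is already a str, so the outer
# dumps treats it as text and escapes its quotes.
double_encoded = json.dumps(
    {**row, "c_email": json.dumps({"category": row["category"], "email": row["email"]})}
)

# Inner value left as a dict: the outer dumps emits a nested object.
nested = json.dumps(
    {**row, "c_email": {"category": row["category"], "email": row["email"]}}
)

print(double_encoded)
print(nested)
```

Parsing double_encoded back gives a string under "c_email", while parsing nested gives a dict, which mirrors the difference between the question's code and the code in this answer.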
