Goal: For a dataframe with the following schema:
id:string
Cold:string
Medium:string
Hot:string
IsNull:string
annual_sales_c:string
average_check_c:string
credit_rating_c:string
cuisine_c:string
dayparts_c:string
location_name_c:string
market_category_c:string
market_segment_list_c:string
menu_items_c:string
msa_name_c:string
name:string
number_of_employees_c:string
number_of_rooms_c:string
Months In Role:integer
Tenured Status:string
IsCustomer:integer
units_c:string
years_in_business_c:string
medium_interactions_c:string
hot_interactions_c:string
cold_interactions_c:string
is_null_interactions_c:string
I want to add a new column that holds, for each row, a JSON string of all column names (keys) and their values. I have used the approach from the post "PySpark - Convert to JSON row by row" and related questions. My code:
import pyspark.sql.functions as func
df = df.withColumn("JSON", func.to_json(func.struct([df[x] for x in small_df.columns])))
I am having one issue: when a row has a null value for a column (and my data has many), the JSON string doesn't contain that key. For example, if only 9 of the 27 columns have values in a row, the JSON string for that row has only 9 keys. What I would like is to keep all keys and, for the null values, just pass an empty string "".
Any tips?