I have a PySpark DataFrame with a column "Student".
One entry of the data looks like this:
{
  "Student": {
    "m": {
      "name": {"s": "john"},
      "score": {"s": "165"}
    }
  }
}
I want to change the schema of this column so that the entry looks like this:
{
  "Student": {
    "m": {
      "StudentDetails": {
        "m": {
          "name": {"s": "john"},
          "score": {"s": "165"}
        }
      }
    }
  }
}
The problem is that the Student field can also be null in the DataFrame, so I want to retain the null values while changing the schema of the non-null ones. I have used a UDF for this, which works:
def Helper_ChangeSchema(row):
    # keep nulls as nulls
    if row is None:
        return None
    # wrap the existing struct one level deeper
    data = row.asDict(True)
    return {"m": {"StudentDetails": data}}
However, a UDF is a black box for Spark's optimizer. Is there a way to do the same thing using built-in Spark functions or SQL queries?