I have some JSON data with an array that can have zero or more elements; the data is shown below. When I explode the array, any row whose array has zero elements is dropped. In this case, the row with name: Andy is dropped.
>>> from pyspark.sql import functions as func
>>> d1 = [{"name": "Michael", "schools": [{"sname": "stanford", "year": 2010}, {"sname": "berkeley", "year": 2012}]}, {"name": "Andy", "schools": []}]
>>> df1 = sqlContext.createDataFrame(d1)
>>> df2 = df1.withColumn('school_details', func.explode(df1.schools))
>>> df3 = df2.select(df2.name, df2.school_details.sname, df2.school_details.year)
>>> df3.show()
+-------+---------------------+--------------------+
|   name|school_details[sname]|school_details[year]|
+-------+---------------------+--------------------+
|Michael|             stanford|                2010|
|Michael|             berkeley|                2012|
+-------+---------------------+--------------------+
How can I get all the records, including the row for Andy, as shown below?
Expected results:
+-------+---------------------+--------------------+
|   name|school_details[sname]|school_details[year]|
+-------+---------------------+--------------------+
|Michael|             stanford|                2010|
|Michael|             berkeley|                2012|
|   Andy|                 null|                null|
+-------+---------------------+--------------------+
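
To pin down the behavior I'm after, here is a plain-Python sketch (no Spark involved) of the difference between what I'm seeing and what I want. The helper names `explode_rows` and `explode_rows_outer` are hypothetical and made up for this illustration; they are not Spark functions, and this only mimics the row semantics, not how Spark implements them.

```python
# Plain-Python illustration only; explode_rows and explode_rows_outer are
# hypothetical helpers, not part of any Spark API.

def explode_rows(rows, array_key):
    """Mimic what I observe: a row with an empty array yields no output."""
    out = []
    for row in rows:
        for item in row[array_key]:
            out.append((row["name"], item["sname"], item["year"]))
    return out


def explode_rows_outer(rows, array_key):
    """Mimic what I want: an empty array still yields one row of nulls."""
    out = []
    for row in rows:
        items = row[array_key]
        if not items:
            # Keep the parent row, with None standing in for null.
            out.append((row["name"], None, None))
            continue
        for item in items:
            out.append((row["name"], item["sname"], item["year"]))
    return out


d1 = [
    {"name": "Michael", "schools": [{"sname": "stanford", "year": 2010},
                                    {"sname": "berkeley", "year": 2012}]},
    {"name": "Andy", "schools": []},
]

dropped = explode_rows(d1, "schools")       # Andy is missing here
wanted = explode_rows_outer(d1, "schools")  # Andy is kept with None values
```

In other words, I'm looking for the DataFrame equivalent of the second helper: one output row per array element, plus a single null-filled row for any record whose array is empty.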