I have some JSON data with an array that can have zero or more elements; the data is shown below. When I explode the array, the row whose array has zero elements is dropped. In this case, the row for name: Andy is dropped.

>>> from pyspark.sql import functions as func
>>> d1 = [{"name":"Michael", "schools":[{"sname":"stanford", "year":2010}, {"sname":"berkeley", "year":2012}]},{"name":"Andy","schools":[]}]
>>> df1 = sqlContext.createDataFrame(d1)
>>> df2 = df1.withColumn('school_details', func.explode(df1.schools))
>>> df3 = df2.select(df2.name, df2.school_details.sname, df2.school_details.year)
>>> df3.show()
+-------+---------------------+--------------------+
|   name|school_details[sname]|school_details[year]|
+-------+---------------------+--------------------+
|Michael|             stanford|                2010|
|Michael|             berkeley|                2012|
+-------+---------------------+--------------------+

How can I get all the records, including the ones with an empty array, as shown below?

Expected Results

+-------+---------------------+--------------------+
|   name|school_details[sname]|school_details[year]|
+-------+---------------------+--------------------+
|Michael|             stanford|                2010|
|Michael|             berkeley|                2012|
|   Andy|                 null|                null|
+-------+---------------------+--------------------+
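To make the expected behavior concrete, here is a minimal plain-Python sketch of the "outer explode" semantics I am after. It does not use Spark at all; `explode_outer_rows` is a hypothetical helper written only for illustration, and the flat `sname`/`year` column names are my own assumption. Newer Spark releases do provide `explode_outer` in `pyspark.sql.functions`, which may give this result directly, but I have not confirmed it is available with the `sqlContext`-era version used above.

```python
def explode_outer_rows(records, array_key, fields):
    """Flatten each record's array into one row per element.

    Unlike a plain explode, a record whose array is empty (or missing)
    still yields one row, with None for every exploded field.
    """
    rows = []
    for rec in records:
        # An empty or missing array becomes a single None placeholder.
        items = rec.get(array_key) or [None]
        for item in items:
            # Copy every column except the array being exploded.
            row = {k: v for k, v in rec.items() if k != array_key}
            for field in fields:
                row[field] = item.get(field) if item is not None else None
            rows.append(row)
    return rows


d1 = [
    {"name": "Michael", "schools": [{"sname": "stanford", "year": 2010},
                                    {"sname": "berkeley", "year": 2012}]},
    {"name": "Andy", "schools": []},
]

rows = explode_outer_rows(d1, "schools", ["sname", "year"])
for row in rows:
    print(row)
```

The key point is the `or [None]` fallback: the empty array is replaced by a single placeholder element, so Andy survives the flattening with null fields instead of disappearing.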
Lijju Mathew