I have data and an issue similar to the question asked here: Spark sql how to explode without losing null values
I have used the solution proposed for Spark <=2.1, and indeed the null values appear as literals in my data after the split:
df.withColumn("likes", explode(
    when(col("likes").isNotNull(), col("likes"))
    # If null, explode an array<string> containing a single null
    .otherwise(array(lit(None).cast("string")))))
The issue is that afterwards I need to check whether that column contains null values and take an action if it does. When I run my code, the nulls that were inserted as literals are recognized as strings instead of null values.
So the code below always returns 0 in the new column, even when the row has a null in the likes column:
df.withColumn("origin", f.when(col('likes').isNotNull(), 0).otherwise(2)).show()
+--------+------+
|   likes|origin|
+--------+------+
|    CARS|     0|
|    CARS|     0|
|    null|     0|
|    null|     0|
+--------+------+
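To illustrate what I suspect is happening, here is a minimal sketch in plain Python (no Spark needed): a value that *prints* as "null" may actually be the four-character string "null" rather than a real null, so an is-null style check never fires. The variable names here are just for illustration:

```python
real_null = None        # a true null value
string_null = "null"    # what the column may actually contain

# An isNotNull-style check only treats a real None as null:
print(real_null is not None)    # False -> would take the .otherwise() branch
print(string_null is not None)  # True  -> treated as non-null, so the check returns 0
```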
I am using PySpark on Cloudera.