
I have data and an issue similar to the question asked here: Spark sql how to explode without losing null values

I have used the solution proposed for Spark <= 2.1, and the null values do indeed appear as literals in my data after the split:

df.withColumn("likes", explode(
  when(col("likes").isNotNull, col("likes"))
    // If null explode an array<string> with a single null
    .otherwise(array(lit(null).cast("string")))))

The issue is that afterwards I need to check whether there are null values in that column and take an action in that case. When I run my code, the nulls inserted as literals are recognized as strings instead of null values.

So the code below always returns 0, even when the row has a null in that column:

df.withColumn("origin", f.when(f.col("likes").isNotNull(), 0).otherwise(2)).show()

+--------+------+
|likes   |origin|
+--------+------+
|    CARS|     0|
|    CARS|     0|
|    null|     0|
|    null|     0|
+--------+------+
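The distinction can be sketched in plain Python, without Spark (the values below are illustrative, not my actual data): a real None behaves like SQL NULL, but the literal string "null" does not, so a not-null check still maps it to 0.

```python
# Plain-Python sketch (no Spark; values are illustrative): only a real
# None counts as "missing"; the string "null" passes a not-null check.
rows = ["CARS", "CARS", "null", None]

def flag(value):
    # mirrors f.when(f.col("likes").isNotNull(), 0).otherwise(2)
    return 0 if value is not None else 2

print([flag(v) for v in rows])  # [0, 0, 0, 2]
```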

I am using PySpark on Cloudera.

DroppingOff

2 Answers


You could hack around this by using a UDF:

val empty = udf(() => null: String)

df.withColumn("likes", explode(
  when(col("likes").isNotNull, col("likes"))
    // If null explode an array<string> with a single null
    .otherwise(array(empty()))))
  • Hi, thanks. I just copy-paste your function and I get an error: File "", line 1:undefined val empty = udf(() => null: String) ^ SyntaxError: invalid syntax. Perhaps it does not work with all the versions? I am using pyspark – DroppingOff Oct 16 '18 at 12:25
  • Hi, I don't know why udf does not work for me but I found another way and answer here. I will mark your answer as good anyway so that it can help others. Thanks – DroppingOff Oct 16 '18 at 13:20

I actually found a way. In PySpark there is no null keyword, so in the otherwise you have to write lit(None) instead:

.otherwise(array(lit(None).cast("string")))
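As a sanity check, the explode-without-losing-nulls logic can be sketched in plain Python (the function name and data are illustrative, not Spark API):

```python
# Plain-Python sketch of explode-without-losing-nulls (illustrative,
# not Spark API): a None row is substituted with a one-element [None]
# list, so it survives the explode as a real null instead of vanishing.
def explode_keep_nulls(rows):
    out = []
    for likes in rows:
        # mirrors .otherwise(array(lit(None).cast("string")))
        for item in (likes if likes is not None else [None]):
            out.append(item)
    return out

print(explode_keep_nulls([["CARS", "CARS"], None]))  # ['CARS', 'CARS', None]
```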
