eqNullSafe function raises error in spark 2.4.1

Question

I need to perform a left join in Spark 2.4.1 that keeps the Null values.

While researching, I found this solution: "Including null values in an Apache Spark Join", which seems to be exactly what I need. Every time I call eqNullSafe, however, I get the error "'Column' object is not callable".

I have tried the example provided under the link:

numbers_df = sc.parallelize([
    ("123", ), ("456", ), (None, ), ("", )
]).toDF(["numbers"])

letters_df = sc.parallelize([
    ("123", "abc"), ("456", "def"), (None, "zzz"), ("", "hhh")
]).toDF(["numbers", "letters"])

numbers_df.join(letters_df, numbers_df.numbers.eqNullSafe(letters_df.numbers))

Any idea why this code would raise this error? I am using a SageMaker notebook on an AWS Glue developer endpoint. Might it be due to a missing import?

These are the imports I do aside from those specific to glue:

from pyspark.sql import *
from pyspark.sql import functions as F

When executing your code, I got the result you are expecting. No error on my side. Generally, this error means you added `()` parentheses to an object that is not a function. Maybe check that. — Steven, Apr 30 '19 at 08:09
Thanks for the reply. Which imports did you use? Which spark version? — Daniel, Apr 30 '19 at 09:22
No imports. You don't need any, and you don't use any in your code. Spark version 2.3. — Steven, Apr 30 '19 at 09:33
Doesn't work. I don't get it. Has there been a change between 2.3 and 2.4.1? The function is referenced in the 2.4.1 documentation, though... — Daniel, Apr 30 '19 at 10:50
Do you get the error with both your small example and your real dataset? — Steven, Apr 30 '19 at 12:00
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192634/discussion-between-steven-and-daniel). — Steven, Apr 30 '19 at 12:19
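
The first comment's point, that the error means `()` was applied to something that is not a function, can be illustrated in plain Python. This is only a sketch of one possible mechanism, not a confirmed diagnosis. `FakeColumn` is a hypothetical stand-in, not the real `pyspark.sql.Column`. It assumes that `Column` resolves unknown attribute names as nested-field lookups instead of raising `AttributeError`, and that `eqNullSafe` only exists in the Python API from Spark 2.3.0 onward. Under those assumptions, an environment running an older PySpark than expected would produce exactly this message.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Only invoked when normal attribute lookup fails. Instead of raising
        # AttributeError, the unknown name is treated as a nested-field access
        # (assumed here to mirror how Column behaves).
        if item.startswith("__"):
            raise AttributeError(item)
        return FakeColumn(self.name + "." + item)


col = FakeColumn("numbers")

# No eqNullSafe method is defined on FakeColumn, yet the attribute lookup
# "succeeds" and hands back another FakeColumn instead of a method.
missing = col.eqNullSafe

try:
    missing(col)  # the parentheses try to call an object that is not a function
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Looking the name up on the class (not an instance) bypasses __getattr__,
# so it shows whether the method is really defined.
method_exists = hasattr(FakeColumn, "eqNullSafe")
print(method_exists)
```

If this is what is happening on the endpoint, checking `hasattr(Column, "eqNullSafe")` after `from pyspark.sql import Column`, and printing the session's `version` attribute, should show whether the notebook is actually running the Spark version it is assumed to run.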

0 Answers