I need to perform a left join in Spark 2.4.1 that keeps the Null values.

While researching, I found this solution: "Including null values in an Apache Spark Join", which seems to be what I need. However, every time I call eqNullSafe I get the error "'Column' object is not callable".

I have tried the example provided under the link:

numbers_df = sc.parallelize([
    ("123", ), ("456", ), (None, ), ("", )
]).toDF(["numbers"])

letters_df = sc.parallelize([
    ("123", "abc"), ("456", "def"), (None, "zzz"), ("", "hhh")
]).toDF(["numbers", "letters"])

numbers_df.join(letters_df, numbers_df.numbers.eqNullSafe(letters_df.numbers))

Any idea why this code would raise this error? I am using a SageMaker notebook on an AWS Glue developer endpoint. Might it be due to a missing import?

These are the imports I use, aside from those specific to Glue:

from pyspark.sql import *
from pyspark.sql import functions as F
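A version mismatch may be a likelier cause than a missing import. The PySpark documentation marks `Column.eqNullSafe` as new in version 2.3.0, so it is worth comparing the version the notebook actually runs against that threshold. The helper below is only a sketch: `column_has_eq_null_safe` is a hypothetical name, and the version strings are sample inputs, not values read from the endpoint.

```python
def column_has_eq_null_safe(spark_version):
    """Return True if the given Spark version string is 2.3 or newer.

    Assumption: Column.eqNullSafe exists in PySpark from 2.3.0 onward,
    as the PySpark API documentation states.
    """
    # Only major and minor matter, so suffixes such as "-SNAPSHOT" are ignored.
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (2, 3)


# Sample inputs; in the notebook the real value would come from spark.version.
for version in ("2.2.1", "2.3.0", "2.4.1"):
    print(version, column_has_eq_null_safe(version))
```

In the notebook itself, the string to check is `spark.version` (or `pyspark.__version__`). If it reports a release older than 2.3, the documentation being read does not match the runtime.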
Daniel
  • When I execute your code, I get the result you are expecting, with no error on my side. Generally, this error means you added `()` parentheses to an object that is not a function. Maybe check that. – Steven Apr 30 '19 at 08:09
  • Thanks for the reply. Which imports did you use? Which spark version? – Daniel Apr 30 '19 at 09:22
  • No imports. You don't need any and don't use any in your code. Spark version 2.3. – Steven Apr 30 '19 at 09:33
  • Doesn't work. I don't get it. Has there been a change between 2.3 and 2.4.1? The function is referenced in the 2.4.1 documentation, though... – Daniel Apr 30 '19 at 10:50
  • Do you get the error with both your small example and your real dataset? – Steven Apr 30 '19 at 12:00
  • Yes, the error occurs regardless of the data I use – Daniel Apr 30 '19 at 12:06
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192634/discussion-between-steven-and-daniel). – Steven Apr 30 '19 at 12:19
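
One possible explanation for the exact wording of the error, assuming the endpoint runs a PySpark release older than 2.3: `Column` resolves unknown attribute names as struct-field accesses, so `numbers_df.numbers.eqNullSafe` would return another `Column` instead of raising `AttributeError`, and the call parentheses then fail. The sketch below reproduces that mechanism in plain Python. `ToyColumn` is a hypothetical stand-in written for illustration, not the real `pyspark.sql.Column`.

```python
class ToyColumn:
    """Toy stand-in for a column expression; NOT pyspark.sql.Column."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, item):
        # Invoked only when normal attribute lookup fails. Unknown names are
        # treated like field access and yield another column object.
        if item.startswith("__"):
            raise AttributeError(item)
        return ToyColumn(self.name + "." + item)


col = ToyColumn("numbers")

# ToyColumn defines no eqNullSafe method, so this is just another ToyColumn.
attr = col.eqNullSafe

try:
    attr(ToyColumn("other"))  # calling a non-callable object
    message = None
except TypeError as exc:
    message = str(exc)

print(message)  # 'ToyColumn' object is not callable
```

If this is what happens on the endpoint, no import would fix it. On such a runtime, the same null-safe condition can be spelled out with operators that older releases already provide, for example by combining `==` with `isNull()` on both sides using `|` and `&`.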
