What is the equivalent of this call in Spark 2.2.1:

df.column_name.eqNullSafe(df2.column_2)

It works in 2.3.0, but in 2.2.1 df.column_name.eqNullSafe is not callable and I get: TypeError: 'Column' object is not callable.
Here's an example for reproduction. I have a sample dataframe:
# +----+----+
# | id| var|
# +----+----+
# | 1| a|
# | 2|null|
# |null| b|
# +----+----+
I need to deconstruct it, do a null-safe equality comparison on a column, and put it back together. This is the code that does that (it can be pasted and run as is; it works in 2.3.0 and reproduces the error in 2.2.1):
df = spark.createDataFrame(
    [
        ('1', 'a'),
        ('2', None),
        (None, 'b')
    ],
    ('id', 'var')
)

def get_condition(right, left):
    return right.id.eqNullSafe(left.id_2)
right_df = df.select(df.columns[:1])
left_df = df.filter(df.var.isNotNull()).withColumnRenamed('id', 'id_2')
result = right_df.join(left_df, get_condition(right_df, left_df), how='left')
result.select('id', 'var').show()
I'd like to modify the call inside get_condition so that the null-safe comparison works in 2.2.1, where calling eqNullSafe on a Column fails. (Note: I can't use pandas.)
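For clarity, here is the comparison semantics I'm after, sketched in plain Python with None standing in for SQL NULL (this is only an illustration of the desired behavior, not Spark code):

```python
def null_safe_eq(a, b):
    """Null-safe equality: unlike regular SQL equality, which yields
    NULL when either side is NULL, this treats two NULLs as equal."""
    if a is None or b is None:
        # True only when both sides are NULL
        return a is None and b is None
    return a == b

# Matches the join behavior I expect: the row with id=None should
# still join against the id_2=None row.
print(null_safe_eq(None, None))  # True
print(null_safe_eq('1', None))   # False
print(null_safe_eq('1', '1'))    # True
```

In other words, I need a join condition that evaluates to True for the (None, None) pair instead of NULL.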