I am trying to conditionally apply a UDF, some_function(), to column b1 based on the value in a1, and leave b1 unchanged otherwise. I am using pyspark.sql.functions.when(condition, value) and a simple UDF:
from pyspark.sql.functions import udf, when

some_function = udf(lambda x: x.translate(...))
df = df.withColumn('c1', when(df.a1 == 1, some_function(df.b1)).otherwise(df.b1))
With this example data:
| a1|    b1|
-----------
|  1|'text'|
|  2|  null|
I am seeing that some_function() is evaluated for every row regardless of the condition (i.e. the UDF calls translate() on null and crashes), and its result is only applied when the condition is true. To be clear, this is not about the UDF handling null correctly; it is about when(...) always executing value when value is a UDF.
Is this behaviour intended? If so, how can I apply a function conditionally so that it isn't executed when the condition is not met?
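For context, the only workaround I can think of is making the function itself null-safe before wrapping it in udf, so it tolerates being evaluated on every row. A sketch in plain Python (the translation table here is a made-up placeholder, since my real one is elided above):

```python
# Hypothetical translation table, for illustration only.
table = str.maketrans("abc", "xyz")

# Guard against None so the function is safe even if Spark
# evaluates it on rows where the when() condition is false.
def some_function(x):
    return x.translate(table) if x is not None else None

print(some_function("abc"))   # translated normally
print(some_function(None))    # no crash, passes null through
```

Wrapping this with udf(some_function) and keeping the same when(...).otherwise(...) expression no longer crashes, but it feels like a band-aid rather than actually skipping the call.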