I have the function get_alerts
that returns two String fields. For simplicity let's consider the fixed output of this function: return "xxx", "yyy"
(this is to avoid posting the code of get_alerts
).
Then I have the UDF
function and I use when-otherwise
expression in Spark DataFrame df
.
import pyspark.sql.functions as func
get_alerts_udf = func.udf(lambda c1, c2, c3:
get_alerts(c1, c2, c3),
StructType(
[
StructField('probability', StringType()),
StructField('level', StringType())
]
)
)
df = df \
.withColumn("val", func.when(func.col("is_inside") == 1, get_alerts_udf(1,2,3))
.otherwise(["0","0"])
)
The problem is that otherwise(["0","0"])
does not correspond to the type of the output of the function get_alerts_udf
.
How do I define otherwise(["0","0"])
to correspond to:
StructType(
[
StructField('probability', StringType()),
StructField('level', StringType())
]
)
UPDATE:
I still get the error:
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CASE WHEN (is_inside` = 1) THEN <lambda>(c1, c2, c3) ELSE named_struct('col1', '0', 'col2', '0') END' due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;;`.
According to the recommendation from a duplicate post, I used otherwise(func.struct(func.lit("xxx"),func.lit("yyy")))
.