0

I have the function get_alerts that returns two String fields. For simplicity let's consider the fixed output of this function: return "xxx", "yyy" (this is to avoid posting the code of get_alerts).

Then I have the UDF function and I use when-otherwise expression in Spark DataFrame df.

import pyspark.sql.functions as func

get_alerts_udf = func.udf(lambda c1, c2, c3:
       get_alerts(c1, c2, c3),
       StructType(
                    [
                        StructField('probability', StringType()),
                        StructField('level', StringType())
                    ]
       )
    )

df = df \
    .withColumn("val", func.when(func.col("is_inside") == 1, get_alerts_udf(1,2,3))
                            .otherwise(["0","0"])
                )

The problem is that otherwise(["0","0"]) does not correspond to the type of the output of the function get_alerts_udf.

How do I define otherwise(["0","0"]) to correspond to:

      StructType(
                    [
                        StructField('probability', StringType()),
                        StructField('level', StringType())
                    ]
       )

UPDATE:

I still get the error:

pyspark.sql.utils.AnalysisException: u"cannot resolve 'CASE WHEN (is_inside` = 1) THEN <lambda>(c1, c2, c3) ELSE named_struct('col1', '0', 'col2', '0') END' due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;;`.

According to the recommendation from a duplicate post, I used otherwise(func.struct(func.lit("xxx"),func.lit("yyy"))).

Markus
  • 3,562
  • 12
  • 48
  • 85
  • I still get the error `pyspark.sql.utils.AnalysisException: u"cannot resolve 'CASE WHEN (`is_inside` = 1) THEN (c1, c2, c3) ELSE named_struct('col1', '0', 'col2', '0') END' due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;;\`. I used `otherwise(func.struct(func.lit("xxx"),func.lit("yyy")))` – Markus Sep 27 '18 at 12:04
  • @user6910411: Can you please remove the duplicate tag? – Markus Sep 27 '18 at 13:02
  • Not `otherwise(func.struct(func.lit("xxx"),func.lit("yyy"))` but `otherwise(func.struct(func.lit("xxx").alias("probability"), func.lit("yyy").alias("level")))` – zero323 Sep 28 '18 at 09:23

0 Answers0