
I am using spark-sql-2.4.1v. Depending on the value of a column, I need to apply different transformations: for the given value columns, I need to look up their mapped values, as shown below.

Sample data:

import spark.implicits._ // needed for toDF

val data = List(
  ("20", "score", "school", "2018-03-31", 14, 12),
  ("21", "score", "school", "2018-03-31", 13, 13),
  ("22", "rate", "school", "2018-03-31", 11, 14),
  ("21", "rate", "school", "2018-03-31", 13, 12)
)
val df = data.toDF("id", "code", "entity", "date", "value1", "value2")

df.show

+---+-----+------+----------+------+------+
| id| code|entity|      date|value1|value2|
+---+-----+------+----------+------+------+
| 20|score|school|2018-03-31|    14|    12|
| 21|score|school|2018-03-31|    13|    13|
| 22| rate|school|2018-03-31|    11|    14|
| 21| rate|school|2018-03-31|    13|    12|
+---+-----+------+----------+------+------+




val resultDs = df
  .withColumn("value1",
    when(col("code").isin("rate"), functions.callUDF("udfFunc", col("value1")))
      .otherwise(col("value1").cast(DoubleType))
  )

udfFunc maps values as follows:

11->a
12->b
13->c
14->d

Expected output:

+---+-----+------+----------+------+------+
| id| code|entity|      date|value1|value2|
+---+-----+------+----------+------+------+
| 20|score|school|2018-03-31|    14|    12|
| 21|score|school|2018-03-31|    13|    13|
| 22| rate|school|2018-03-31|     a|    14|
| 21| rate|school|2018-03-31|     c|    12|
+---+-----+------+----------+------+------+

But it gives this output:

+---+-----+------+----------+------+------+
| id| code|entity|      date|value1|value2|
+---+-----+------+----------+------+------+
| 20|score|school|2018-03-31|  null|    12|
| 21|score|school|2018-03-31|  null|    13|
| 22| rate|school|2018-03-31|     a|    14|
| 21| rate|school|2018-03-31|     c|    12|
+---+-----+------+----------+------+------+

Why is the "otherwise" condition not working as expected? Any idea what is wrong here?

BdEngineer

2 Answers


A column must contain a single datatype. Since the when branch produces a String (the UDF result), the otherwise branch must produce a String as well.

Note: DoubleType cannot store StringType data, so you need to cast to StringType instead of DoubleType.

val resultDs = df
  .withColumn("value1",
    when(col("code") === lit("rate"), functions.callUDF("udfFunc", col("value1")))
      .otherwise(col("value1").cast(StringType)) // Should be StringType
  )

Or

val resultDs = df
  .withColumn("value1",
    when(col("code").isin("rate"), functions.callUDF("udfFunc", col("value1")))
      .otherwise(col("value1").cast(StringType)) // Modified to StringType
  )
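The fix can be mirrored in plain Scala: both branches of the conditional must return the same type (String here). A minimal sketch, assuming the lookup from the question:

```scala
// Reconstructed lookup from the question's udfFunc mapping
val lookup: Map[Int, String] = Map(11 -> "a", 12 -> "b", 13 -> "c", 14 -> "d")

// Both branches return String, matching the corrected when/otherwise
def value1(code: String, v: Int): String =
  if (code == "rate") lookup.getOrElse(v, v.toString) // UDF branch
  else v.toString                                     // the .cast(StringType) branch
```

With a Double in the else branch instead, the two branches would no longer share a type, which is the mismatch behind the nulls in the question.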
Srinivas
    thanks a lot Srini, you been always helpful...did a overlook thanks for correcting....thanks a billion lot – BdEngineer Aug 03 '20 at 06:51

I would suggest modifying it to:

df
  .withColumn("value1",
    when(col("code") === lit("rate"), functions.callUDF("udfFunc", col("value1")))
      .otherwise(col("value1").cast(StringType))
  )

and check again.

Som
  • sorry no luck .. still getting other than "rate" , value1 column coming as null. – BdEngineer Aug 03 '20 at 06:42
  • As Srinivas explained, this is because of casting, change it to StringType – Som Aug 03 '20 at 06:54