I have a Spark Scala UDF that takes a DataFrame column as its first parameter and a List as its second. When I call it, it throws an error pointing at the list argument:

type mismatch: found spark.sql.Row, required spark.sql.Column

I am calling the UDF with these arguments:

udf_name($"column_name",List_name)

Please guide.

user8167344

2 Answers

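You need to define a separate instance of your UDF for each list you want to pass. A UDF call only accepts Columns, so a local Scala list cannot be passed as an argument directly. Since the lists are local Scala variables, you can capture them in the UDF's closure just before the call (Spark will ship the UDF, together with the captured list, to the executors), e.g.: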

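import org.apache.spark.sql.functions._
// In spark-shell, spark.implicits._ is already imported (needed for toDF and $"...")
val df = List("A", "B").toDF
// Plain Scala function; the list is an ordinary argument here
def to_be_udf(s: String, l: List[String]) = if (l.isEmpty) "" else "has values"
// Each UDF captures its own list in the closure
val udf1 = udf((s: String) => to_be_udf(s, List("a")))
val udf2 = udf((s: String) => to_be_udf(s, List()))
df.select(udf1($"value"), udf2($"value")).show()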

+----------+----------+
|UDF(value)|UDF(value)|
+----------+----------+
|has values|          |
|has values|          |
+----------+----------+
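
Applied to the longest-prefix lookup described in the comments below, the same closure technique might look like the following. This is only a sketch under assumptions: the names `ClosureUdfSketch`, `longestPrefix` and `longestPrefixUdf`, the sample rows, and the local SparkSession are all hypothetical, and `startsWith` is swapped in for the original `substring` comparison so that a code longer than the input does not throw. It does not handle null input strings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object ClosureUdfSketch {
  // Hypothetical helper: longest code in `codes` that is a prefix of `s`, or "" if none matches.
  def longestPrefix(s: String, codes: List[String]): String =
    codes
      .filter(c => s.startsWith(c))
      .foldLeft("")((best, c) => if (c.length > best.length) c else best)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only so the sketch runs on its own.
    val spark = SparkSession.builder().master("local[*]").appName("closure-udf-sketch").getOrCreate()
    import spark.implicits._

    // The list is an ordinary local value; the UDF below captures it in its closure,
    // so the UDF itself takes only the column as an argument.
    val codes = List("12", "1234")
    val longestPrefixUdf = udf((s: String) => longestPrefix(s, codes))

    val df = List("123456", "999").toDF("col")
    df.withColumn("newcol", longestPrefixUdf($"col")).show()

    spark.stop()
  }
}
```

Keeping the matching logic in a plain function means it can be checked without a cluster, and only the thin `udf(...)` wrapper depends on Spark.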
Arnon Rotem-Gal-Oz
  • Hi Arnon, I made the changes you suggested but got an error. My code is: import org.apache.spark.sql.functions.lit; df_col.withColumn("newcol",udf_getMaxMatch($"col",lit(L))). The error is: java.lang.RuntimeException: unsupported literal type class scala.collection.immutable.$colon$colon List([12],[1234]) at org.apache.spark.sql.catalyst.expressions.Literal$.apply(literals.scala:77)......... – user8167344 Aug 19 '18 at 17:47
  • What's the definition of your UDF? – Arnon Rotem-Gal-Oz Aug 19 '18 at 18:52
  • val udf_getMaxMatchC = udf { (inpStr: String, codeList: List[String]) =>
      def getMaxIter(inpStr: String, codeList: List[String], maxMatch: String): String = {
        if (codeList.isEmpty) {
          maxMatch
        } else {
          val cmp = codeList.head
          if (cmp == inpStr.substring(0, cmp.length)) {
            if (cmp.length > maxMatch.length) getMaxIter(inpStr, codeList.tail, cmp)
            else getMaxIter(inpStr, codeList.tail, maxMatch)
          } else {
            getMaxIter(inpStr, codeList.tail, maxMatch)
          }
        }
      }
      if (codeList.isEmpty) "" else getMaxIter(inpStr, codeList, "")
    }
    – user8167344 Aug 20 '18 at 04:48
  • Sorry, my bad. I've fixed the answer. – Arnon Rotem-Gal-Oz Aug 20 '18 at 06:27
You can either pass the constant value to a UDF using lit (which works for simple values but, as the error in the comments above shows, rejects a Scala List), or alternatively define a method which returns a UDF (my preferred way):

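import org.apache.spark.sql.functions.udf

// Takes the list as a normal Scala argument and returns a UDF that closes over it
def udf_name(List_name: List[String]) = {
  udf((name: String) => {
    // do something
    List_name.contains(name)
  })
}

val List_name: List[String] = ???

// First argument list builds the UDF; second applies it to the column
df
  .withColumn("is_name_in_list", udf_name(List_name)($"column_name"))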
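
For the literal route, plain `lit` does not accept a Scala collection, but `typedLit` (in `org.apache.spark.sql.functions` since Spark 2.2) does. A hedged sketch of that variant follows; the names `TypedLitUdfSketch`, `isInList` and `isInListUdf`, the sample data, and the local SparkSession are hypothetical. Note that the UDF parameter is declared as `Seq[String]` rather than `List[String]`, because Spark hands array columns to Scala UDFs as a `Seq`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{typedLit, udf}

object TypedLitUdfSketch {
  // Hypothetical helper holding the actual logic; plain Scala, no Spark types.
  def isInList(name: String, names: Seq[String]): Boolean = names.contains(name)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, only so the sketch runs on its own.
    val spark = SparkSession.builder().master("local[*]").appName("typedlit-udf-sketch").getOrCreate()
    import spark.implicits._

    val names = List("A", "C")

    // Two-argument UDF: the second argument arrives as an array column, hence Seq[String].
    val isInListUdf = udf((name: String, ns: Seq[String]) => isInList(name, ns))

    val df = List("A", "B").toDF("column_name")
    // typedLit turns the local List into an array-typed literal Column.
    df.withColumn("is_name_in_list", isInListUdf($"column_name", typedLit(names))).show()

    spark.stop()
  }
}
```

Compared with the method-returning-a-UDF approach, this keeps a single UDF definition, at the cost of carrying the list as a literal array column in the query plan.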
Raphael Roth