
I have a user-defined function:

import math
from pyspark.sql.functions import udf, col
from pyspark.sql.types import FloatType

param1 = "A"

def calculate(type, pos):
    if param1 == "A":
        a, b = 0.05, -0.06
    else:
        a, b = 0.15, -0.16
    return a * math.pow(type, b) * max(pos, 1)

calc = udf(calculate, FloatType())

result = df.withColumn('col1', calc(col('type'), col('pos'))).groupBy('pk').sum('events')

I need to pass a parameter param1 to this udf. How can I do it?

Dinosaurius

1 Answer


You can pass a constant to your udf by wrapping it in lit (or typedLit in Scala), like this:

In Python:

from pyspark.sql.functions import udf, col, lit
mult = udf(lambda value, multiplier: value * multiplier)
df = spark.sparkContext.parallelize([(1,),(2,),(3,)]).toDF()
df.select(mult(col("_1"), lit(3)))

In Scala:

import org.apache.spark.sql.functions.{udf, col, lit}
val mult = udf((value: Double, multiplier: Double) => value * multiplier)
val df = sparkContext.parallelize((1 to 10)).toDF
df.select(mult(col("value"), lit(3)))
Paul V