Spark (python) - explain the difference between user defined functions and simple functions

Question

I am a Spark beginner. I am using Python and Spark dataframes. I just learned about user defined functions (UDFs), which have to be registered before they can be used. Question: in what situation would you create a UDF instead of just a simple (Python) function?

Thank you so much!

score 1 · Answer 1 · answered Nov 30 '17 at 20:38

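A plain Python function only works on ordinary Python values, so you cannot apply it directly to a dataframe column. Wrapping it with udf solves that: udf takes the function and its return type (StringType if you leave it out) and gives you back something that builds a column expression. Your code ends up neater, because you can write nice things like: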

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
my_function_udf = udf(my_function, DoubleType())
myDf = myDf.withColumn("function_output_column", my_function_udf("some_input_column"))
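To make the contrast with a simple function concrete, here is a minimal sketch. The names label_score and add_result_column and the column names "score" and "result" are made up for illustration, and the pyspark imports sit inside the helper only so that the plain function can be tried without Spark installed.

```python
def label_score(score):
    """Plain Python function: takes one ordinary Python value."""
    if score is None:  # a SQL NULL arrives in a UDF as None
        return None
    return "pass" if score >= 50 else "fail"


def add_result_column(df):
    """Add a 'result' column derived from the (hypothetical) 'score' column."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Calling label_score(df["score"]) directly would hand the function
    # a Column object, not the values stored in each row.
    label_score_udf = udf(label_score, StringType())
    return df.withColumn("result", label_score_udf("score"))


# On ordinary Python values the plain function is all you need:
print(label_score(75))  # pass
print(label_score(20))  # fail
```

So, roughly: use the simple function for values you already hold in Python, and wrap it in a UDF when it has to run once per row of a dataframe.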

This is just one example of how a UDF lets you treat a function as a column. UDFs also make it easy to bring things like lists or maps into your function logic via a closure, which is explained very well here
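The closure idea can be sketched roughly like this. Everything named here (make_country_lookup, add_country_name, the "country_code" and "country_name" columns, the example dict) is hypothetical and only meant to show the shape of the pattern.

```python
def make_country_lookup(code_to_name):
    """Return a plain function that remembers code_to_name via a closure."""
    def lookup(code):
        # code_to_name is captured from the enclosing scope
        return code_to_name.get(code, "unknown")
    return lookup


def add_country_name(df, code_to_name):
    """Add a 'country_name' column by looking up 'country_code' in a dict."""
    # Imported here so the closure itself can be tried without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    lookup_udf = udf(make_country_lookup(code_to_name), StringType())
    return df.withColumn("country_name", lookup_udf("country_code"))


lookup = make_country_lookup({"DE": "Germany", "FR": "France"})
print(lookup("DE"))  # Germany
print(lookup("XX"))  # unknown
```

The dict is serialized together with the function and shipped to the executors, so this works best when the captured data is small.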
