I am a Spark beginner. I am using Python and Spark DataFrames. I just learned about user-defined functions (UDFs), which have to be registered before they can be used. Question: in what situations would you create a UDF rather than just a plain Python function?

Thank you so much!

user3245256

1 Answer

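A plain Python function works on ordinary Python values, not on DataFrame columns, so you need a UDF whenever you want Spark to apply your function to a column row by row. Your code will also be neater: udf takes a function and its return type (which defaults to string if omitted) and gives you back something that produces a column expression, which means you can write nice things like: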

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
my_function_udf = udf(my_function, DoubleType())
myDf.withColumn("function_output_column", my_function_udf("some_input_column"))
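To make the contrast with a plain function concrete, here is a minimal sketch. The body of my_function (doubling a number) and the helper add_output_column are made up for illustration; only udf, DoubleType and withColumn come from PySpark. The PySpark imports sit inside the helper so that the plain function can still be called and tested without Spark.

```python
def my_function(x):
    """Plain Python function: works on one ordinary Python value at a time.

    The doubling logic is a hypothetical stand-in for your own logic.
    """
    return None if x is None else float(x) * 2.0


def add_output_column(df, input_col="some_input_column"):
    """Hypothetical helper: wrap my_function as a UDF and apply it to df."""
    # Imported here so my_function stays usable without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import DoubleType

    # The UDF wrapper is what lets Spark run my_function on every row.
    my_function_udf = udf(my_function, DoubleType())
    return df.withColumn("function_output_column", my_function_udf(input_col))


# Called directly, the plain function only handles ordinary Python values.
print(my_function(3))
```

So the rule of thumb would be: keep the logic in a plain function (easy to unit test), and wrap it with udf at the point where it has to run against a DataFrame column.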

This is just one example of how a UDF lets you treat a function as a column expression. UDFs also make it easy to bring objects such as lists or maps into your function logic via a closure, which is explained very well here
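A rough sketch of the closure idea, under assumptions: make_lookup, country_names, the column names country_code and country_name, and add_country_column are all hypothetical names chosen for the example. The inner function captures the dict, and wrapping it with udf lets Spark ship that captured dict along with the function.

```python
def make_lookup(mapping, default="unknown"):
    """Return a function that closes over `mapping` (hypothetical helper)."""
    def lookup(key):
        # `mapping` and `default` are captured from the enclosing scope.
        return mapping.get(key, default)
    return lookup


# Hypothetical lookup data, small enough to travel with the function.
country_names = {"DE": "Germany", "FR": "France"}
lookup_country = make_lookup(country_names)


def add_country_column(df):
    """Hypothetical helper: apply the closure to a 'country_code' column."""
    # Imported here so the closure itself can be tested without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    lookup_udf = udf(lookup_country, StringType())
    return df.withColumn("country_name", lookup_udf("country_code"))
```

This is fine for small maps or lists. For large lookup tables, a broadcast variable or a join is probably the better fit.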

Zooby