
I am looking into converting some UDFs/UDAFs to Spark-Native functions to leverage Catalyst and codegen.

Looking through some examples (for example: https://github.com/apache/spark/pull/7214/files for Levenshtein) it seems like we need to add these functions to the Spark framework itself (i.e. via FunctionRegistry.scala).

Is there a way to add custom Spark-Native functions in "userspace" i.e. without forking/modifying the actual Spark codebase?

Thank you!

cozos
  • Do you want it to be available in sql queries? – Gelerion Aug 22 '19 at 10:44
  • Yes that would be great. – cozos Aug 27 '19 at 00:29
  • To all reading this: I finally found some resources on how to add Spark-native functions: [here](https://github.com/swoop-inc/spark-alchemy/wiki/Spark-Native-Functions) and [here](https://blog.simeonov.com/2018/11/14/apache-spark-native-functions/). If I manage to get it working I will post the answer. – cozos Aug 27 '19 at 00:29
  • @cozos did you ever get this to work? – user3613290 Mar 16 '21 at 18:38
  • @user3613290 Yes I got it to work. I followed this open source project which implements Spark Native functions and registers them: https://github.com/swoop-inc/spark-alchemy/blob/master/alchemy/src/main/scala/com/swoop/alchemy/spark/expressions/hll/HLLFunctionRegistration.scala – cozos Mar 17 '21 at 17:52
  • @cozos ah awesome. I didn't want to import third party repos, so I actually just registered some functions directly in the FunctionRegistry. Unfortunately I'm not on Spark 3.0+ where it looks like they've streamlined the process https://spark.apache.org/docs/3.0.0-preview/api/java/org/apache/spark/sql/SparkSessionExtensions.html#injectFunction-scala.Tuple3- – user3613290 Mar 17 '21 at 20:48
  • @user3613290 Yeah I actually did this before Spark 3.0, by basically overloading FunctionRegistry. The open source project I linked has a tag/release in Spark 2.X which is a good example. – cozos Mar 18 '21 at 01:22
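
For readers on Spark 3.0+, the `SparkSessionExtensions.injectFunction` route mentioned in the comments can be sketched roughly as follows. This is a minimal sketch, not the approach either commenter actually used. The class name `NativeFunctionExtensions`, the object `Demo`, and the SQL name `my_levenshtein` are hypothetical. To sidestep version-specific details of writing a new `Expression` subclass, it registers Spark's built-in `Levenshtein` Catalyst expression under a new name. A user-defined `Expression` would presumably be returned from the builder in the same place.

```scala
import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo, Levenshtein}

// Hypothetical extension class: the name is illustrative, not part of Spark.
class NativeFunctionExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(extensions: SparkSessionExtensions): Unit = {
    // injectFunction takes a (FunctionIdentifier, ExpressionInfo, builder) tuple.
    extensions.injectFunction((
      FunctionIdentifier("my_levenshtein"),                      // hypothetical SQL name
      new ExpressionInfo(classOf[Levenshtein].getName, "my_levenshtein"),
      (children: Seq[Expression]) => {
        // The builder receives the already-parsed argument expressions.
        require(children.length == 2, "my_levenshtein expects exactly 2 arguments")
        // Assumption: a custom Catalyst Expression could be constructed here instead.
        Levenshtein(children(0), children(1))
      }
    ))
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .withExtensions(new NativeFunctionExtensions) // no fork of Spark required
      .getOrCreate()

    // The injected function is resolved by the analyzer like any built-in.
    spark.sql("SELECT my_levenshtein('kitten', 'sitting') AS distance").show()
    spark.stop()
  }
}
```

The same extension class can, as far as I know, also be enabled without code changes by setting `spark.sql.extensions` to its fully qualified class name. On Spark 2.x this hook does not cover functions, which is why the comments above describe registering directly in the `FunctionRegistry` instead.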
