From the Spark 2.3 documentation:
registerJavaFunction(name, javaClassName, returnType=None)
Register a Java user-defined function as a SQL function.
In addition to a name and the function itself, the return type can be optionally specified. When the return type is not specified we would infer it via reflection.
Parameters:
name – name of the user-defined function
javaClassName – fully qualified name of java class
returnType – the return type of the registered Java function. The value can be either a pyspark.sql.types.DataType object or a DDL-formatted type string.
My question:
I want to build a library with a large number of UDFs for Spark 2.3+, all written in Java and all accessible from PySpark/Python.
Reading the documentation quoted above, it appears that there is a one-to-one mapping between a Java class and a Java UDF (callable from Spark SQL in PySpark). So if I have, say, 10 Java UDFs, then I need to create 10 public Java classes, with 1 UDF per class, to make them callable from PySpark/SQL.
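If my reading is right, each UDF would live in its own public class implementing one of the org.apache.spark.sql.api.java.UDF1/UDF2/... interfaces, roughly like the sketch below (the package, class, and SQL function names are placeholders I made up):

```java
package com.example.udfs;

import org.apache.spark.sql.api.java.UDF1;

// My understanding of the one-class-per-UDF pattern: a single public class
// implementing UDF1, which I would then register from PySpark with something like
// spark.udf.registerJavaFunction("str_len", "com.example.udfs.StringLength", IntegerType())
public class StringLength implements UDF1<String, Integer> {
    @Override
    public Integer call(String s) throws Exception {
        return s == null ? null : s.length();
    }
}
```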
Is this correct?
Or can I create 1 public Java class, place a number of different UDFs inside that one class, and make all of them callable from PySpark in Spark 2.3?
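For example, I am wondering whether something along these lines could work, with several UDFs grouped as static nested classes inside one outer class and each nested class registered by its binary name (this is purely a guess on my part; all names are made up):

```java
package com.example.udfs;

import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.api.java.UDF2;

// One outer class acting as a container for several UDFs. My hope is that
// each nested class could be registered from PySpark by its binary name,
// e.g. "com.example.udfs.MyUdfs$ToUpper" -- but I do not know if this is supported.
public class MyUdfs {

    public static class ToUpper implements UDF1<String, String> {
        @Override
        public String call(String s) throws Exception {
            return s == null ? null : s.toUpperCase();
        }
    }

    public static class SafeConcat implements UDF2<String, String, String> {
        @Override
        public String call(String a, String b) throws Exception {
            return (a == null ? "" : a) + (b == null ? "" : b);
        }
    }
}
```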
The post linked below does not provide any Java sample code to help with my question; it looks like it is all in Scala. I want it all in Java, please. Do I need to extend a class or implement an interface to do this in Java? Any links to sample Java code that can be called from PySpark SQL would be appreciated.
Spark: How to map Python with Scala or Java User Defined Functions?