5

I am running Spark Thrift Server on EMR. I start up the Spark Thrift Server by:

  sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh --queue interactive.thrift --jars /opt/lib/custom-udfs.jar

Notice that I have a custom UDF jar, and I want to add it to the Thrift Server classpath, so I added --jars /opt/lib/custom-udfs.jar to the command above.

Once I am on my EMR cluster, I issue the following to connect to the Spark Thrift Server:

beeline -u jdbc:hive2://localhost:10000/default

Then I was able to issue commands like show databases. But how do I access the custom UDF? I thought that adding the --jars option to the Thrift Server startup command would also register the jar as a Hive resource.

The only way I can access the custom UDF now is by adding the custom UDF jar as a Hive resource:

add jar /opt/lib/custom-udfs.jar

and then creating a function for the UDF.
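For reference, the per-session workflow looks like this in beeline (the function name and the class name com.example.udf.MyUpper are placeholders, since the actual contents of custom-udfs.jar aren't shown):

```sql
-- run in each new beeline/Thrift session
ADD JAR /opt/lib/custom-udfs.jar;
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.udf.MyUpper';
SELECT my_upper('hello');
```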

Question: Is there a way to auto config the custom UDF jar without adding jar each time to the spark session?

Thanks!

seamonkeys
  • 123
  • 4
  • Any update on this? For clarity, the `custom-udfs.jar`, does it contain Spark SQL udfs or Hive UDFs (implemented as extensions of the hive UDF class)? – kentt Jan 08 '18 at 21:49
  • @Kentt Do you know the answer for either case? Spark SQL UDF or Hive UDF? – Azeroth2b Apr 15 '22 at 14:13

2 Answers

0

The easiest way is to edit the file start-thriftserver.sh so that, at the end, it:

  1. Waits until the server is ready
  2. Executes a setup SQL script

You could also post a proposal on JIRA; "execute setup code at startup" would be a very good feature.
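A minimal sketch of that idea, appended to the end of start-thriftserver.sh (the port, the path /opt/lib/setup.sql, and its contents are assumptions, not part of the original script):

```shell
# assumption: the Thrift Server listens on localhost:10000, and
# /opt/lib/setup.sql contains the ADD JAR / CREATE FUNCTION statements
while ! nc -z localhost 10000; do
  sleep 1   # wait until the server accepts connections
done
beeline -u jdbc:hive2://localhost:10000/default -f /opt/lib/setup.sql
```

Note that CREATE TEMPORARY FUNCTION is session-scoped, so this only helps if the function is registered permanently (or the setup is re-run per session).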

Thomas Decaux
  • 21,738
  • 2
  • 113
  • 124
0

The problem here seems to be that --jars must be positioned correctly: it should be the first argument. I too had trouble getting the jars to work properly. This worked for me:

# if your Spark installation is in /usr/lib/
# --properties-file is only needed if you want to customize the Spark
# configuration; the file looks similar to spark-defaults.conf
sudo -u spark /usr/lib/spark/sbin/start-thriftserver.sh \
  --jars /path/to/jars/jar1.jar,/path/to/jars/jar2.jar \
  --properties-file ./spark-thrift-sparkconf.conf \
  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2