TL;DR: How can I update the jar of a custom UDF in Hive?
I wrote my own generic UDF and it works well. I can define a new function and use it with the command:
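(The original command block seems to have been lost here; a typical definition looks like the following sketch, where the jar path, function name, and class name are hypothetical placeholders.)

```sql
-- Register the jar and create a permanent function from it
-- (all names below are examples, not the actual ones used):
CREATE FUNCTION myfunc
  AS 'com.example.hive.udf.MyFunc'
  USING JAR 'hdfs:///hive-udf-wp/hive-udf-wp.jar';

SELECT myfunc(some_column) FROM some_table;
```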
Now I want to update my UDF, so I put an updated version of the jar, with the same name, in HDFS. Afterwards, what happens is:
- the first call to the function gives:
java.io.IOException: Previous writer likely failed to write hdfs://ip-10-0-10-xxx.eu-west-1.compute.internal:8020/tmp/hive/hive/_tez_session_dir/0de6055d-190d-41ee-9acb-c6b402969940/myfunc.jar Failing because I am unlikely to write too.
- the second call gives:
org.apache.hadoop.hive.ql.metadata.HiveException: Default queue should always be returned.Hence we should not be here.
The log file shows:
Localizing resource because it does not exist: file:/tmp/8f45f1b7-2850-4fdc-b07e-0b53b3ddf5de_resources/myfunc.jar to dest: hdfs://ip-10-0-10-129.eu-west-1.compute.internal:8020/tmp/hive/hive/_tez_session_dir/994ad52c-4b38-4ee2-92e9-67076afbbf10/myfunc.jar
tez.DagUtils (DagUtils.java:localizeResource(961)) - Looks like another thread is writing the same file will wait.
tez.DagUtils (DagUtils.java:localizeResource(968)) - Number of wait attempts: 5. Wait interval: 5000
tez.DagUtils (DagUtils.java:localizeResource(984)) - Could not find the jar that was being uploaded
What I already tried:
- adding the jar to hive.reloadable.aux.jars.path and hive.aux.jars.path
- different combinations of LIST JARS / DELETE JAR / CREATE FUNCTION / RELOAD, to no avail.
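For reference, one of the sequences I tried looked roughly like this (a sketch with hypothetical jar path, function name, and class name):

```sql
-- Remove the old resource and function, then re-register the new jar:
DELETE JAR hdfs:///hive-udf-wp/hive-udf-wp.jar;
DROP FUNCTION IF EXISTS myfunc;
RELOAD;  -- re-scans hive.reloadable.aux.jars.path for changed jars
CREATE FUNCTION myfunc
  AS 'com.example.hive.udf.MyFunc'
  USING JAR 'hdfs:///hive-udf-wp/hive-udf-wp.jar';
```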
I even ended up with a query that apparently starts fine but then just hangs, not moving forward, with nothing in the logs and no DAG created:
INFO : converting to local hdfs:///hive-udf-wp/hive-udf-wp.jar
INFO : Added [/tmp/19e0c9fc-9c7c-4de5-a034-ced062f87f64_resources/hive-udf-wp.jar] to class path
INFO : Added resources: [hdfs:///hive-udf-wp/hive-udf-wp.jar]
I would think that asking Tez not to reuse the current session could do the trick, since new sessions would then be created without an old version of the jar. Would that be an option?
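As a pragmatic workaround along those lines, forcing a fresh client session (e.g. closing and reopening the connection, or !reconnect in Beeline) should give a new _tez_session_dir so the updated jar no longer clashes with the stale copy; the sketch below assumes the same hypothetical names as above:

```sql
-- After reconnecting (new Tez session, new _tez_session_dir),
-- re-register the updated jar in the fresh session:
ADD JAR hdfs:///hive-udf-wp/hive-udf-wp.jar;
RELOAD;
```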