I'm using a Python virtual environment (venv) in a Hive UDF to load modules that aren't available on our cluster. I can't activate the venv, so when the shell script calls the Python UDF, it fails because the modules cannot be found.
When calling ls from the shell script, the venv appears in the list.
DELETE FILE /temp/venv;
ADD FILE /temp/venv;
DELETE FILE udf.sh;
ADD FILE udf.sh;
SOURCE venv/bin/activate;
SELECT TRANSFORM(1)
USING 'bash udf.sh'
AS (test_result)
This results in: File: venv/bin/activate is not a file.
SOURCE ../venv/bin/activate;
This results in: FAILED: ParseException line 1:2 cannot recognize input near 'This' 'file' 'must'
Within the shell script, if I try to use:
. venv/bin/activate
it returns exit code 1.
Any thoughts?
Thanks, Dave
Solved using this: https://stackoverflow.com/a/23069201/10542262
Within the shell script, instead of doing:
python [path]/[script].py
you can call the Python interpreter inside the venv directly, so the venv no longer needs to be activated:
[path/to/venv/]/bin/python [path]/[script].py
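For illustration, here is a minimal sketch of how udf.sh might wrap that call. The function name run_udf and its two arguments are hypothetical and not from the original script. It assumes the venv directory is reachable from the task's working directory, as the ls output suggested, and that the interpreter in its bin directory is executable on the worker node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a Python script with the venv's own interpreter
# instead of activating the venv first.
#   $1 = path to the venv directory (assumed to be shipped with ADD FILE)
#   $2 = path to the Python UDF script
run_udf() {
    local venv_dir="$1"
    local udf_script="$2"
    local venv_python="$venv_dir/bin/python"

    # Fail with a readable message if the interpreter is missing,
    # rather than letting the task die with an opaque error.
    if [ ! -x "$venv_python" ]; then
        echo "venv interpreter not found at $venv_python" >&2
        return 1
    fi

    # stdin and stdout are inherited, so rows piped in by TRANSFORM
    # reach the Python script and its output goes back to Hive.
    "$venv_python" "$udf_script"
}
```

In udf.sh the function would then be called as the last line, for example run_udf venv [script].py, with the placeholder replaced by the real script name.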