I'm using a Python virtual environment (venv) to provide modules that aren't installed on our cluster for use in a Hive UDF. I can't activate the venv, so when the shell script calls the Python UDF, the script fails because the modules cannot be found.

When I call `ls` from the shell script, the venv directory appears in the listing.

DELETE FILE /temp/venv;
ADD FILE /temp/venv;
DELETE FILE udf.sh;
ADD FILE udf.sh;

SOURCE venv/bin/activate;

SELECT TRANSFORM(1)
  USING 'bash udf.sh'
AS (test_result)

This results in: File: venv/bin/activate is not a file.

Changing the path to:

SOURCE ../venv/bin/activate;

results in: FAILED: ParseException line 1:2 cannot recognize input near 'This' 'file' 'must'

Within the shell script, if I try to use:

. venv/bin/activate

it returns exit code 1.

Any thoughts?

Thanks, Dave


Solved using this: https://stackoverflow.com/a/23069201/10542262

Within the shell script, instead of running:

python [path]/[script].py

call the venv's own Python interpreter directly. It uses the venv's installed modules, so the venv no longer needs to be activated:

[path/to/venv/]/bin/python [path]/[script].py
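
As a rough sketch of how udf.sh might look with this approach: the function name `run_udf` and its two arguments are hypothetical, and it assumes the venv directory added with ADD FILE is present in the task's working directory and that its interpreter can actually run on the worker nodes.

```shell
#!/usr/bin/env bash
# Sketch only: run a Python script with a venv's interpreter, without activating it.
# run_udf is a hypothetical helper; both arguments are placeholders for your own paths.
#   $1 = path to the venv directory
#   $2 = path to the Python script to run

run_udf() {
    local venv_dir="$1"
    local script="$2"
    local py="$venv_dir/bin/python"

    # Fail with a readable message on stderr instead of a bare exit code.
    if [ ! -x "$py" ]; then
        echo "venv interpreter not found or not executable: $py" >&2
        return 1
    fi

    # The venv's interpreter resolves the venv's site-packages on its own,
    # so sourcing bin/activate is not required.
    "$py" "$script"
}
```

With this in place, the last line of udf.sh could be `run_udf venv [path]/[script].py`. The rows that TRANSFORM sends on stdin are inherited by the Python process, so the script reads them as before.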
Dave Snow
  • Sorry, but the question is a bit confusing. Aren't `DELETE FILE` and `ADD FILE` Hive commands? If so, how are you able to run them from bash? Also, the error says file not found. Can you check whether `venv/bin/activate` exists? – Gaurang Shah Oct 23 '18 at 14:32
  • I'm running those commands wrapped in `hive -e ""`. I'm trying both the Hive CLI SOURCE command and the bash source (using `.`) with no success. – Dave Snow Oct 23 '18 at 15:07
  • Did you check whether the file exists? – Gaurang Shah Oct 23 '18 at 15:44
