I am running Hive streaming on a DSE 3.0 cluster (Hive 0.9) with a Python mapper. My Python script imports the statsmodels module, which requires Python 2.7. Since the default Python is 2.4, not 2.7, I downloaded and installed Python 2.7 as well as the statsmodels module.
However, when running the simple Hive query
hive> select transform (line) using 'python python-mapper.py' from docs;
where "docs" is a Hive table with a STRING column named line, I get the error:
File "python-mapper.py", line 6, in ?
import statsmodels
ImportError: No module named statsmodels
So I changed my Hive query to:
hive> select transform (line) using 'python2.7 python-mapper.py' from docs;
to invoke version 2.7, but then I get the error:
Caused by: java.io.IOException: Cannot run program "python2.7":
java.io.IOException: error=2, No such file or directory
I have also tried python27 and /usr/local/bin/python2.7, and I still get the same error. Has anyone encountered this before? I have already consulted the second answer to the post "On linux SUSE or RedHat, how do I load Python 2.7". Any advice would be greatly appreciated!
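One way to narrow this down might be a small diagnostic mapper run with the default interpreter, to see which Python each task node actually resolves and whether python2.7 is on the PATH that the Hive tasks see. The following is only a minimal sketch under assumptions: the file name check_python.py and the helper names find_on_path and describe are hypothetical, and the script deliberately avoids newer syntax so that it should also run under Python 2.4.

```python
# Hypothetical diagnostic mapper (check_python.py); names are illustrative only.
# Avoids newer syntax so it should also run on old interpreters such as 2.4.
import os
import socket
import sys


def find_on_path(name, path=None):
    """Return full paths of executables called `name` found on PATH."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


def describe():
    """Build one tab-separated line describing this node's interpreter."""
    fields = [
        socket.gethostname(),                  # which node ran the task
        sys.executable or "unknown",           # interpreter actually used
        sys.version.split()[0],                # its version, e.g. 2.4.3
        ",".join(find_on_path("python2.7")) or "python2.7 not on PATH",
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    # Drain the rows Hive pipes in (skipped when run from a terminal),
    # then emit a single row for this task.
    if not sys.stdin.isatty():
        for _ in sys.stdin:
            pass
    sys.stdout.write(describe() + "\n")
```

Assuming the script is shipped to the tasks (for example with Hive's `add file check_python.py;`), it could be run as `select transform (line) using 'python check_python.py' from docs;`. If the output shows "python2.7 not on PATH" for some hosts, that would suggest Python 2.7 is missing on those nodes, or is not visible to the user the tasks run as, rather than a problem with the query itself.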
Thanks, AM