
I am doing Hive Streaming on a DSE 3.0 cluster (Hive 0.9) using a Python mapper. My Python script imports the statsmodels module, which requires Python 2.7. Since the default Python is 2.4 rather than 2.7, I downloaded and installed Python 2.7 as well as the statsmodels module.
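For context, with Hive's default TRANSFORM row format the script receives each input row on standard input as one line with tab-separated columns, and each line it writes to standard output becomes an output row. Below is a minimal sketch of a mapper with that shape. The names `transform_rows` and `run` and the length computation are hypothetical placeholders for the real statsmodels logic. The code avoids newer syntax so it runs under Python 2.4 as well as later versions.

```python
def transform_rows(lines):
    """Yield one tab-separated output line per tab-separated input row.

    Hypothetical example: keep the first column and append its length.
    """
    for raw in lines:
        # Strip only the trailing newline so empty columns are preserved.
        line = raw.rstrip("\n")
        columns = line.split("\t")
        text = columns[0]
        # Placeholder for the real per-row work (e.g. statsmodels calls).
        yield "%s\t%d" % (text, len(text))


def run(in_stream, out_stream):
    # Write each result back to Hive as one newline-terminated row.
    for out_line in transform_rows(in_stream):
        out_stream.write(out_line + "\n")
```

In python-mapper.py, the module would then end with `import sys` and `run(sys.stdin, sys.stdout)`.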

However, when running the simple Hive query

hive> select transform (line) using 'python python-mapper.py' from docs;

where "docs" is a Hive table with a STRING column named line, I get the error:

File "python-mapper.py", line 6, in ?
import statsmodels
ImportError: No module named statsmodels
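The "in ?" in this traceback is the Python 2.4 style (later versions print "in <module>"), which suggests Hive launched the default 2.4 interpreter rather than the 2.7 one that has statsmodels. One way to confirm which interpreter actually runs on a worker node is a small diagnostic like the sketch below. `describe_interpreter` and `require_version` are hypothetical helpers, written without newer syntax so they also run under Python 2.4.

```python
import sys


def describe_interpreter(module_names):
    """Return a one-line report of the running interpreter and which modules import."""
    version = "%d.%d" % tuple(sys.version_info[:2])
    parts = ["python %s at %s" % (version, sys.executable)]
    for name in module_names:
        try:
            __import__(name)
            status = "ok"
        except ImportError:
            status = "missing"
        parts.append("%s=%s" % (name, status))
    return "; ".join(parts)


def require_version(minimum):
    """Exit with a readable message if the interpreter is older than `minimum`."""
    if tuple(sys.version_info[:2]) < tuple(minimum):
        sys.stderr.write("need Python %d.%d, running %s\n"
                         % (minimum[0], minimum[1], describe_interpreter([])))
        sys.exit(1)
```

Writing `describe_interpreter(["statsmodels"])` to `sys.stderr` before `import statsmodels` in the mapper should make the interpreter path and version show up in each task's log.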

So I changed my Hive query to:

hive> select transform (line) using 'python2.7 python-mapper.py' from docs;

to invoke Python 2.7 explicitly. But then I get the error:

Caused by: java.io.IOException: Cannot run program "python2.7": 
           java.io.IOException: error=2, No such file or directory

I have also tried python27 and /usr/local/bin/python2.7 and still get the same error. Has anyone encountered this before? I have already consulted the second answer to the post "On linux SUSE or RedHat, how do I load Python 2.7". Any advice would be greatly appreciated!

Thanks, AM

user1822685
Can you confirm that the correct Python and libraries are installed on all of your worker nodes? – guyrt Aug 20 '13 at 03:13

1 Answer


I know this is a bit old, but I came across the same problem recently and thought I would answer for anyone else who runs into it.

A bare python2.7 command may not work when more than one version of Python is installed, because the Hive task can only find it if it is on the PATH of the node where the task runs.

There are two ways to solve this. One: use a Python virtual environment, which lets you bundle your script with the dependencies it needs and add it as a resource distributed to all your nodes. Two: find out where your python2.7 executable is installed by typing:

which python2.7
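If `which` is unavailable, or you want to run the same check from a script on each node, the PATH lookup it performs can be sketched in Python as below. `find_on_path` is a hypothetical helper; on Python 3.3+ the standard library's `shutil.which` does the same job.

```python
import os


def find_on_path(program, search_path=None):
    """Return the first executable named `program` on the search path, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        # Skip empty entries (a shell would treat them as the current directory).
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # A match must be a regular file the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, `find_on_path("python2.7")` returns the full path to use in the Hive query, or None if python2.7 is not on that node's PATH.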

and then reference that location in your Hive query, for example:

select transform (line) using '/usr/local/bin/python2.7 python-mapper.py' from docs;

Caution: python2.7 may be installed in a different location on each node, so check beforehand. Better yet, use a virtual environment.

N00b3eva