I am trying to use pip to install libraries into a Python virtualenv, which resides on an AWS EMR master node. For some reason, sudo pip works fine, but non-sudo pip silently fails.
Some background:
- I am launching an EMR cluster with version emr-5.19.0.
- I am SSHing into the master node, which uses Amazon Linux AMI 2018.03.
- By default, this OS has both Python 2.7 and 3.4 installed.
- I created a new virtualenv, based on the already-installed Python 3.4.
- I activated my new virtualenv and verified that all paths point to my venv installation (not to the global Python installation); e.g. the outputs of which python and which pip both look correct.
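The path check in the last bullet can also be done from inside Python. The following is a minimal sketch, not part of my original setup: the helper name in_virtualenv is made up for illustration. It relies on sys.real_prefix, which older virtualenv releases set, and on sys.base_prefix, which differs from sys.prefix under venv and newer virtualenv releases.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment.

    Hypothetical helper for illustration only.
    """
    # Older virtualenv releases record the original prefix in sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases make base_prefix differ from prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print("executable:", sys.executable)
print("prefix:", sys.prefix)
print("in virtualenv:", in_virtualenv())
```

If this is run with the venv's python and reports False, the activation did not take effect for that interpreter, whatever which python shows.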
So, I create and activate my virtualenv as follows:
cd /home/ec2-user/my_app
virtualenv --python=python3.4 venv
source venv/bin/activate
This works. Next, I try to install a sample library as follows:
pip install numpy
The output is:
Collecting numpy
Installing collected packages: numpy
Successfully installed numpy-1.16.0
However, despite the output claiming success, import numpy produces an import error, and numpy doesn't show up in pip list or pip freeze. I have even drilled into path/to/venv/lib/python3.4/dist-packages and verified that no numpy directory gets created.
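Rather than inspecting a single directory by hand, the same check can be run against every entry on sys.path. This is only a sketch under assumptions: locate_package is a hypothetical helper name, and it only looks for a folder named after the package, so it would miss single-file modules.

```python
import importlib.util
import os
import sys


def locate_package(name):
    """Return (origin, hits) for a top-level package name.

    origin is the file the import system would load, or None if the
    package cannot be found. hits lists the sys.path entries that
    contain a directory called `name`. Hypothetical helper.
    """
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    # Skip empty entries (the current directory) to avoid false positives.
    hits = [p for p in sys.path if p and os.path.isdir(os.path.join(p, name))]
    return origin, hits


origin, hits = locate_package("numpy")
print("importable from:", origin)
print("sys.path entries containing it:", hits)
```

Run with the venv's interpreter, an origin of None together with an empty list would confirm that the package did not land anywhere the interpreter searches.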
However, running the venv's pip with sudo does work:
sudo path/to/venv/bin/pip install numpy
The problem is that I don't want to use sudo, because that would go against best practices. However, most people seem to use sudo for this task (examples here and here), so perhaps this is simply a requirement in an EMR environment?
Note: This issue only happens for some libraries. For instance, pyspark and geocoder install fine, but numpy and pandas silently fail.
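One difference between the two groups is that numpy and pandas contain compiled extension modules, and Python distinguishes an install directory for pure-Python packages (purelib) from one for platform-specific packages (platlib). Whether that is the cause here is an assumption, not a diagnosis. The sketch below prints both directories as reported by sysconfig and whether each is on sys.path. install_dirs is a hypothetical helper name, and pip may compute its own install target differently, so treat the output as a hint only.

```python
import os
import sys
import sysconfig


def install_dirs():
    """Map 'purelib' and 'platlib' to (directory, is_on_sys_path).

    Hypothetical helper for illustration only.
    """
    paths = sysconfig.get_paths()
    # Resolve symlinks so that e.g. lib and lib64 compare equal when linked.
    on_path = {os.path.realpath(p) for p in sys.path if p}
    report = {}
    for key in ("purelib", "platlib"):
        directory = paths[key]
        report[key] = (directory, os.path.realpath(directory) in on_path)
    return report


for key, (directory, importable) in sorted(install_dirs().items()):
    print(key, directory, "on sys.path:", importable)
```

If the two directories differ and only one of them is on sys.path, that would be consistent with some libraries installing visibly and others not.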