
I need to install the ahocorasick package on an EMR notebook.

But when I call:

sc.install_pypi_package("pyahocorasick")

I am getting an error:

common.h:15:10: fatal error: Python.h: No such file or directory

   #include <Python.h>

            ^~~~~~~~~~

  compilation terminated.

  error: command 'gcc' failed with exit status 1

pandas installs without any problems.

I get a similar error when installing it as a bootstrap action.

If I call:

%pip install pyahocorasick

it installs fine, but I cannot import it.
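A quick way to see why a %pip-installed package may not be importable (a diagnostic sketch, not from the original post) is to print which interpreter the notebook magic installs into, since on EMR this can differ from the Python that the PySpark kernel runs:

```python
# Diagnostic sketch: %pip installs into the interpreter behind
# sys.executable on the driver, which on EMR may not be the same
# Python that the PySpark kernel and executors use.
import sys

print(sys.executable)        # interpreter the %pip magic targets
print(sys.version_info[:3])  # its major/minor/micro version
```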

I tried the approach from this question: fatal error: Python.h: No such file or directory, python-Levenshtein install

But I cannot find any way to run sudo from the notebook.

Edit:

I tried to install gcc at the bootstrap stage with the following .sh file:

sudo yum -y install gcc
sudo yum -y install python3-devel
sudo pip3 install pyahocorasick --user

It doesn't help; I still get an error when calling import ahocorasick.
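One way to narrow this down (a hedged sketch; the SparkContext variable `sc` is assumed to exist in the notebook) is to check which interpreter the driver and executors actually run, since a `pip3 install --user` only helps if the package lands in that same Python:

```python
# Hedged sketch: report the Python interpreter on the driver and on
# the executors. If it differs from the one pip3 installed into,
# `import ahocorasick` fails even though the install succeeded.
import sys

def interpreter_path(_):
    # Runs on an executor; returns the Python binary it uses.
    import sys
    return sys.executable

print("driver:", sys.executable)
# In an EMR notebook with an active SparkContext `sc`:
# print("executor:", sc.parallelize([0], 1).map(interpreter_path).first())
```

Installing with `<that interpreter> -m pip install ...` instead of a bare `pip`/`pip3` guarantees the package lands in the interpreter that will import it.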

Andrey
  • Even if you had sudo access, I doubt `yum install` would be available, or work as expected. You'd likely have to modify the AMI for the EMR cluster to bring `python-devel` – OneCricketeer Mar 31 '21 at 14:53
  • @OneCricketeer could you please advise - which AMI should I use as a base AMI? – Andrey Apr 01 '21 at 14:14
  • I'm saying you might need to make your own, not one publicly available, otherwise the bootstrap script should work too, but is slower – OneCricketeer Apr 01 '21 at 14:24
  • @OneCricketeer Yes, I understand. I just tried to create AMI based on ami-0742b4e673072066f (standard Amazon Linux 2 AMI (HVM), SSD Volume Type). I successfully installed `gcc` and `python-devel`. But it doesn't work with Spark – Andrey Apr 01 '21 at 14:27
  • Same error as before? And I assume you mean it doesn't work with your pypi installations because Spark itself doesn't need gcc or Python development packages – OneCricketeer Apr 01 '21 at 14:40
  • @OneCricketeer thanks, it works. My mistake was that I installed the package to Python 2 instead of Python 3 – Andrey Apr 09 '21 at 13:43
  • Hmm. `pip3 install` wouldn't have done that, so I assume you mean pyspark was using Python2? – OneCricketeer Apr 09 '21 at 13:50
  • @OneCricketeer No - initially I created the AMI with `pip install pyahocorasick`. It doesn't work. But it works with `pip3 install pyahocorasick` – Andrey Apr 09 '21 at 13:52

0 Answers