
I am configuring a cluster and running a task written in Python on it.

The AMI version is emr-4.7.0, with only Spark selected as the application to install.

Before the task was executed, the following bootstrap action was run:

sudo pip install pymongo
sudo pip install py2neo
sudo pip install pymysql
sudo pip install pandas
(plus something to download the code from S3)
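
To confirm that the packages installed by the bootstrap action are actually importable on a node, independently of Spark, a check along these lines can help. This is only a minimal sketch: the helper name check_imports is hypothetical, and the package list simply mirrors the pip commands above. It catches every exception type on purpose, because an import can fail with something other than ImportError.

```python
import importlib

# Packages installed by the bootstrap action above.
PACKAGES = ["pymongo", "py2neo", "pymysql", "pandas"]


def check_imports(names):
    """Try to import each module; map its name to None on success or to an error string."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # deliberately broad: not every failure is an ImportError
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    for name, error in sorted(check_imports(PACKAGES).items()):
        print("%-10s %s" % (name, error or "OK"))
```

Running this with the same interpreter that Spark uses shows whether the failure comes from the installation itself or from the environment in which Python is started.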

After that, I connected to the master node over SSH and ran my Spark application (a Python script). I got the following error:

ImportError: C extension: hashtable not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace' to build the C extensions first.

I have tried several things without success. Any hint would be appreciated.

Additional info (edited): the Python version is 2.7, and it is the only version installed on the node. pip is upgraded to the newest version:

sudo pip install --upgrade pip
Requirement already up-to-date: pip in /usr/local/lib/python2.7/site-packages

Interestingly, the problem also occurs when I import pandas without Spark at all. In ipython I ran:

import pandas as pd

and the first time I got:

/usr/lib64/python2.7/locale.pyc in _parse_localename(localename)
    473     elif code == 'C':
    474         return None, None
--> 475     raise ValueError, 'unknown locale: %s' % localename
    476
    477 def _build_localename(localetuple):

ValueError: unknown locale: UTF-8

but when I ran the same statement a second time I got:

ImportError                               Traceback (most recent call last)
<ipython-input-2-af55e7023913> in <module>()
----> 1 import pandas as pd

/usr/local/lib64/python2.7/site-packages/pandas/__init__.py in <module>()
     29                       "pandas from the source directory, you may need to run "
     30                       "'python setup.py build_ext --inplace' to build the C "
---> 31                       "extensions first.".format(module))
     32
     33 from datetime import datetime

ImportError: C extension: hashtable not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace' to build the C extensions first.

I finally found the solution, and it was simpler than I expected. After adding the information above I realized that the source of the error is the locale (the 'unknown locale: UTF-8' ValueError raised on the first import), which can be easily fixed with:

export LANG=es_ES.UTF-8
export LC_CTYPE=es_ES.UTF-8
export LC_ALL=es_ES.UTF-8
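
To spot this kind of problem before importing pandas, the locale variables can be inspected from Python. The following is a minimal sketch under stated assumptions: the function names find_bad_locale_vars and with_fixed_locale are hypothetical, the check is only a heuristic (it flags values that have no language_TERRITORY part, such as a bare UTF-8, and may also flag valid locale aliases), and the fallback value is the one used in the exports above.

```python
import os

# The variables set by the exports above.
LOCALE_VARS = ("LANG", "LC_CTYPE", "LC_ALL")


def find_bad_locale_vars(env):
    """Return the locale variables whose value lacks a language part (heuristic)."""
    bad = {}
    for name in LOCALE_VARS:
        value = env.get(name)
        if not value:
            continue
        # Drop the encoding ('.UTF-8') and modifier ('@euro') parts.
        language = value.split(".")[0].split("@")[0]
        if language in ("C", "POSIX") or "_" in language:
            continue
        bad[name] = value
    return bad


def with_fixed_locale(env, fallback="es_ES.UTF-8"):
    """Return a copy of env with the flagged variables replaced by fallback."""
    fixed = dict(env)
    for name in find_bad_locale_vars(env):
        fixed[name] = fallback
    return fixed


if __name__ == "__main__":
    for name, value in sorted(find_bad_locale_vars(os.environ).items()):
        print("%s=%s looks malformed" % (name, value))
```

The sketch only reports and computes values; it does not change the running environment. Setting the variables in the shell, as in the exports above, is the fix that was actually confirmed here.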
  • Are your nodes using the same Python installs? Try logging what the Python home is, etc. I suspect your Python install is different and it's trying to run a pandas that was built with a different Python version. – Andy Hayden Jun 02 '16 at 23:14
  • Did you upgrade pip? Did you check your cluster log to see whether the bootstrapping went well? – eliasah Jun 03 '16 at 06:39
  • @AndyHayden there is only one version of python installed in the server: python 2.7 – Enrique Altuna Jun 03 '16 at 14:33
  • @eliasah the bootstrapping went well, and yes the pip was upgraded: sudo pip install --upgrade pip Requirement already up-to-date: pip in /usr/local/lib/python2.7/site-packages – Enrique Altuna Jun 03 '16 at 14:44
  • I finally found the solution, thank you for your comments. – Enrique Altuna Jun 03 '16 at 15:26
