
I have a requirement where I need to use the pandas DataFrame method `explode` and the `json_normalize` function. It seems that the AWS Glue Python shell runs pandas 0.24.2 by default.
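For context, `DataFrame.explode` was added in pandas 0.25.0, so it is not available in 0.24.2, and `json_normalize` is exposed at the top level as `pandas.json_normalize` only from pandas 1.0 (earlier versions provide it as `pandas.io.json.json_normalize`). The sketch below shows the two calls together on a small, hypothetical list of JSON records; the field names `id`, `info` and `tags` are made up for illustration, and it assumes pandas 0.25 or later is importable.

```python
import pandas as pd

# json_normalize moved to the top-level namespace in pandas 1.0;
# fall back to the older location for pandas 0.25.x.
try:
    from pandas import json_normalize
except ImportError:
    from pandas.io.json import json_normalize

# Hypothetical nested JSON records (illustration only).
records = [
    {"id": 1, "info": {"name": "a"}, "tags": ["x", "y"]},
    {"id": 2, "info": {"name": "b"}, "tags": ["z"]},
]

# Flatten nested dicts into dotted column names such as "info.name".
df = json_normalize(records)

# Turn each element of the list-valued "tags" column into its own row
# (requires pandas >= 0.25); the original index labels are repeated.
exploded = df.explode("tags")
print(exploded[["id", "info.name", "tags"]])
```

Running this under the default pandas 0.24.2 would fail at `df.explode`, which is why a newer pandas is needed in the first place.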

I was able to use the wheel pandas-0.23.0-cp36-cp36m-manylinux1_x86_64.whl. However, when I tried providing wheel files for other pandas versions (pandas-0.25.3-cp35-cp35m-manylinux1_x86_64.whl, pandas-1.0.0-cp38-cp38-manylinux1_x86_64.whl and pandas-1.0.3-cp38-cp38-manylinux1_x86_64.whl), all of them failed to load with the error message below:

Traceback (most recent call last):
  File "/glue/lib/installation/pandas/__init__.py", line 32, in <module>
    from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
  File "/glue/lib/installation/pandas/_libs/__init__.py", line 3, in <module>
    from .tslibs import (
  File "/glue/lib/installation/pandas/_libs/tslibs/__init__.py", line 3, in <module>
    from .conversion import localize_pydatetime, normalize_date
ModuleNotFoundError: No module named 'pandas._libs.tslibs.conversion'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 211, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-maew3inn/EPP_Json_To_CSV.py", line 1, in <module>
  File "/glue/lib/installation/pandas/__init__.py", line 37, in <module>
    f"C extension: {module} not built. If you want to import "
ImportError: C extension: No module named 'pandas._libs.tslibs.conversion' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 230, in <module>
    raise e_type(e_value).with_traceback(new_stack)
  File "/tmp/glue-python-scripts-maew3inn/EPP_Json_To_CSV.py", line 1, in <module>
  File "/glue/lib/installation/pandas/__init__.py", line 37, in <module>
    f"C extension: {module} not built. If you want to import "
ImportError: C extension: No module named 'pandas._libs.tslibs.conversion' not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
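The traceback paths (/usr/local/lib/python3.6/...) show the job running under Python 3.6, while the wheels that failed are tagged cp35 or cp38; the one that loaded is tagged cp36. Per the wheel filename convention (PEP 427: name-version[-build]-python-abi-platform.whl), a compiled wheel's Python tag has to match the interpreter. The helper below is a hypothetical sketch that checks only that Python tag; it ignores the ABI and platform tags and compressed tags like "py2.py3", so a match is a necessary condition, not a guarantee the wheel will load in Glue.

```python
import sys


def wheel_python_tag(filename):
    """Return the Python tag (e.g. 'cp36') from a wheel filename."""
    # The last three dash-separated fields are python, abi and platform tags,
    # regardless of whether the optional build tag is present.
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    return stem.split("-")[-3]


def matches_interpreter(filename, version_info=sys.version_info):
    """Check whether a CPython wheel's Python tag fits the given interpreter."""
    expected = "cp{}{}".format(version_info[0], version_info[1])
    return wheel_python_tag(filename) == expected


for whl in [
    "pandas-0.23.0-cp36-cp36m-manylinux1_x86_64.whl",
    "pandas-0.25.3-cp35-cp35m-manylinux1_x86_64.whl",
    "pandas-1.0.3-cp38-cp38-manylinux1_x86_64.whl",
]:
    # Evaluate against Python 3.6, the version shown in the traceback.
    print(whl, matches_interpreter(whl, (3, 6)))
```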


Currently, you cannot import the pandas library into Glue. From the AWS Glue documentation:

Only pure Python libraries can be used. Libraries that rely on C extensions, such as the pandas Python Data Analysis Library, are not yet supported.

This is a duplicate of the question "Use AWS Glue Python with NumPy and Pandas Python Packages".