
I'd like to have an AWS Glue Python shell job connect to a MS SQL Server database. I understand that I should use the pymssql library. The script works on my computer, but for AWS I understand that I need to upload the pymssql library to S3 and reference it from the job.

I'm following the AWS example of providing your own egg file, which connects to Redshift, but after creating the egg file and running the script I get this error:

Couldn't find index page for 'redshift-module' (maybe misspelled?)

Can anyone show me how to provide my own library, for either Redshift or MS SQL? I'm just looking for an example I can adapt and work from.

Full Job Log

Creating /glue/lib/installation/site.py
Processing redshift_module-0.1-py3.7.egg
Copying redshift_module-0.1-py3.7.egg to /glue/lib/installation
Adding redshift-module 0.1 to easy-install.pth file

Installed /glue/lib/installation/redshift_module-0.1-py3.7.egg
Processing dependencies for redshift-module==0.1
Searching for redshift-module==0.1
Reading https://pypi.org/simple/redshift-module/
Scanning index of all packages (this may take a while)
Reading https://pypi.org/simple/

Full Error Output

Couldn't find index page for 'redshift-module' (maybe misspelled?)
No local packages or working download links found for redshift-module==0.1
error: Could not find suitable distribution for Requirement.parse('redshift-module==0.1')
Michael Black
  • Can you try this https://stackoverflow.com/questions/46329561/aws-glue-python/54852126#54852126 and let me know if it works for you? – Prabhakar Reddy Aug 16 '19 at 02:22
  • That's what I'm doing. Even though that post is about Glue (Apache Spark), I'm working with a Python-Shell, they both require you to have your third-party library in S3 in the job configuration. In the log I see that it finds my .egg file but it's not finding my library. – Michael Black Aug 16 '19 at 22:19

2 Answers

1

The answer is mentioned here

In a nutshell, AWS Glue uses Python 3.6, while the egg 'redshift_module-0.1-py3.7.egg' was built with Python 3.7.
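As a quick way to spot this kind of mismatch before uploading, here is a minimal sketch that reads the Python version tag out of an egg file name and compares it with a target interpreter version. The helper names `egg_python_version` and `egg_matches_interpreter` are hypothetical, and the sketch assumes the egg follows the usual `name-version-pyX.Y[-platform].egg` naming seen in the job log.

```python
import re
import sys


def egg_python_version(egg_name):
    """Return the (major, minor) Python version tagged in an egg file name.

    Returns None when the name carries no '-pyX.Y' tag.
    """
    match = re.search(r"-py(\d+)\.(\d+)(?:-.+)?\.egg$", egg_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def egg_matches_interpreter(egg_name, interpreter=None):
    """Check whether the egg's version tag equals the given (major, minor).

    When no interpreter version is passed, the running interpreter is used.
    """
    if interpreter is None:
        interpreter = sys.version_info[:2]
    return egg_python_version(egg_name) == tuple(interpreter)


if __name__ == "__main__":
    egg = "redshift_module-0.1-py3.7.egg"
    # Compare against Python 3.6, the version this answer reports for Glue.
    print(egg, "matches 3.6:", egg_matches_interpreter(egg, (3, 6)))
```

Running the same check inside the Glue job with no explicit version would compare against whatever interpreter the job actually uses.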

You might also want to have a look at the documentation, which describes some useful packaging options such as install_requires=['package'].

1

I faced the same issue while doing basic testing in a Glue job. On further investigation, I noticed that the Glue Python shell (Python 3) uses Python 3.6 only. NOTE: from what I observed in this issue, egg files built with one version of Python are not compatible with another.

To avoid this, you need to build a wheel file, which is compatible with any version.

  1. Run the command below in the directory where your setup.py file is located:

    $ python3 setup.py bdist_wheel

  2. Upload the wheel file to an S3 bucket.

  3. Go to the AWS Glue job console and create a new job. Fill in all required parameters, set the type to "Python Shell", and enter the S3 path of the wheel file in "Python library path".
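For anyone who prefers to script step 3 instead of using the console, the following is a rough sketch using boto3. It assumes that the console's "Python library path" field corresponds to the `--extra-py-files` job argument; the function names and every job name, role ARN, and S3 path are hypothetical placeholders, not values from this question.

```python
def build_python_shell_job(name, role_arn, script_s3_path, wheel_s3_path):
    """Build the keyword arguments for a Glue Python shell job definition."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "pythonshell",  # Python shell job rather than a Spark job
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        # Assumption: this argument is what "Python library path" sets.
        "DefaultArguments": {"--extra-py-files": wheel_s3_path},
    }


def create_python_shell_job(job_definition):
    """Create the job in AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    return glue.create_job(**job_definition)


if __name__ == "__main__":
    # Placeholder values only; replace them with your own before creating a job.
    job = build_python_shell_job(
        name="example-python-shell-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        script_s3_path="s3://example-bucket/scripts/job.py",
        wheel_s3_path="s3://example-bucket/libs/redshift_module-0.1-py3-none-any.whl",
    )
    print(job)
```

Calling `create_python_shell_job(job)` would then submit the definition; the sketch only prints it so it can be inspected first.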

vijay rajput