
I'm trying to use the PyGreSQL package in AWS Glue with a Python job.

I have uploaded the wheel file from here to an S3 bucket:

https://pypi.org/project/PyGreSQL/#files

the Python 3.6 wheel for x64.
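
For reference, here is a minimal boto3 sketch of how such a Python shell job can be set up so that Glue installs the wheel through the --extra-py-files job parameter (the bucket, role and file names below are placeholders, not my actual values):

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="postloading3",                                    # placeholder job name
    Role="MyGlueServiceRole",                               # placeholder IAM role
    Command={
        "Name": "pythonshell",                              # Python shell job
        "ScriptLocation": "s3://my-bucket/scripts/postloading3.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Glue installs the wheel listed here before running the script;
        # the file name is illustrative, use the one downloaded from PyPI
        "--extra-py-files": "s3://my-bucket/libs/PyGreSQL-5.2-cp36-cp36m-manylinux1_x86_64.whl",
    },
)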

Then in the job script I use:

import pg

With this configuration I get the following error when running the job:


WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.

2020-08-08T20:22:47.845+02:00
Traceback (most recent call last):
  File "/tmp/runscript.py", line 123, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-vbox2q05/postloading3.py", line 7, in <module>
  File "/glue/lib/installation/pg.py", line 1436, in <module>
    set_query_helpers(_dictiter, _namediter, _namednext, _scalariter)
NameError: name 'set_query_helpers' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/runscript.py", line 142, in <module>
    raise e_type(e_value).with_traceback(new_stack)
  File "/tmp/glue-python-scripts-vbox2q05/postloading3.py", line 7, in <module>
  File "/glue/lib/installation/pg.py", line 1436, in <module>
    set_query_helpers(_dictiter, _namediter, _namednext, _scalariter)
NameError: name 'set_query_helpers' is not defined

Do you know if I'm missing some dependency library to upload? According to AWS, PyGreSQL is compatible with Glue.


1 Answer


It worked by adding the following code:

def get_connection(host):
    # Build a libpq-style connection string (password shown is truncated)
    rs_conn_string = "host=%s port=%s dbname=%s user=%s password=%s" % (host, 5439, "dev", "awsuser", "sfg.")
    # libpq treats a dbname containing "=" as a full connection string
    rs_conn = pg.connect(dbname=rs_conn_string)
    # Abort any statement that runs longer than 20 minutes
    rs_conn.query("set statement_timeout = 1200000")
    return rs_conn

############################ MAIN ############################
con1 = get_connection("aredshift-c1....")

and adding, at the top of the script:

import pg

Consulting the AWS Glue PDF guide helped me find this simple way to make it work.
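
For completeness, a minimal sketch of querying through the returned connection with the classic pg API (the query text is only an example):

# Example use of the connection returned by get_connection()
result = con1.query("select current_date")
print(result.getresult())    # rows as a list of tuples
print(result.dictresult())   # rows as a list of dicts keyed by column name

con1.close()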
