In my Glue job, I have access to a "Connection" which grants access to some sort of database (in my case, Redshift). I can use glue_context.write_dynamic_frame.from_jdbc_conf to write a DynamicFrame into a table in the connection, but what if I need to run some custom SQL on the connection? I can't seem to find anything in the API that allows just getting a connection or cursor object to run queries against. I am worried that I am going to end up having to use something like the pg8000 library to connect to the database separately to run queries.
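
For illustration, here's a sketch of the kind of workaround I'm hoping to avoid: pulling the connection properties out of the Glue catalog with boto3 and connecting with pg8000 directly. The connection name, the URL parsing, and the SQL here are all placeholders, not anything the Glue API provides for this:

import boto3
import pg8000

# Look up the JDBC properties of the Glue connection (name is a placeholder)
glue = boto3.client('glue')
props = glue.get_connection(Name='my-redshift-connection')['Connection']['ConnectionProperties']

# JDBC_CONNECTION_URL looks like jdbc:redshift://host:5439/database;
# naive parsing, for illustration only
url = props['JDBC_CONNECTION_URL']
host, rest = url.split('//')[1].split(':')
port, database = rest.split('/')

conn = pg8000.connect(host=host,
                      port=int(port),
                      database=database,
                      user=props['USERNAME'],
                      password=props['PASSWORD'])
cursor = conn.cursor()
cursor.execute('TRUNCATE TABLE some_schema.some_table')  # any custom SQL
conn.commit()
conn.close()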

Christopher Armstrong
1 Answer
I've been following this question for a long time, and my preference would be to use Scala to query Athena and add partitions. However, it seems nothing has been implemented or documented for that yet.
With Lambda, however, you can use the script below to execute MSCK REPAIR TABLE:
import boto3

def lambda_handler(event, context):
    bucket_name = 'some_bucket'
    client = boto3.client('athena')

    # Where Athena should write the query results
    config = {
        'OutputLocation': 's3://' + bucket_name + '/',
        'EncryptionConfiguration': {'EncryptionOption': 'SSE_S3'}
    }

    # Query execution parameters (renamed so it doesn't shadow the
    # Lambda `context` argument)
    sql = 'MSCK REPAIR TABLE some_database.some_table'
    query_context = {'Database': 'some_database'}

    client.start_query_execution(QueryString=sql,
                                 QueryExecutionContext=query_context,
                                 ResultConfiguration=config)
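
Note that start_query_execution only submits the query and returns immediately. If the Lambda needs to wait for the repair to finish, one option (a sketch, reusing the client, sql, query_context, and config from above) is to poll get_query_execution:

import time

response = client.start_query_execution(QueryString=sql,
                                        QueryExecutionContext=query_context,
                                        ResultConfiguration=config)
query_id = response['QueryExecutionId']

# Poll until Athena reports a terminal state:
# SUCCEEDED, FAILED, or CANCELLED
state = 'QUEUED'
while state in ('QUEUED', 'RUNNING'):
    time.sleep(1)
    state = client.get_query_execution(
        QueryExecutionId=query_id
    )['QueryExecution']['Status']['State']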

RHarsha
Blatant copy/paste from https://stackoverflow.com/a/47554812/3001626 – David Arenburg Mar 09 '20 at 08:53