In my Glue job, I have access to a "Connection" which grants access to some sort of database (in my case, Redshift). I can use glue_context.write_dynamic_frame.from_jdbc_conf to write a DynamicFrame into a table in the connection, but what if I need to run some custom SQL on the connection? I can't seem to find anything in the API that allows just getting a connection or cursor object to run queries against. I am worried that I am going to end up having to use something like the pg8000 library to connect to the database separately to run queries.
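
For illustration, here's a sketch of the kind of workaround I'm hoping to avoid: pulling the connection properties out of the Glue catalog with boto3 and connecting with pg8000 directly. The connection name, the URL parsing, and the SQL here are all placeholders, not anything the Glue API provides for this:

import boto3
import pg8000

# Look up the JDBC properties of the Glue connection (name is a placeholder)
glue = boto3.client('glue')
props = glue.get_connection(Name='my-redshift-connection')['Connection']['ConnectionProperties']

# JDBC_CONNECTION_URL looks like jdbc:redshift://host:5439/database;
# naive parsing, for illustration only
url = props['JDBC_CONNECTION_URL']
host, rest = url.split('//')[1].split(':')
port, database = rest.split('/')

conn = pg8000.connect(host=host,
                      port=int(port),
                      database=database,
                      user=props['USERNAME'],
                      password=props['PASSWORD'])
cursor = conn.cursor()
cursor.execute('TRUNCATE TABLE some_schema.some_table')  # any custom SQL
conn.commit()
conn.close()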

Christopher Armstrong
1 Answer
I've been following this question for a long time, and my preference would be to use Scala to query Athena and add partitions. However, it seems nothing has been implemented or documented for that yet.
With Lambda, however, you can use the script below to execute MSCK REPAIR TABLE:
import boto3

def lambda_handler(event, context):
    bucket_name = 'some_bucket'
    client = boto3.client('athena')

    # Where Athena should write the query results
    config = {
        'OutputLocation': 's3://' + bucket_name + '/',
        'EncryptionConfiguration': {'EncryptionOption': 'SSE_S3'}
    }

    # Query execution parameters (renamed so it doesn't shadow the
    # Lambda `context` argument)
    sql = 'MSCK REPAIR TABLE some_database.some_table'
    query_context = {'Database': 'some_database'}

    client.start_query_execution(QueryString=sql,
                                 QueryExecutionContext=query_context,
                                 ResultConfiguration=config)
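
Note that start_query_execution only submits the query and returns immediately. If the Lambda needs to wait for the repair to finish, one option (a sketch, reusing the client, sql, query_context, and config from above) is to poll get_query_execution:

import time

response = client.start_query_execution(QueryString=sql,
                                        QueryExecutionContext=query_context,
                                        ResultConfiguration=config)
query_id = response['QueryExecutionId']

# Poll until Athena reports a terminal state:
# SUCCEEDED, FAILED, or CANCELLED
state = 'QUEUED'
while state in ('QUEUED', 'RUNNING'):
    time.sleep(1)
    state = client.get_query_execution(
        QueryExecutionId=query_id
    )['QueryExecution']['Status']['State']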

RHarsha
Blatant copy/paste from https://stackoverflow.com/a/47554812/3001626 – David Arenburg Mar 09 '20 at 08:53