I have got a glue job that is only executing a stored procedure in RDS SQL Server. Everything works until the execution take more than about 30 sec. Once it does that, it appears that glue/python are closing the connection, causing SQL to terminate the statement and perform a rollback. If I run the stored procedure manually from SSMS, it never fails or times out, only from Glue.
I have been able to isolate the issue to Glue/Python. By adding a sleep before closing the connection, it works (as long as the stored procedure completes before the value in the sleep function). Obviously, this is a terrible long-term solution. How can I force Glue/Python to not close the connection until the execute method completes?
The job script is:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
job.commit()
logger = glueContext.get_logger()
# dw-poc-dev spark test
source_jdbc_conf = glueContext.extract_jdbc_conf('myConnectionInGlue')
from py4j.java_gateway import java_import
java_import(sc._gateway.jvm,"java.sql.Connection")
java_import(sc._gateway.jvm,"java.sql.DatabaseMetaData")
java_import(sc._gateway.jvm,"java.sql.DriverManager")
java_import(sc._gateway.jvm,"java.sql.SQLException")
conn = sc._gateway.jvm.DriverManager.getConnection(source_jdbc_conf.get('url'), source_jdbc_conf.get('user'), source_jdbc_conf.get('password'))
logger.info("execting SQL Stored procedure")
print(conn)
print(conn.getMetaData().getDatabaseProductName())
statement = f"""
EXEC sp_executesql N'myDBName.dbo.mySproc'
"""
# Create callable statement and execute it
exec_statement = conn.prepareCall(statement)
exec_statement.execute()
logger.info("completed SQL Stored procedure")
conn.close()
If I execute the stored procedure directly in SSMS, it completes successfully, and within 40 seconds.
When I execute via glue, I see an error (3621 - The statement has been terminated) in profiler, and then a rollback as the catch block in the sql picks up the error.
I have tried putting the execute into an async function to await the exec_statement.execute, but a java object cant be used in an await expression.
EDIT: I have added a sleep statement between the exec_statement.execute() and the close conn.close(), and the process seems to be working, and the amount of time I put in the parameter for sleep seems to not matter. Its working, though a bit brute force, but I'm certain that there is a better solution.