0

I have got a glue job that is only executing a stored procedure in RDS SQL Server. Everything works until the execution take more than about 30 sec. Once it does that, it appears that glue/python are closing the connection, causing SQL to terminate the statement and perform a rollback. If I run the stored procedure manually from SSMS, it never fails or times out, only from Glue.

I have been able to isolate the issue to Glue/Python. By adding a sleep before closing the connection, it works (as long as the stored procedure completes before the value in the sleep function). Obviously, this is a terrible long-term solution. How can I force Glue/Python to not close the connection until the execute method completes?

The job script is:

    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    ## @params: [JOB_NAME]
    args = getResolvedOptions(sys.argv, ['JOB_NAME'])

    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)
    job.commit()
    logger = glueContext.get_logger()

    # dw-poc-dev spark test
    source_jdbc_conf = glueContext.extract_jdbc_conf('myConnectionInGlue')

    from py4j.java_gateway import java_import
    java_import(sc._gateway.jvm,"java.sql.Connection")
    java_import(sc._gateway.jvm,"java.sql.DatabaseMetaData")
    java_import(sc._gateway.jvm,"java.sql.DriverManager")
    java_import(sc._gateway.jvm,"java.sql.SQLException")

    conn = sc._gateway.jvm.DriverManager.getConnection(source_jdbc_conf.get('url'), source_jdbc_conf.get('user'), source_jdbc_conf.get('password'))
    logger.info("execting SQL Stored procedure")
    print(conn)
    print(conn.getMetaData().getDatabaseProductName())

    statement = f"""
    EXEC sp_executesql N'myDBName.dbo.mySproc'
    """
    # Create callable statement and execute it
    exec_statement = conn.prepareCall(statement)
    exec_statement.execute()

    logger.info("completed SQL Stored procedure")
    conn.close()

If I execute the stored procedure directly in SSMS, it completes successfully, and within 40 seconds.

When I execute via glue, I see an error (3621 - The statement has been terminated) in profiler, and then a rollback as the catch block in the sql picks up the error.

I have tried putting the execute into an async function to await the exec_statement.execute, but a java object cant be used in an await expression.

EDIT: I have added a sleep statement between the exec_statement.execute() and the close conn.close(), and the process seems to be working, and the amount of time I put in the parameter for sleep seems to not matter. Its working, though a bit brute force, but I'm certain that there is a better solution.

0 Answers0