3

From the Best Practices for Working with AWS Lambda Functions:

Take advantage of execution context reuse to improve the performance of your function. Initialize SDK clients and database connections outside of the function handler, [...]

I would like to apply this principle to improve my Lambda function, where a database handle is initialized and closed every time the function is invoked. Take the following example:

def lambda_handler(event, context):
    # Open a connection to the database
    db_handle = connect_database()    

    # Do something with the database
    result = perform_actions(db_handle)  

    # Clean up, close the connection
    db_handle.close()       

    # Return the result
    return result    

From my understanding of the AWS documentation, the code should be optimized as follows:

# Initialize the database connection outside the handler
db_handle = connect_database()

def lambda_handler(event, context):
    # Do something with the database and return the result
    return perform_actions(db_handle)

This would result in the db_handle.close() method not being called, thus potentially leaking a connection.

How should I handle the cleanup of such resources when using AWS Lambda with Python?

Kurt Stolle

2 Answers

1

Many people are looking for the same thing as you. I believe it is impossible at this time, but the issue can be handled from the database side.

Take a look at this one

Tuan Vo
0

The connection leak would only last while the Lambda execution environment is alive; in other words, the connection would time out (be closed) after the execution environment is destroyed.

Whether a global connection object is worth implementing depends on your particular use case:
- how much of the total execution time is taken by the database initialization
- how often your function is called
- how you handle database connection errors
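
To put a rough number on the first point, you could time the connection setup separately from the actual work. This is only a minimal sketch: `connect_database` and `perform_actions` are hypothetical stand-ins simulated with `time.sleep`, and `timed` and `init_share` are names made up for this illustration. Swap in your real functions to get meaningful figures.

```python
import time

def connect_database():
    # Hypothetical stand-in: pretend that connecting takes about 50 ms
    time.sleep(0.05)
    return object()

def perform_actions(db_handle):
    # Hypothetical stand-in: pretend that the query takes about 10 ms
    time.sleep(0.01)
    return "ok"

def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

def init_share():
    """Fraction of one cold invocation spent opening the connection."""
    db_handle, t_init = timed(connect_database)
    _, t_work = timed(perform_actions, db_handle)
    return t_init / (t_init + t_work)

if __name__ == "__main__":
    print(f"connection setup share: {init_share():.0%}")
```

If the share turns out to be small, or the function is called rarely, the global connection probably buys you little.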

If you want a bit more control over the connection, you can try this approach, which recycles the database connection every two hours or when it encounters a database-related exception:

import datetime

# Initialize the global object to hold database connection and timestamp
db_conn = {
    "db_handle": None,
    "init_dt": None
}

def lambda_handler(event, context):
    # Check database connection; (re)connect if there is none
    if not db_conn["db_handle"]:
        db_conn["db_handle"] = connect_database()
        db_conn["init_dt"] = datetime.datetime.now()
    # Do something with the database and return the result
    try:
        result = do_work(db_conn["db_handle"])
    except DBError:  # placeholder for your database driver's exception class
        # Drop the broken connection so the next invocation reconnects
        try:
            db_conn["db_handle"].close()
        except Exception:
            pass
        db_conn["db_handle"] = None
        return "db error occurred"
    # Check connection age; recycle after two hours
    if datetime.datetime.now() - db_conn["init_dt"] > datetime.timedelta(hours=2):
        db_conn["db_handle"].close()
        db_conn["db_handle"] = None
    return result

Please note that I haven't tested the above on Lambda, so you need to check it with your setup.
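
Since the snippet above is untested, one way to check the recycle logic locally before deploying is to wrap it in a small class with an injectable clock. The following is a sketch under assumptions, not a drop-in replacement: `FakeConnection` and `ConnectionHolder` are hypothetical names invented for this example, and unlike the snippet above it checks the connection age at the start of an invocation rather than at the end.

```python
import datetime

class FakeConnection:
    """Hypothetical stand-in for a real database handle."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class ConnectionHolder:
    """Keeps one connection across invocations and recycles it by age."""
    def __init__(self, connect, max_age, now=datetime.datetime.now):
        self._connect = connect   # callable that opens a new connection
        self._max_age = max_age   # datetime.timedelta
        self._now = now           # injectable clock, handy for tests
        self._handle = None
        self._init_dt = None

    def get(self):
        # Close and drop a connection that has outlived max_age
        if self._handle is not None and self._now() - self._init_dt > self._max_age:
            self.discard()
        # Open a fresh connection if none is cached
        if self._handle is None:
            self._handle = self._connect()
            self._init_dt = self._now()
        return self._handle

    def discard(self):
        # Best-effort close, then forget the handle
        if self._handle is not None:
            try:
                self._handle.close()
            except Exception:
                pass
            self._handle = None

# Module-level holder, created once per execution environment
holder = ConnectionHolder(FakeConnection, datetime.timedelta(hours=2))

def lambda_handler(event, context):
    db_handle = holder.get()
    # A real handler would run its queries here
    return {"connection_closed": db_handle.closed}
```

With a fake clock passed as `now`, you can advance time past the two-hour limit in a unit test and confirm that the old handle is closed and a new one is opened, without touching a real database.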

Ionut Ticus
  • The connection would indeed time out after a while (depending on the database settings), but that's not the same as cleaning up resources. My question is specifically about cleaning up resources (of any kind) before the execution context is cancelled. – Kurt Stolle Mar 08 '20 at 15:55
  • I don't think there is a *Lambda execution environment destroyed* event hook at the moment; you can try to reduce the leak duration by adjusting the recycle interval and idle connection timeout setting for the database (`idle_in_transaction_session_timeout` for PostgreSQL, `wait_timeout` for MySQL etc.). You can also put a *connection pooler* in between. – Ionut Ticus Mar 08 '20 at 22:13
  • This solves the issue in the example, but doesn't answer the question – Kurt Stolle Mar 13 '20 at 07:58