62

I'm looking to create a RESTful API using AWS Lambda/API Gateway connected to a MongoDB database. I've read that connections to MongoDB are relatively expensive, so it's best practice to retain a connection for reuse once it's been established rather than making a new connection for every query.

This is pretty straightforward for normal applications, as you can establish a connection during startup and reuse it during the application's lifetime. But since Lambda is designed to be stateless, retaining this connection seems to be less straightforward.

Therefore, I'm wondering what would be the best way to approach this database connection issue? Am I forced to make new connections every time a Lambda function is invoked or is there a way to pool/cache these connections for more efficient queries?

Thanks.

Roman Podlinov
Beesknees
  • why don't you use DynamoDB, it's also a NoSQL DB – nur farazi Nov 03 '15 at 03:03
  • That is an option, and DynamoDB is a NoSQL DB, but there are differences which make MongoDB a more suitable choice for my project, mainly support for geospatial queries. DynamoDB does now have some support for these types of queries through the use of a geohashing library, but I think it still has a long way to go to catch up to MongoDB in this area. – Beesknees Nov 04 '15 at 11:46
  • Same issue here. Did you already find a solution? – manuerumx May 17 '16 at 02:04
  • @manuerumx I'm in the same boat. I'm considering building a lambda function using the referenced s2-based geohashing [library](https://github.com/awslabs/dynamodb-geo). But the fact remains, DynamoDB has a long way to go to catch up. How did you proceed? – brianjd Jun 03 '16 at 00:16
  • @brianjd We tested Lambda with MongoDB, and Lambda with DynamoDB. The problem with Lambda-MongoDB is the connection timeout. Lambda is a stateless function/service, so the only option is to open and close a connection with every single call; you can't maintain a connection pool. That's OK if the Lambda function doesn't get intensive use. The better approach is to use Dynamo, but for our development Dynamo is not functional: working with complex documents in DynamoDB is more complicated than we expected and extends the dev time too much. We chose to go with Lambda only for certain functions, like accessing S3 files. – manuerumx Jun 06 '16 at 16:01
  • I prefer Mongo API over Dynamo as it is easier to write and there is less boilerplate – Leos Literak Jan 12 '20 at 09:23

11 Answers

17

AWS Lambda functions should be defined as stateless functions, so they can't hold state like a connection pool.

This issue was also raised in this AWS forum post. On Oct 5, 2015 AWS engineer Sean posted that you should avoid opening and closing a connection on each request by creating a pool on code initialization, outside of the handler block. But two days later the same engineer posted that you should not do this.

The problem is that you don't have control over Lambda's runtime environment. We do know that these environments (or containers) are reused, as described in the blog post by Tim Wagner. But the lack of control can lead you to drain all your resources, like reaching a connection limit in your database. It's up to you.

Instead of connecting to MongoDB from your Lambda function, you can use RESTHeart to access the database through HTTP. The connection pool to MongoDB is maintained by RESTHeart instead. Note that, in regards to performance, you'll be opening a new HTTP connection to RESTHeart on each request, rather than using an HTTP connection pool as you could in a traditional application.

tuler
  • May I throw in my one cent... MongoDB Atlas already has an HTTP API, so you can use that. It's very similar to RESTHeart, and I've used it with Lambda recently; very good stuff. – Laerion Mar 24 '17 at 20:51
  • Laerion that's actually a good shout - @tuler yeah you're right, you SHOULDN'T, but in practice I reckon you could implement my solution above. Upvoting for giving the "correct" advice. – Mrk Fldig Jan 17 '18 at 21:20
  • I should add that as of Jan/2018 AWS added the capability to limit the number of concurrent executions of a given lambda function. This will help to keep some control not to overload connection resources. – tuler Mar 13 '18 at 13:53
  • tuler, a reference would help. – Will Bowman May 10 '18 at 15:16
  • Indeed the data API is the way to go: you pay for requests and processing time, 1,000,000 free requests then 2 dollars per request, 500 hours free then 10 dollars per processing hour. Downside is that we saw it's slower – ArielB Jan 17 '23 at 06:42
7

You should assume Lambdas to be stateless, but the reality is that most of the time the VM is simply frozen and does maintain some state. It would be inefficient for Amazon to spin up a new process for every request, so they often re-use the same process, and you can take advantage of this to avoid thrashing connections.

To avoid connecting for every request (in cases where the lambda process is re-used):

  1. Write the handler assuming the process is re-used, such that you connect to the database and have the Lambda re-use the connection pool (the db promise returned from MongoClient.connect).

  2. So the Lambda doesn't hang waiting for you to close the db connection (db.close()) after servicing a request, tell it not to wait for an empty event loop.

Example:

const { MongoClient } = require('mongodb');

// Connect once at module load; the promise is reused on warm invocations.
var db = MongoClient.connect(MongoURI);

module.exports.targetingSpec = (event, context, callback) => {
  context.callbackWaitsForEmptyEventLoop = false;
  db.then((db) => {
    // use db
  });
};

From the documentation about context.callbackWaitsForEmptyEventLoop:

callbackWaitsForEmptyEventLoop The default value is true. This property is useful only to modify the default behavior of the callback. By default, the callback will wait until the Node.js runtime event loop is empty before freezing the process and returning the results to the caller. You can set this property to false to request AWS Lambda to freeze the process soon after the callback is called, even if there are events in the event loop. AWS Lambda will freeze the process, any state data and the events in the Node.js event loop (any remaining events in the event loop processed when the Lambda function is called next and if AWS Lambda chooses to use the frozen process). For more information about callback, see Using the Callback Parameter.

Tyler Brock
  • What should be done in case `async - await` is used instead of callback? `context.callbackWaitsForEmptyEventLoop` this option is only for Lambdas with callback, right? If promises or async-await pattern is used, how to disable this property so as to respond faster without waiting to release the resources? – Avani Khabiya Mar 10 '21 at 10:25
  • This is not a good solution at scale; when lambda grows you will get different connections and will hit the limit – ArielB Jan 17 '23 at 06:44
6

RESTHeart is a REST-based server that runs alongside MongoDB. It maps most CRUD operations in Mongo to GET, POST, etc. requests, with extensible support for when you need to write a custom handler (e.g., a specialized geoNear or geoSearch query).

3

I ran some tests executing Java Lambda functions connecting to MongoDB Atlas.

As already stated by other posters, Amazon does reuse the instances; however, these may get recycled and the exact behaviour cannot be determined, so one could end up with stale connections. I'm collecting data and pushing it to the Lambda function every 5 minutes.

The Lambda basically does:

  • Build up or reuse connection
  • Query one record
  • Write or update one record
  • Close the connection or leave it open

The actual amount of data is quite low. Depending on time of the day it varies from 1 - 5 kB. I only used 128 MB.

The Lambdas ran in N. Virginia, as this is the region the free tier is tied to.

When opening and closing the connection each time most calls take between 4500 - 9000 ms. When reusing the connection most calls are between 300 - 900 ms. Checking the Atlas console the connection count stays stable. For this case reusing the connection is worth it. Building up a connection and even disconnecting from a replica-set is rather expensive using the Java driver.

For a large scale deployment one should run more comprehensive tests.
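Stripped of the driver specifics, the reuse pattern those timings compare comes down to caching the connection in module scope so a warm container skips the connect. Here is a minimal Node.js sketch of that pattern, where the `connect` stub stands in for the real driver call (e.g. `MongoClient.connect`), so it runs without a database:

```javascript
// Minimal sketch of the pattern behind the warm-vs-cold numbers above:
// cache the connection in module scope so a reused (warm) container
// skips the expensive connect. `connect` is a stand-in for the real
// driver call.
let connectCalls = 0;

function connect() {
  connectCalls += 1;
  // Simulate an expensive connection setup.
  return Promise.resolve({ status: 'connected' });
}

let cachedConn = null; // survives between invocations in a warm container

async function handler(event) {
  if (cachedConn === null) {
    cachedConn = await connect(); // cold start: pay the connection cost
  }
  // Warm invocations reuse cachedConn here.
  return cachedConn.status;
}
```

With a real driver you would also want a staleness check before reusing the cached connection, since recycled containers can hold dead sockets.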

Udo Held
2

Yes, there is a way to cache/retain a connection to MongoDB, and its name is connection pooling. You can use it with Lambda functions as well, like this:
For more information you can follow these links:
Using Mongoose With AWS Lambda
Optimizing AWS Lambda (a bit out of date)

const mongoose = require('mongoose');

let conn = null;

const uri = 'YOUR CONNECTION STRING HERE';

exports.handler = async function(event, context) {
  // Make sure to add this so you can re-use `conn` between function calls.
  context.callbackWaitsForEmptyEventLoop = false;

  const models = [{name: 'User', schema: new mongoose.Schema({ name: String })}]
  conn = await createConnection(conn, models)
  //e.g.
  const doc = await conn.model('User').findOne({})
  console.log('doc: ', doc);
};

const createConnection = async (conn, models) => {
  // Because `conn` is in the global scope, Lambda may retain it between
  // function calls thanks to `callbackWaitsForEmptyEventLoop`.
  // This means your Lambda function doesn't have to go through the
  // potentially expensive process of connecting to MongoDB every time.

  // readyState 0 = disconnected, 3 = disconnecting: reconnect in both cases.
  if (conn == null || [0, 3].includes(conn.readyState)) {
    conn = await mongoose.createConnection(uri, {
      // Buffering means mongoose will queue up operations if it gets
      // disconnected from MongoDB and send them when it reconnects.
      // With serverless, better to fail fast if not connected.
      bufferCommands: false, // Disable mongoose buffering
      bufferMaxEntries: 0, // and MongoDB driver buffering
      useNewUrlParser: true,
      useUnifiedTopology: true,
      useCreateIndex: true
    })
    for (const model of models) {
      const { name, schema } = model
      conn.model(name, schema)
    }
  }

  return conn
}
Milad ranjbar
1

Official Best Practice for Connecting from AWS Lambda

You should define the client to the MongoDB server outside the AWS Lambda handler function. Don't define a new MongoClient object each time you invoke your function. Doing so causes the driver to create a new database connection with each function call. This can be expensive and can result in your application exceeding database connection limits.

As an alternative, do the following:

  1. Create the MongoClient object once.
  2. Store the object so your function can reuse the MongoClient across function invocations.

Step 1

Isolate the call to the MongoClient.connect() function into its own module so that the connections can be reused across functions. Let's create a file mongo-client.js for that:

mongo-client.js:

const { MongoClient } = require('mongodb');

// Export a module-scoped MongoClient promise. By doing this in a separate
// module, the client can be shared across functions.
const client = new MongoClient(process.env.MONGODB_URI);

module.exports = client.connect();

Step 2

Import the new module and use it in function handlers to connect to the database.

some-file.js:

const clientPromise = require('./mongo-client');

// Handler
module.exports.handler = async function(event, context) {
  // Get the MongoClient by calling await on the connection promise. Because
  // this is a promise, it will only resolve once.
  const client = await clientPromise;
  
  // Use the connection to return the name of the connected database for example.
  return client.db().databaseName;
}

Resources

For more info, check the official docs.

NeNaD
0

Unfortunately, you may have to create your own RESTful API to answer MongoDB requests until AWS comes up with one. So far they only have what you need for their own DynamoDB.

Marin
0

The short answer is yes, you need to create a new connection AND close it before the lambda finishes.

The long answer: during my tests you can actually pass your DB connection down in your handler like so (a MySQL example, as that's what I've got to hand). You can't rely on the connection still being there, so check it as in my example below; it may be that once your Lambdas haven't been executed for ages they lose the state from the handler (cold start). I need to do more tests to find out, but I have noticed that if a Lambda is getting a lot of traffic, the example below doesn't create a new connection.

// MySQL.database.js
import * as mysql from 'mysql'

export default mysql.createConnection({
    host: 'mysql db instance address',
    user: 'MYSQL_USER',
    password: 'PASSWORD',
    database: 'SOMEDB',
})

Then in your handler import it and pass it down to the lambda that's being executed.

// handler.js
import MySQL from './MySQL.database.js'

const funcHandler = (func) => {
    return (event, context, callback) => {
        func(event, context, callback, MySQL)
    }
}

const handler = {
    someHandler: funcHandler(someHandler),
}

export default handler

Now in your Lambda you do...

export default (event, context, callback, MySQL) => {
  context.callbackWaitsForEmptyEventLoop = false
  // Check if there is a MySQL connection; if not, open one.

  // Do ya thing, query away etc etc

  callback(null, responder.success())
}

The responder example can be found here. Sorry it's ES5, because that's where the question was asked.

Hope this helps!

Mrk Fldig
0

We tested an AWS Lambda that connected every minute to our self managed MongoDB.

The connections were unstable and the Lambda failed.

We resolved the issue by wrapping the MongoDB with Nginx reverse proxy stream module:

How to setup MongoDB behind Nginx Reverse Proxy

stream {
    server {
        listen  <your incoming Mongo TCP port>;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass    stream_mongo_backend;
    }

    upstream stream_mongo_backend {
      server <localhost:your local Mongo TCP port>;
  }
}
AAber
-4

In addition to saving the connection for reuse, increase the memory allocation for the Lambda function. AWS allocates CPU proportionally to the memory allocation, and when changing from 128 MB to 1.5 GB the connection time to MongoDB Atlas dropped from 4 s to 0.5 s.

Read more here: https://aws.amazon.com/lambda/faqs/
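A quick back-of-the-envelope check of this trade-off, using Lambda's proportional GB-second billing and the rough durations quoted above (the figures are illustrative, and prices change, so treat this purely as a sketch):

```javascript
// Back-of-the-envelope check of the memory/latency trade-off described
// above, using Lambda's GB-second billing model. Durations are the rough
// figures from the answer, not measured values.
function gbSeconds(memoryMb, durationSec) {
  return (memoryMb / 1024) * durationSec;
}

const small = gbSeconds(128, 4.0);   // 128 MB taking ~4 s to connect
const large = gbSeconds(1536, 0.5);  // 1.5 GB taking ~0.5 s

// The larger allocation is ~8x faster but bills ~1.5x the GB-seconds.
console.log({ small, large, ratio: large / small });
```

So the speed-up is real, but per-invocation cost rises by about 50% in this scenario; whether that is worth it depends on your workload.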

grbandr
  • Crunching your numbers: your performance increases by 8x when increasing the memory by 12x. As Lambda is paid by the gigabyte-second, this would actually increase the cost by 50%. – Udo Held Apr 09 '17 at 10:49
  • I guess it depends on the use case whether it's worth it or not, and there might be more optimal memory/CPU allocations. – grbandr Aug 14 '17 at 22:09
-8

I was facing the same issue some time ago, but I resolved it by creating a MongoDB instance on EC2 in the same AWS account where my Lambda function resides.

Now I can access the MongoDB from the Lambda function via its private IP.