We are trying to connect to AWS DocumentDB from a Lambda function built on Express and deployed with Serverless. To do this we're using mongoose and a connection function that looks something like:
import mongoose from 'mongoose';
import logger from './utils/logger';
import fs from 'fs';

const READYSTATE_CONNECTED = 1;
const mongoDB = process.env.MONGODB_URI;
const certificateFilePath = __dirname + '/rds-combined-ca-bundle.pem';

logger.info(`Loading certificate file from ${certificateFilePath}`);
let ca = [fs.readFileSync(certificateFilePath)];

logger.info('Connection is ' + mongoose.connection.readyState);
if (mongoose.connection.readyState !== READYSTATE_CONNECTED) {
  logger.info(`Connecting to mongo using env connection string ${mongoDB}`);
  mongoose
    .connect(mongoDB, {
      useNewUrlParser: true,
      useUnifiedTopology: true,
      checkServerIdentity: false,
      ssl: true,
      sslCA: ca,
    })
    .catch((err) => {
      logger.error(`Unable to connect to mongoose due to ${err.reason}`);
      console.error(err);
    });
}

mongoose.Promise = global.Promise;
const db = mongoose.connection;
// eslint-disable-next-line no-console
db.on('error', console.error.bind(console, 'MongoDB connection error:'));

export default db;
The idea is to maintain a single connection and reuse it across invocations, avoiding the expense of creating a new connection for each request that comes into the Lambda. For the most part this works fine, but every once in a while (perhaps twice a day) we see problems connecting to the database. It seems to crash the Lambda pretty hard, and we have to deploy a change to force Lambda to restart the application, after which everything works fine again for another few hours. We run four identical environments, and production seems to be the only one that experiences this problem. Production is somewhat busier than the other environments, but only by about 50%.
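The connection-reuse idea above can be sketched as a cached connection promise, created lazily rather than at module load (so a failed cold-start connect doesn't permanently poison the container). This is a minimal sketch, not the code from the post: `createConnection` and `getConnection` are hypothetical names, and `createConnection` stands in for the real `mongoose.connect(...)` call.

```javascript
// Module-scope cache: survives across invocations while the Lambda
// container stays warm, just like the mongoose connection in the post.
let cachedConnection = null;

async function getConnection(createConnection) {
  if (cachedConnection === null) {
    // First caller (or first caller after a failure) starts the connect;
    // concurrent callers await the same in-flight promise.
    cachedConnection = createConnection().catch((err) => {
      // Reset the cache so the next invocation retries instead of
      // reusing a rejected promise forever.
      cachedConnection = null;
      throw err;
    });
  }
  return cachedConnection;
}

module.exports = { getConnection };
```

In the real handler this would be `await getConnection(() => mongoose.connect(mongoDB, options))`, called per request instead of at import time; that way a `MongooseServerSelectionError` surfaces as a failed invocation rather than a crashed module load.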
The error looks like
2020-11-09T20:10:36.565Z d88c9b33-6b84-44cd-8c1d-297c6334aad5 ERROR MongooseServerSelectionError: connection timed out
at NativeConnection.Connection.openUri (/var/task/node_modules/mongoose/lib/connection.js:800:32)
at /var/task/node_modules/mongoose/lib/index.js:342:10
at /var/task/node_modules/mongoose/lib/helpers/promiseOrCallback.js:31:5
at new Promise (<anonymous>)
at promiseOrCallback (/var/task/node_modules/mongoose/lib/helpers/promiseOrCallback.js:30:10)
at Mongoose.connect (/var/task/node_modules/mongoose/lib/index.js:341:10)
at Object.<anonymous> (/var/task/src/mongoose.js:19:24)
at Module._compile (internal/modules/cjs/loader.js:1137:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1157:10)
at Module.load (internal/modules/cjs/loader.js:985:32)
at Function.Module._load (internal/modules/cjs/loader.js:878:14)
at Module.require (internal/modules/cjs/loader.js:1025:19)
at require (internal/modules/cjs/helpers.js:72:18)
at Object.<anonymous> (/var/task/src/AppBuilder.js:17:1)
at Module._compile (internal/modules/cjs/loader.js:1137:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1157:10) {
reason: TopologyDescription {
type: 'ReplicaSetNoPrimary',
setName: 'rs0',
maxSetVersion: null,
maxElectionId: null,
servers: Map {
'documentdbmasterinstance-xxxx.xxx.us-east-1.docdb.amazonaws.com:27017' => [ServerDescription],
'documentdbreplica1instance-xxxx.xxxx.us-east-1.docdb.amazonaws.com:27017' => [ServerDescription],
'documentdbreplica2instance-xxxx.xxxx.us-east-1.docdb.amazonaws.com:27017' => [ServerDescription]
},
stale: false,
compatible: true,
compatibilityError: null,
logicalSessionTimeoutMinutes: null,
heartbeatFrequencyMS: 10000,
localThresholdMS: 15,
commonWireVersion: 6
}
Thus far we've been unable to pinpoint any particular action that causes this. There is a slight increase in connections to the database around that time, but only to about 75, and we're running on an r5.large instance, which should allow around 1,700 connections, so we're well under that limit.
I was unsure whether the mention of ReplicaSetNoPrimary in the error log is a red herring, but it doesn't seem to be mentioned anywhere in similar issue reports. I'm also suspicious of whether the connection is really timing out: none of the Lambda invocations take more than 200 ms.
I suppose the questions are:
- Is there anything obvious in the connection code which would cause this?
- Is there a better, more canonical way to establish and maintain connections in this Express application turned Lambda?
- Is the ReplicaSetNoPrimary indicative that there is some issue with DocumentDB electing a new primary, or with the primary being unreachable?
- Any suggestions for more logging I could add to chase this down?
Edit: Our connection strings look like
mongodb://redacted:redacted@prod-db.cluster-cvgzkbo26lzb.us-east-1.docdb.amazonaws.com:27017/database?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false