18

I've a Java application that performs some aggregations on MongoDB, but sometimes it just hangs and throw a SocketTimeout exception. After the Exception the app will run just fine (for a bit, then it will probably raise again the exception).

I've just found this explanation that seems a possible cause but I'm not sure.

I initalize the MongoClient and keep the connection to the DB open. I'm not sure if this could be a problem and I should just get everytime the database and then let the database to be garbage collected (and close the connection).

Another approach could be ping periodically Mongo to keep the connection pool "fresh".

The client used is something like this:

public class DbClient {

    private static MongoClient mongoClient;
    private static MongoDatabase db;

    private DbClient() {}

    public static void init() throws Exception {
        mongoClient = new MongoClient();
    }

    public static MongoDatabase getDB() {
        if(mongoClient == null)
            throw new IllegalStateException("Client not initialized!");

        if(db == null) {
            db = mongoClient.getDatabase("my_db");
        }
        return db;
    }
}

Is this the possible cause of the SocketTimeout?

This is the exception thrown:

09:20:45.742 [qtp605535417-46] INFO  org.mongodb.driver.connection - Closed connection [connectionId{localValue:16, serverValue:6562}] to myapp.com:27017 because there was a socket exception raised by this connection.
09:20:45.743 [qtp605535417-46] ERROR myapp.service.Api - Error processing request
com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message
    at com.mongodb.connection.InternalStreamConnection.translateReadException(InternalStreamConnection.java:474) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:225) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.UsageTrackingInternalConnection.receiveMessage(UsageTrackingInternalConnection.java:102) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultConnectionPool$PooledConnection.receiveMessage(DefaultConnectionPool.java:435) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.CommandProtocol.execute(CommandProtocol.java:112) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:159) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:286) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.DefaultServerConnection.command(DefaultServerConnection.java:173) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:215) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:206) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.CommandOperationHelper.executeWrappedCommandProtocol(CommandOperationHelper.java:112) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation$1.call(FindOperation.java:487) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation$1.call(FindOperation.java:482) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:239) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:212) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation.execute(FindOperation.java:482) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.operation.FindOperation.execute(FindOperation.java:79) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.Mongo.execute(Mongo.java:772) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.Mongo$2.execute(Mongo.java:759) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.OperationIterable.iterator(OperationIterable.java:47) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.FindIterableImpl.iterator(FindIterableImpl.java:143) ~[mongo-java-driver-3.2.2.jar!/:na]
    at myapp.common.db.service.dao.AnalysisMongoImpl.getAnalysis(AnalysisMongoImpl.java:66) ~[common-0.2.0-SNAPSHOT.jar!/:na]
    at myapp.common.db.service.AnalysisServiceImpl.getAnalysis(AnalysisServiceImpl.java:31) ~[common-0.2.0-SNAPSHOT.jar!/:na]
    at myapp.aggregator.service.Api$1.handle(Api.java:88) ~[aggregator-0.2.0-SNAPSHOT.jar!/:na]
    at spark.webserver.MatcherFilter.doFilter(MatcherFilter.java:139) [spark-core-1.1.1.jar!/:na]
    at spark.webserver.JettyHandler.doHandle(JettyHandler.java:54) [spark-core-1.1.1.jar!/:na]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:179) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.Server.handle(Server.java:451) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.HttpChannel.run(HttpChannel.java:252) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:266) [jetty-server-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:240) [jetty-io-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596) [jetty-util-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527) [jetty-util-9.0.2.v20130417.jar!/:9.0.2.v20130417]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_95]
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method) ~[na:1.7.0_95]
    at java.net.SocketInputStream.read(SocketInputStream.java:152) ~[na:1.7.0_95]
    at java.net.SocketInputStream.read(SocketInputStream.java:122) ~[na:1.7.0_95]
    at com.mongodb.connection.SocketStream.read(SocketStream.java:85) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.InternalStreamConnection.receiveResponseBuffers(InternalStreamConnection.java:491) ~[mongo-java-driver-3.2.2.jar!/:na]
    at com.mongodb.connection.InternalStreamConnection.receiveMessage(InternalStreamConnection.java:221) ~[mongo-java-driver-3.2.2.jar!/:na]
    ... 34 common frames omitted
Community
  • 1
  • 1
Enrichman
  • 11,157
  • 11
  • 67
  • 101
  • You need to post the exception, its message, and the stack trace. In your question. None of those member fields or methods should be static. – user207421 Feb 24 '16 at 09:20
  • @EJP hm, why? I'm guessing that the `getDB()` could be the issue, but as I've read I should handle the MongoClient as a Singleton. – Enrichman Feb 24 '16 at 09:28
  • Client != Cursor && Client != Connection. Actually, the client maintains a connection pool. Using these connection, operations are executed on the server, like queries. In this case, a cursor is returned. In case the operation takes too long, a timeout is triggered and the connection closed server side. Which, from what I can see, is what happened. – Markus W Mahlberg Feb 24 '16 at 09:31
  • @MarkusWMahlberg I don't think this is the case. I've a query timeout, and these are thrown also for not heavy aggregations. In case of long running queries the exception thrown is the correct one. `collection.aggregate(pipeline).maxTime(maxTimeout, TimeUnit.SECONDS);` and from the logs I can see that the exception is raised BEFORE the aggregation. – Enrichman Feb 24 '16 at 09:35
  • Well, I'd double-check the assumption that the aggregations should be fast enough by running them on the shell. A bad index combined with high load can wreak havoc on performance, where without load it was ok(ish). – Markus W Mahlberg Feb 24 '16 at 09:39
  • @MarkusWMahlberg I've edited the previous comment. Just pointing out that the aggregation query is not even executed when the SocketTimeout is raised. Furthermore the aggregation is done directly on the `_id` field without any load on the server. – Enrichman Feb 24 '16 at 09:44
  • Well, that's not what I see from the stack trace, however you are surely correct. Good luck! ;) – Markus W Mahlberg Feb 24 '16 at 09:47
  • @MarkusWMahlberg hm, I see what you're trying to say. Actually yes, it's raised before the aggregation but during another query. But that one is a simple find done by `_id`, so I'm still lost. :( (sorry for the tone, bit tired of investigating this) – Enrichman Feb 24 '16 at 09:53
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/104383/discussion-between-markus-w-mahlberg-and-enrichman). – Markus W Mahlberg Feb 24 '16 at 09:55

3 Answers3

23

After few tries I've found out that it was a problem with the Azure's Load Balancer.
After 60s of inactivity it will disconnect any pending TCP connection.

After further digging I've found this post of the MongoDB diagnostics FAQ, and I've set the tcp keepalive to 120s:

sudo sysctl -w net.ipv4.tcp_keepalive_time=<value>

and I've also set the socketKeepAlive of the MongoClient to true:

MongoClientOptions.Builder options = MongoClientOptions.builder();
options.socketKeepAlive(true);
mongoClient = new MongoClient(mongoAddress, options.build());

After these fixes the issue seems gone!

Enrichman
  • 11,157
  • 11
  • 67
  • 101
3

If you experience socket errors between clients and servers or between members of a sharded cluster or replica set that do not have other reasonable causes, check the TCP keepalive value (for example, the tcp_keepalive_time value on Linux systems). A common keepalive period is 7200 seconds (2 hours); however, different distributions and macOS may have different settings.

For MongoDB, you will have better results with shorter keepalive periods, on the order of 120 seconds (two minutes).

Where you have installed mongodb you have to simply run this command on Linux

sudo sysctl -w net.ipv4.tcp_keepalive_time=120

Reference: Does TCP keepalive time affect MongoDB Deployments?

xmar
  • 1,729
  • 20
  • 48
  • I already have `net.ipv4.tcp_keepalive_time = 120` on the host and I believe `keep-alive` setting defaults to true now. However I am still getting `org.mongodb.driver.connection : Got socket exception on connection` occasionally. – daniyel Feb 05 '19 at 08:28
  • @NitishDeshpande well I do not remember right now anymore, since we tried many different stuff and I am also not with the company at the time anymore. If I am not mistaken, we did something like re-try logic, were when Azure would drop connection, we would reestablish it. I do not have a snippet, to demonstrate how we solved it. – daniyel Mar 12 '20 at 17:30
  • @NitishDeshpande Were you able to resolve this issue? – Amol Kshirsagar Apr 17 '20 at 04:23
1

I just met the same problem and solved it by increasing the spring.data.mongodb.socketTimeout=<value> in Spring Boot config. By default, the value is 0 and means no timeout. However, my colleague set it to 20000(20 seconds), so the MongoSocketReadTimeoutException will be thrown out as soon as my aggregate() runs longer than 20 seconds!

Zhang Chen
  • 89
  • 3