I have an ASP.NET MVC app using the 10gen Mongo C# driver (github) to connect to a database server on a specific port. It is deployed in an IIS 7.0 web garden with 3 worker processes. Every few minutes, under load, the following exception is thrown and a 500 is returned to the user:
Only one usage of each socket address (protocol/network address/port) is normally permitted <database-ip>:<database-port>
Stack Trace:
at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP)
at MongoDB.Driver.Internal.MongoConnection.Open() in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Internal\MongoConnection.cs:line 266
at MongoDB.Driver.Internal.MongoConnection.GetNetworkStream() in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Internal\MongoConnection.cs:line 409
at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message, SafeMode safeMode) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Internal\MongoConnection.cs:line 377
at MongoDB.Driver.Internal.MongoConnection.RunCommand(String collectionName, QueryFlags queryFlags, CommandDocument command) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Internal\MongoConnection.cs:line 296
at MongoDB.Driver.Internal.MongoConnection.Authenticate(String databaseName, MongoCredentials credentials) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Internal\MongoConnection.cs:line 98
at MongoDB.Driver.Internal.MongoConnection.CheckAuthentication(MongoDatabase database) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Internal\MongoConnection.cs:line 195
at MongoDB.Driver.MongoServerInstance.AcquireConnection(MongoDatabase database) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoServerInstance.cs:line 185
at MongoDB.Driver.MongoServer.AcquireConnection(MongoDatabase database, Boolean slaveOk) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoServer.cs:line 893
at MongoDB.Driver.MongoCursorEnumerator`1.AcquireConnection() in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoCursorEnumerator.cs:line 184
at MongoDB.Driver.MongoCursorEnumerator`1.GetFirst() in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoCursorEnumerator.cs:line 194
at MongoDB.Driver.MongoCursorEnumerator`1.MoveNext() in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoCursorEnumerator.cs:line 126
at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
at MongoDB.Driver.MongoCollection.FindOneAs[TDocument](IMongoQuery query) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoCollection.cs:line 493
at MongoDB.Driver.MongoCollection.FindOneByIdAs[TDocument](BsonValue id) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoCollection.cs:line 529
at MongoDB.Driver.MongoCollection`1.FindOneById(BsonValue id) in C:\work\10gen\mongodb\mongo-csharp-driver\Driver\Core\MongoCollection.cs:line 1462
I've spent a lot of time Googling this exception and had almost convinced myself that I was running out of sockets, as described in "WCF: System.Net.SocketException - Only one usage of each socket address (protocol/network address/port) is normally permitted",
but increasing the maximum number of available sockets and decreasing the TIME_WAIT value had no effect on the issue.
It recently occurred to me that even though the Mongo driver does connection pooling, each worker process has its own pool. These pools are either working together to run the system out of sockets or are just trying to use the same socket address simultaneously. The latter seems more likely, since it's exactly what the exception suggests and the problem doesn't happen when the site is deployed to a single process. But outside the web garden the performance reduction is very noticeable, and if the process crashes, the recycling delay is unacceptable. That said, users getting 500 responses is equally unacceptable.
I have considered the following:
- Wrap all data access calls in a try/retry-n-times/fail construct (ugly and just plain bad practice)
- Transition to multi-machine instead of multi-process (expensive and unnecessary at current user loads)
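For reference, the retry option would look roughly like the sketch below. It is only an illustration: the `Retry` class and its parameters are hypothetical names of mine, not part of the driver, and it assumes the `SocketException` reaches the calling code unwrapped, as the stack trace above suggests.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

public static class Retry
{
    // Runs the operation, retrying when a SocketException is thrown.
    // After maxAttempts failed attempts the last exception is rethrown.
    public static T Execute<T>(Func<T> operation, int maxAttempts, TimeSpan delay)
    {
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (SocketException)
            {
                // Out of attempts: let the caller see the original failure.
                if (attempt >= maxAttempts) throw;

                // Brief pause before trying again.
                Thread.Sleep(delay);
            }
        }
    }
}
```

A call site would then be something like `Retry.Execute(() => collection.FindOneById(id), 3, TimeSpan.FromMilliseconds(50))`, where `collection` and `id` stand in for whatever the data access code already uses. Every data access call would need this wrapping, which is why I consider it ugly.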
Questions:
- Is there any way to have these worker processes gracefully share the address/port?
- Is there a way to do some sort of 'port-triggering' (not the right term, I'm sure), where the outgoing connection of each worker process could use a different local port to connect to the same remote port? The remote machine can handle multiple incoming connections, on the same port, without issue.
- If you think I'm running out of sockets, is there a way to limit the number of socket connections for each process such that they can share the 65536 sockets?
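To make the second question concrete, here is a small sketch of what I mean by different local ports for the same remote port. It does not use the Mongo driver at all: a `TcpListener` on the loopback address stands in for the database server, and `LocalPortDemo` and `ConnectTwice` are hypothetical names. As far as I understand it, the OS assigns each outgoing connection its own local port when the client socket is not bound explicitly, so this may already be what happens under the hood.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class LocalPortDemo
{
    // Opens two connections to the same remote endpoint and
    // returns the local port the OS assigned to each of them.
    public static int[] ConnectTwice(IPEndPoint remote)
    {
        using (var first = new TcpClient())
        using (var second = new TcpClient())
        {
            first.Connect(remote);
            second.Connect(remote);
            return new[]
            {
                ((IPEndPoint)first.Client.LocalEndPoint).Port,
                ((IPEndPoint)second.Client.LocalEndPoint).Port
            };
        }
    }

    public static void Main()
    {
        // Stand-in for the database server: loopback, port chosen by the OS.
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int[] ports = ConnectTwice((IPEndPoint)listener.LocalEndpoint);
            Console.WriteLine("local ports: {0} and {1}", ports[0], ports[1]);
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

Both connections target the same remote address and port, and the two local ports printed should differ. If that holds across worker processes too, then I don't see why the processes would collide on a socket address, which is part of what confuses me about the exception.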