When trying to establish a fairly large number of TCP connections in parallel, I observe some weird behavior that I consider a potential bug in gen_tcp.
The scenario is a server listening on a port with multiple concurrent acceptors. From a client I establish a connection by calling gen_tcp:connect/3, then send a "Ping" message to the server and wait in passive mode for a "Pong" response. When the gen_tcp:connect/3 calls are performed sequentially, everything works fine, even for a large number of connections (I tested up to ~28000).
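For illustration, a minimal sketch of this setup might look like the following. It is not the code from the bug report: the module name, the function names, the listen options and the 5 second receive timeout are all assumptions made for the example.

```erlang
-module(ping_pong_sketch).
-export([start_server/2, ping/2]).

%% Hypothetical server: open one listen socket and spawn NumAcceptors
%% processes that all block in gen_tcp:accept/1 on it.
%% The listen options are assumptions, not taken from the bug report.
start_server(Port, NumAcceptors) ->
    {ok, LSock} = gen_tcp:listen(Port, [binary, {packet, 0},
                                        {active, false}, {reuseaddr, true}]),
    [spawn(fun() -> acceptor(LSock) end) || _ <- lists:seq(1, NumAcceptors)],
    {ok, LSock}.

%% Accept one connection, serve it, then go back to accepting.
acceptor(LSock) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            handle(Sock),
            acceptor(LSock);
        {error, _Reason} ->
            ok
    end.

%% Read exactly 4 bytes in passive mode and answer a "Ping" with "Pong".
handle(Sock) ->
    case gen_tcp:recv(Sock, 4) of
        {ok, <<"Ping">>} -> gen_tcp:send(Sock, <<"Pong">>);
        _Other -> ok
    end,
    gen_tcp:close(Sock).

%% Hypothetical client: connect, send "Ping", wait in passive mode
%% for the 4-byte reply. Returns {ok, Binary} or {error, Reason}.
ping(Host, Port) ->
    case gen_tcp:connect(Host, Port, [binary, {active, false}]) of
        {ok, Sock} ->
            Result = case gen_tcp:send(Sock, <<"Ping">>) of
                         ok -> gen_tcp:recv(Sock, 4, 5000);
                         {error, _} = SendErr -> SendErr
                     end,
            gen_tcp:close(Sock),
            Result;
        {error, _} = ConnErr ->
            ConnErr
    end.
```

Calling ping_pong_sketch:ping/2 in a loop from a single process corresponds to the sequential case described above.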
The problem occurs when trying to establish a lot of connections in parallel (depending on the machine, between ~75 and several hundred). While most of the connections still get established, some fail with a closed error in gen_tcp:recv/3. The weird thing is that these connections did not fail earlier: the calls to gen_tcp:connect/3 and gen_tcp:send/2 were both successful (i.e. returned {ok, Socket} and ok, respectively). On the server side I don't see a matching connection for these "weird" connections, i.e. no gen_tcp:accept/1 call returns for them. It is my understanding that a successful gen_tcp:connect/3 should result in a matching accepted connection on the server side.
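To make the parallel case concrete, here is a rough sketch of a driver that starts N connection attempts at once and records at which step each one failed. The module name, function names, result tags and the 5 second timeout are hypothetical and chosen only for this example; it assumes a server that answers "Ping" with "Pong" is already listening on Host:Port.

```erlang
-module(parallel_connect_sketch).
-export([run/3]).

%% Spawn N client processes at once and collect one result per process.
%% Returns a map from outcome to the number of attempts with that outcome.
run(Host, Port, N) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, attempt(Host, Port)} end),
                Ref
            end || _ <- lists:seq(1, N)],
    Results = [receive {Ref, R} -> R end || Ref <- Refs],
    tally(Results).

%% One connect/send/recv round trip. The result is ok on success,
%% otherwise {Step, Reason}, where Step names the call that failed.
attempt(Host, Port) ->
    case gen_tcp:connect(Host, Port, [binary, {active, false}]) of
        {error, ConnReason} ->
            {connect, ConnReason};
        {ok, Sock} ->
            Outcome =
                case gen_tcp:send(Sock, <<"Ping">>) of
                    {error, SendReason} ->
                        {send, SendReason};
                    ok ->
                        case gen_tcp:recv(Sock, 4, 5000) of
                            {ok, <<"Pong">>} -> ok;
                            {ok, Other} -> {recv, {unexpected, Other}};
                            {error, RecvReason} -> {recv, RecvReason}
                        end
                end,
            gen_tcp:close(Sock),
            Outcome
    end.

%% Count how often each distinct outcome occurred.
tally(Results) ->
    lists:foldl(fun(R, Acc) ->
                        maps:update_with(R, fun(C) -> C + 1 end, 1, Acc)
                end, #{}, Results).
```

With this sketch, the behavior described above would show up as a nonzero count under the {recv, closed} key, while the connect and send steps of the same attempts reported no error.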
I have already filed a bug report, where you can find a more detailed description and a minimal code example that demonstrates the problem. I was able to reproduce the problem on Linux and Mac OS X and with different Erlang versions.
My questions here are:
- Is anyone able to reproduce the problem and confirm that this is erroneous behavior?
- Any ideas for a workaround? How can I deal with this problem, other than starting all the connections sequentially (which takes forever)?