Java RMI calls fail for no reason once in a few hundred

Question

I wrote a test RMI server and client. The server exposes a single remote method to the client.

On the client, I am using a 600-thread executor service to call the RMI method 6000 times.

On the server, each method call creates a simple task and submits it to a 300-thread executor service.

I get only one or two exceptions per run: for 6000 calls, about 1 to 3 fail. The exceptions also seem to happen only during the initial ramp-up period.

java.rmi.ConnectIOException: Exception creating connection to: ; nested exception is: 
java.net.SocketException: Connection reset by peer
at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:631)
at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:129)
at java.rmi.server.RemoteObjectInvocationHandler.invokeRemoteMethod(RemoteObjectInvocationHandler.java:194)
at java.rmi.server.RemoteObjectInvocationHandler.invoke(RemoteObjectInvocationHandler.java:148)
at com.sun.proxy.$Proxy0.receiveMessage(Unknown Source)
at com.example.rmi.MsgTask.run(MsgTask.java:18)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

The client and server are running on the same machine, launched from the same Eclipse IDE.

It looks like a request can get dropped if the RMI server is busy for a few milliseconds when the request arrives. Is this expected? Should I treat this behaviour as normal and build a 'retry' approach into my RMI clients in future? Or can I change some settings to make sure that requests are not dropped?

Looks like every call received in the RMI server-side object is on a separate thread. So, there is no need to use manual threading on the server-side. Although, there seems to be no upper limit on number of threads if left to 'auto'. — Teddy, Nov 04 '15 at 09:32
It's not that simple. The RMI Specification is deliberately obscure on the point, but the one thing you can take away from what it says is that you cannot assume the RMI server is single-threaded. What you really get in the Sun/Oracle JDK is a thread per connection. — user207421, Sep 07 '20 at 06:00

score 3 · Accepted Answer · answered Nov 04 '15 at 07:42

You're running into the TCP listen backlog. When it fills up, a Windows host will issue a 'connection reset'.

The solution is either to reduce your load or to introduce retries, each after a small but increasing sleep interval.
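A minimal sketch of the retry-with-increasing-sleep idea, not a definitive implementation. The class `RmiRetry`, the method `callWithRetry`, and the attempt and delay values are hypothetical names chosen for illustration. It retries only on `java.rmi.ConnectIOException` (the exception in the question's stack trace), on the assumption that a failure while creating the connection means the call never reached the server, so repeating it is safe.

```java
import java.rmi.ConnectIOException;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a remote call whose connection could not be created. */
final class RmiRetry {
    private RmiRetry() {}

    /**
     * Runs the call, retrying on ConnectIOException with a doubling sleep
     * between attempts. Any other exception is propagated immediately,
     * because the call may already have executed on the server.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (ConnectIOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(delay); // small pause so the server can drain its backlog
                delay *= 2;          // increase the pause for the next attempt
            }
        }
    }
}
```

A client task could then wrap its remote call, for example `RmiRetry.callWithRetry(() -> stub.receiveMessage(msg), 5, 50)`, where `stub` and `msg` stand in for whatever the real client passes and the remote method is assumed to return a value.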

user207421


Thanks a lot for pointing me in this direction. I'm running this on a MacBook Pro; I hope your answer still applies. Are there any rough numbers for how many KB of backlog it takes before it fails? – Teddy Nov 04 '15 at 07:57
The backlog is counted in connections. It doesn't have anything to do with kilobytes. It can be anything between 5 and 500 or more, and there is no way to discover what it actually is without deep-diving into the kernel. – user207421 Nov 04 '15 at 07:58
It's just as you mentioned... I googled 'TCP listen backlog' and found http://stackoverflow.com/questions/114874/socket-listen-backlog-parameter-how-to-determine-this-value and http://tangentsoft.net/wskfaq/advanced.html#backlog – Teddy Nov 04 '15 at 08:08
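On the backlog discussed in the comments above: it cannot be read back, but a server can request a larger one. The following is a sketch under stated assumptions, not a guaranteed fix. `BacklogServerSocketFactory` is a hypothetical class name; `RMIServerSocketFactory` and the `ServerSocket(int port, int backlog)` constructor are standard JDK APIs. The backlog value is only a hint, and the operating system may cap it at a lower limit.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.rmi.server.RMIServerSocketFactory;

/** Hypothetical factory that asks the OS for a specific listen backlog. */
final class BacklogServerSocketFactory implements RMIServerSocketFactory {
    private final int backlog;

    BacklogServerSocketFactory(int backlog) {
        this.backlog = backlog;
    }

    @Override
    public ServerSocket createServerSocket(int port) throws IOException {
        // The second argument is the requested maximum queue length for
        // pending connections; the OS is free to apply its own cap.
        return new ServerSocket(port, backlog);
    }

    // Equal factories let RMI recognise that exports can share a listening socket.
    @Override
    public boolean equals(Object other) {
        return other instanceof BacklogServerSocketFactory
                && ((BacklogServerSocketFactory) other).backlog == backlog;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(backlog);
    }
}
```

The factory would be passed when exporting the remote object, for example `UnicastRemoteObject.exportObject(impl, 0, null, new BacklogServerSocketFactory(1024))`, where `impl` is the server's remote object and 1024 is an arbitrary illustrative value. Retries on the client are still worth keeping, since a large enough burst can overflow any backlog.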

score 2 · Answer 2 · answered Nov 04 '15 at 07:30

Error handling is always a good idea, especially on network calls, regardless of whether you are actually having issues. So yes, I would handle this and provide a retry.

By the way, 600 threads seems excessive to me. Reducing the count might fix your problem.

Kai


I agree that he should try reducing the thread count. Does the problem still occur with 500 threads? 100? 10? – DavidS Nov 04 '15 at 07:41
