I have a problem with my RMI application. After about 20 hours (give or take a few hours) clients can no longer connect. During the first 20 hours of the server's lifetime, though, I can make as many connections as I want. I suspected that the RMI remote object was being garbage collected because no references point to it, but I can rule that out for two reasons:

  1. I forced the JVM running the server to do a GC using jconsole, and clients could still connect afterwards.
  2. I hold a reference to my server in the main method, which never exits, and the RMI registry and the stub are members of my server class.

My server creates an RMI registry on port 1099 and is exported as a UnicastRemoteObject on port 5099. When clients can no longer connect after 20 hours, they get a java.rmi.ConnectException. To be clear, the server's java process is still running, and the registry (running within that process) is still responding and returning a remote object. The exception is thrown when I call a remote method on the client side.

If I run "netstat -tulpn" on my server machine, I can see that the java process is listening on port 5099 initially, but once the 20-hour bug kicks in, the server is no longer listening on that port. I think I can rule out firewall issues as well, as I have disabled the server firewall for testing. Below is a simplified version of my code. Any ideas about what is going on and how to make the server live indefinitely would be much appreciated. Cheers!

import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

public class MyRMIServer implements MyRMIInterface {

private MyRMIInterface stub;

private Registry registry;


public static void main(String[] args) 
{
    MyRMIServer server = new MyRMIServer(); 
    server.startRmiServices();

    // now sleep, don't let the main thread die, otherwise we might lose our ref to the
    // RMI stub and/or registry
    while (true) 
    {
        try {
            Thread.sleep(5000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

}

private void startRmiServices()
{

    try {
        // set up security manager
        if (System.getSecurityManager() == null) {
            System.setSecurityManager(new SecurityManager());
        }

        // create stub
        stub = (MyRMIInterface) UnicastRemoteObject.exportObject(this,5099);

        // Bind the remote object's stub in the registry
        registry = LocateRegistry.createRegistry(1099);
        registry.rebind("MyServer", stub);

        System.out.println("RMI ready");

    } catch (RemoteException e) {
        e.printStackTrace();
    } 
}

@Override
public synchronized int remoteCall(int x) throws RemoteException
{
    return x+1;
}



}
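
For context, a client that separates the registry lookup from the remote call makes it easier to see which of the two steps fails once the server stops listening on port 5099. The following is only a minimal sketch: the class name `MyRMIClient`, the helper `probe` and the `MyRMIInterface` declaration are assumptions for illustration and are not taken from the real application.

```java
import java.rmi.ConnectException;
import java.rmi.NoSuchObjectException;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// Assumed declaration of the remote interface the server implements.
interface MyRMIInterface extends Remote {
    int remoteCall(int x) throws RemoteException;
}

class MyRMIClient {

    // Hypothetical helper: reports how far the client got before failing.
    static String probe(String host, int registryPort) {
        MyRMIInterface server;
        try {
            // Step 1: contact the registry and fetch the stub.
            Registry registry = LocateRegistry.getRegistry(host, registryPort);
            server = (MyRMIInterface) registry.lookup("MyServer");
        } catch (RemoteException | NotBoundException e) {
            return "lookup failed: " + e.getClass().getName();
        }
        try {
            // Step 2: invoke the remote method through the stub.
            return "remoteCall(1) returned " + server.remoteCall(1);
        } catch (ConnectException e) {
            // Nothing accepted the connection on the exported object's port.
            return "call failed, cannot connect: " + e.getMessage();
        } catch (NoSuchObjectException e) {
            // The port answered, but the remote object is no longer exported there.
            return "call failed, object no longer exported";
        } catch (RemoteException e) {
            return "call failed: " + e.getClass().getName();
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(probe(host, 1099)); // registry port from the question
    }
}
```

In the situation described above, the lookup would be expected to succeed and the second step to report the connection failure.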
xlecoustillier
themik81
  • Define "can no longer connect". What exception is thrown? – user207421 Jul 22 '13 at 09:20
  • Any errors on the server? Show us your logs. – André Stannek Jul 22 '13 at 09:34
  • @stonedsquirrel At this stage the errors at the clients are infinitely more interesting. – user207421 Jul 22 '13 at 09:37
  • @MarkoTopolnik Calm down. We don't know what the bug is yet, so you don't have any reason to assert that it 'cannot possibly have anything to do with the object being garbage-collected'. The loop is not voodoo programming, it is a way of avoiding local GC. Not the best way, but a way. A better way would be to make both `registry` and `stub` static. – user207421 Jul 22 '13 at 09:41
  • @EJP If OP can't see the server using netstat after it becomes unreachable it is definitely not a client problem. The client errors don't add useful information. – André Stannek Jul 22 '13 at 09:42
  • @stonedsquirrel We don't know what 'unreachable' *means* yet. You're the first person in the thread to use that term. The client errors would tell us *why*. Specifically, if they are getting 'NoSuchObjectException', it would indicate DGC kicking in, and there are no 'errors at the server' that will tell us about that. – user207421 Jul 22 '13 at 09:44
  • Sorry for my English, I only asked whether the error appears when the day changes, for some reason such as a server restart or something with the firewall. – Distopic Jul 22 '13 at 09:50
  • @EJP Of course we know what 'unreachable' means. The server stops listening on port 5099, which means the listening thread dies for some reason or it is an environment error. No matter what, the problem will always be the same from the client's view: it can't connect to port 5099. – André Stannek Jul 22 '13 at 10:07
  • @stonedsquirrel You may be prepared to guess about this in the absence of evidence. I'm not. The problem at the client could be any of `java.rmi.ConnectException`, `java.rmi.NoSuchObjectException`, ... and it could happen at either the `lookup()` call or the remote method call. All these things make a difference. The problem at the server could be DGC, which leaves no trace, or a time-based firewall rule, ditto, or the JVM exiting, possibly ditto. In not a single one of these cases is asking for the server error of any use whatsoever. – user207421 Jul 22 '13 at 10:12
  • I don't have a client log here for reference right now and need to wait another 20 hours for a new one, but I think the exception on the client is a java.rmi.ConnectException caused by a java.net.ConnectException. There is no doubt in my mind that the client exception is caused by the server no longer listening on port 5099. The question is why the server stops listening. By the way, the server stops listening after around 20 hours even if no client ever connected to it (at least according to netstat). And no, there are no errors at all in the server's RMI error log. – themik81 Jul 22 '13 at 10:41
  • @themik81 So try what I said in my answer: make the references static and lose the loop. Also make sure that you log any exit from the JVM. – user207421 Jul 22 '13 at 10:43
  • @EJP Ok, I will try that (willing to try anything at this point) and agree that it is probably better style, but why do you think it would make a difference whether the references are static or not? I do hold references now already. Or do you think there is some GC optimization magic going on that lets my local server object reference go out of scope before the main method exits? – themik81 Jul 22 '13 at 10:48
  • I think the JVM is probably exiting actually. – user207421 Jul 22 '13 at 10:50
  • @EJP But after the 20 hours I can see the java process still running and listening on port 1099. Just port 5099 stops listening. – themik81 Jul 22 '13 at 10:52
  • In that case the remote object is being GC'd, and making the reference to it static will definitely stop that. – user207421 Jul 22 '13 at 10:57
  • @EJP OK, done that now and server restarted, will unfortunately only be able to find out tomorrow if that worked or not, will report back. – themik81 Jul 22 '13 at 11:02
  • themik81, there's a JIT compiler optimization which can clear a local variable immediately after the point where it is last read. The JIT compiler usually acts after 10,000 iterations over a piece of code. So let's see... 10,000*5 seconds... that's 14 hours, not that far off. – Marko Topolnik Jul 22 '13 at 12:06
  • @MarkoTopolnik Ah, that's really interesting, this could very well be it then. Do you have a link for that by any chance? Btw, the 20 hours is not an accurate measure, I know it's definitely > 12 and < 24. – themik81 Jul 22 '13 at 12:44
  • [Here it is](http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html). Search for "CompileThreshold". – Marko Topolnik Jul 22 '13 at 12:55
  • This was it. Thanks EJP and MarkoTopolnik. Now with static references to my server object, more than 24 hours later clients can still connect fine to my server. – themik81 Jul 23 '13 at 13:51

1 Answer

The loop in `main()` isn't the best way to prevent your objects from being DGC'd and GC'd. Just make `stub` and `registry` static.
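
A minimal sketch of that change, reusing the names from the question. The `MyRMIInterface` declaration is an assumption (the question does not show it), the `start` helper with its port parameters is introduced only for illustration, and the security manager setup is left out to keep the example short. The server instance is held in a static field as well, and the sleep loop is dropped, as suggested in the comments.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Assumed declaration: the question does not show this interface.
interface MyRMIInterface extends Remote {
    int remoteCall(int x) throws RemoteException;
}

class MyRMIServer implements MyRMIInterface {

    // Static fields stay strongly reachable for as long as the class is loaded,
    // so they do not depend on a local variable in main() staying live.
    static MyRMIServer server;
    static MyRMIInterface stub;
    static Registry registry;

    public static void main(String[] args) throws RemoteException {
        start(1099, 5099); // registry and object ports from the question
        System.out.println("RMI ready");
        // No sleep loop here.
    }

    // Hypothetical helper: exports the server and binds its stub in a new registry.
    static void start(int registryPort, int objectPort) throws RemoteException {
        server = new MyRMIServer();
        stub = (MyRMIInterface) UnicastRemoteObject.exportObject(server, objectPort);
        registry = LocateRegistry.createRegistry(registryPort);
        registry.rebind("MyServer", stub);
    }

    @Override
    public synchronized int remoteCall(int x) throws RemoteException {
        return x + 1;
    }
}
```

The export and bind calls are the same as in the question's `startRmiServices()`; the only difference is where the references are held.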

user207421
  • @MarkoTopolnik You are quite mistaken. The listening thread does not prevent garbage collection. Otherwise DGC and GC could never take place, and they do. – user207421 Jul 22 '13 at 09:50
  • @MarkoTopolnik (1) Where in the specification does it say that the listening thread holds a hard reference to the object? (Where in the specification does it even talk about a listening thread?) How could DGC possibly work if what you say is true? (2) DGC is about distributed garbage-collection of exported remote objects. It isn't about anything *else* actually: there *is* nothing else for it to be about. – user207421 Jul 22 '13 at 09:57
  • @MarkoTopolnik No it isn't. The object listening at the port is of type `sun.rmi.transport.tcp.TCPTransport$AcceptLoop`. Exported remote objects can share ports. For example, the OP's code could have used port 1099 for both his server and the Registry. Try it. If what you said was true that wouldn't be possible. The listening thread is started on the first use of the port by an exported remote object, and exits when the last remote object using that port has been unexported. You seem to be just guessing. Please have a look at the source code before you continue this exchange. – user207421 Jul 22 '13 at 10:01
  • @MarkoTopolnik I don't know where you got that from. It isn't in the RMI Specification AFAIK. There is nothing there about sockets, let alone listening threads. The Specification is transport-agnostic. The client certainly does *not* 'obtains details of connecting to the server from the registry'. It gets that from the stub. The stub may or may not come from the Registry. *When you have tried* the experiment I suggested in my last comment you will be competent to debate this with me. At the moment you aren't, sorry. – user207421 Jul 22 '13 at 10:15
  • @MarkoTopolnik Further to this, as it didn't fit, the document you have quoted without proper citation is an IBM *implementation* document, although curiously it appears to describe the Sun implementation rather than IBM's. In any case it doesn't contradict what I've said here, and it doesn't support your assertions about who listens, when DGC happens, etc. I also draw your attention to ['http://docs.oracle.com/javase/7/docs/platform/rmi/spec/rmi-arch4.html'](http://docs.oracle.com/javase/7/docs/platform/rmi/spec/rmi-arch4.html), which *is* in the specification. – user207421 Jul 22 '13 at 10:27
  • @MarkoTopolnik You could also try to explain to yourself how 'it is that server object's method which is blocking inside `Socket#accept`' when RMI exported remote objects aren't constrained to extend a particular class, and when the `UnicastRemoteObject` class that they do commonly extend doesn't have such a method, or extend `Runnable` either. – user207421 Jul 22 '13 at 10:30
  • @MarkoTopolnik I agree. You have to keep a reference to the Registry, otherwise it can be DGC'd and GC'd, as my answer says. That's further evidence *against* your now-deleted theory, actually. If there was a thread holding it alive, the quote you provided wouldn't be true. – user207421 Jul 22 '13 at 10:35
  • @MarkoTopolnik I've already provided a link to the same statement. Your point? – user207421 Jul 22 '13 at 10:39
  • @MarkoTopolnik Good to see that you now agree with the RMI specification, or at least that you've actually read it. – user207421 Jul 22 '13 at 10:44
  • When you say 'your point', do you mean your numerous comments which you have now deleted, the relics of which can only now be discerned in my subsequent comments in refutation? Or do you mean the numerous extracts you've quoted from the RMI Specification that don't disagree with me? – user207421 Jul 22 '13 at 10:55
  • "Yields no results" --- [here it is](http://stackoverflow.com/a/14327101/1103872). Quote 1: "It just sounds like you had a bad first experience due to inexperience, but that isn't RMI's fault either." Quote 2: "it also isn't my experience or habit to blame my own deployment errors on the technology." – Marko Topolnik Jul 22 '13 at 11:19
  • If you've got an answer that contradicts this answer, please post it. As it stands, we have this long discussion in the comments that is in danger of getting wiped due to flags. I'm sure there's information you want to keep around -- it's best to either put it into an answer, or update an existing answer/question with the comment. – George Stocker Jul 22 '13 at 12:30