Thread.sleep() taking longer than expected?

Question

We have a Java client/server RMI application that uses a Persistence Framework. Upon starting a client session we start the following thread:

  Thread checkInThread = new Thread() {
    public void run() {
      while(true) {
        try {
          getServer().checkIn(userId);
        }
        catch(Exception ex) {
          JOptionPane.showMessageDialog(parentFrame, "The connection to the server was lost.");
          ex.printStackTrace();
        }
        try {
          Thread.sleep(15000);
        }
        catch(InterruptedException e) {
        }
      }
    }
  };

This is used to keep track of whether a client session loses its connection to the server. If the client does not check in for 45 seconds, there are a number of things we need to clean up from that client's session. On their next check-in after they have gone beyond the 45-second threshold, we boot them from the system, which then allows them to log back in. In theory, the only time this should happen is if the client PC loses connectivity to the server.

However, we have come across scenarios where the thread runs just fine and checks in every 15 seconds and then for an unknown reason, the thread will just go out to lunch for 45+ seconds. Eventually the client will check back in, but it seems like something is blocking the execution of the thread during that time. We have experienced this using both Swing and JavaFX on the client side. The client/server are only compatible with Windows OS.

Is there an easy way to figure out what is causing this to happen, or a better approach to make sure the check-ins occur regularly at 15-second intervals, assuming there is connectivity between client and server?

Could it be the part before the sleep that is blocking, e.g. the dialog is displayed and waiting for user interaction? — Thomas, Feb 11 '16 at 16:33
Btw, I hope that empty catch block is only here for simplicity reasons and not empty in reality. — Thomas, Feb 11 '16 at 16:35
As @HovercraftFullOfEels said, log. Specifically, log the time the thread calls `sleep` and the time `sleep` returns. That will determine if the delay is in `sleep` or, as is more likely, elsewhere. — David Schwartz, Feb 11 '16 at 16:39
We have some logging in place which I removed to make the example a bit cleaner. But it definitely cannot hurt to add some more. — Tommo, Feb 11 '16 at 17:19
I disagree with the duplicate. Answers on the other question say one can expect sleep to take longer than the given number of ms on a non-real-time JVM and OS. But, 45 or more seconds late? I think something else is going on here. Can't say what without more information---the example doesn't show how the time was measured---but Huy Nguyen Ngoc's answer seems plausible: The OP might be mistakenly attributing time spent in `getServer().checkIn(userId);` to the sleep() call. — Solomon Slow, Feb 11 '16 at 18:01
You should absolutely and positively not do this. There is no such thing as a connection in RMI, *ergo* you are testing for a condition that does not exist. You are also interfering with RMI's connection pooling. The correct way to accomplish what you're attempting is via the remote session pattern and the `Unreferenced` interface. — user207421, Feb 11 '16 at 20:20
It's more of a simulated check in process. The server maintains a map of the client's ID along with the last time they checked in. If a client fails to check in after 45 seconds, then the server removes them from the map. The next time the client checks in if they are no longer in the map that is when their session gets cleaned up and they get kicked out of the system. We had this issue in the past, however we used to have another step where after 45 seconds the server would reach out and determine the client was still connected. We went through the process of removing all ... — Tommo, Feb 11 '16 at 21:35
Server to Client communication, so this last step is not possible. We were never able to determine why the client would not check in for 45 seconds, however when the server reached out to the client it determined the client was still connected. — Tommo, Feb 11 '16 at 21:38
I understand the requirement, and you're still implementing it incorrectly. RMI can already tell you pretty infallibly when a client loses connectivity, without all this overhead. 'Still connected' has no meaning in RMI. You should not continue with this. — user207421, Feb 11 '16 at 23:34
Could you please elaborate on your solution and possibly post an answer so I could give you credit? — Tommo, Feb 12 '16 at 00:38
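
A minimal sketch of the logging suggested in the comments above, assuming the server can be reduced to a single `checkIn` call. `CheckInServer`, `TimedCheckIn`, and the `String` user ID are hypothetical names used only for illustration; they are not from the original application. The idea is to time the remote call and the sleep separately, so a long gap can be attributed to one or the other:

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the remote server interface in the question. */
interface CheckInServer {
    void checkIn(String userId) throws Exception;
}

class TimedCheckIn {
    /** Runs one check-in and one sleep; returns {checkInMillis, sleepMillis}. */
    static long[] runOnce(CheckInServer server, String userId, long sleepMillis) {
        long beforeCheckIn = System.nanoTime();
        try {
            server.checkIn(userId);
        } catch (Exception ex) {
            // Log only. A modal dialog shown here would block this thread
            // until the user dismisses it, delaying the next check-in.
            System.err.println("checkIn failed: " + ex);
        }
        long beforeSleep = System.nanoTime();
        try {
            Thread.sleep(sleepMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
        }
        long afterSleep = System.nanoTime();

        long checkInMs = TimeUnit.NANOSECONDS.toMillis(beforeSleep - beforeCheckIn);
        long sleepMs = TimeUnit.NANOSECONDS.toMillis(afterSleep - beforeSleep);
        System.out.printf("checkIn took %d ms, sleep took %d ms%n", checkInMs, sleepMs);
        return new long[] { checkInMs, sleepMs };
    }

    public static void main(String[] args) {
        // Fake server whose checkIn takes about 50 ms.
        CheckInServer fake = userId -> Thread.sleep(50);
        runOnce(fake, "demo-user", 100);
    }
}
```

If the logged sleep time stays near 15000 ms while the check-in time occasionally spikes, the delay is in the remote call (or in the dialog), not in `Thread.sleep()`.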

Huy Nguyen Ngoc · Answer 1 · 2016-02-11T17:03:35.193

0

getServer().checkIn(userId);

The getServer() or checkIn() call may take more than 15 seconds to return, and that would explain why "the thread will just go out to lunch for 45+ seconds."

edited Feb 11 '16 at 17:03

answered Feb 11 '16 at 16:38

All that method does is update a HashMap contained on the server. `userCheckInMap.put(userId, new Time());` – Tommo Feb 11 '16 at 16:41
Are you sure it does not involve any network activity? Then why does the catch block call JOptionPane.showMessageDialog(parentFrame, "The connection to the server was lost.");? getServer() may be the issue, not the checkIn function – Huy Nguyen Ngoc Feb 11 '16 at 16:46
By the way, this is not a duplicate of http://stackoverflow.com/questions/6095712/thread-sleep-waits-more-than-expected. Thread.sleep is not accurate on the order of milliseconds because it depends on thread priorities and implementation details such as timer resolution, but it should never jump from 15 to 45+ seconds. – Huy Nguyen Ngoc Feb 11 '16 at 17:01
Yes @Huy Nguyen Ngoc thank you for that. I just took a look at the thread and it's not really an issue of needing it to be Real-Time. We don't care about milliseconds, but expect the thread to be executing at least once within a 45 second span. – Tommo Feb 11 '16 at 17:04
getServer() is just returning a class field. If the getServer().checkIn() call fails or throws an exception, then we know there is an actual connection problem. That does not seem to be the issue we are trying to figure out. Based on some thread dumps we've done, the thread is in a TIMED_WAITING state, and it's just unclear why it hasn't woken up at least once within that 45-second interval. – Tommo Feb 11 '16 at 17:07
Establishing a connection may take more than 45 seconds if the network is unreliable or the server is not available. And while establishing a connection, the thread may also be in the TIMED_WAITING state. – Huy Nguyen Ngoc Feb 11 '16 at 17:17
Establishing a connection will fail within milliseconds if the server is not available, and if the server wasn't available how did it detect the timeout? – user207421 Feb 11 '16 at 20:29

score 0 · Answer 2 · answered Feb 11 '16 at 20:39

This can happen when the client machine goes into sleep or hibernate mode. Usually when it's a laptop that just had its cover closed.

There can also be temporary network outages that last for more than 15 seconds but allow connections to resume automatically when the network comes back. In this case, the client can be stuck in checkIn(), not sleep().
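
One way to make a check-in that is stuck in the remote call visible is to run the call on a worker thread and wait for it with a timeout. This is only a sketch under assumptions: `BoundedCall` and `completesWithin` are hypothetical names, and the `Callable` stands in for something like `getServer().checkIn(userId)`. Cancelling the future interrupts the worker, but a thread blocked in socket I/O may not respond to the interrupt, so this detects the stall rather than curing it:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedCall {
    /** Returns true if the call finished normally within timeoutMillis, false otherwise. */
    static boolean completesWithin(Callable<Void> call, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Void> future = executor.submit(call);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // The call is still running: log it and try to cancel.
            System.err.println("call still blocked after " + timeoutMillis + " ms");
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            // The call itself threw; the original exception is the cause.
            System.err.println("call failed: " + e.getCause());
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Simulated remote call that blocks for 2 seconds, checked with a 100 ms limit.
        boolean ok = completesWithin(() -> { Thread.sleep(2000); return null; }, 100);
        System.out.println("completed in time: " + ok);
    }
}
```

A machine that was suspended or hibernated would not be caught by this, since every thread stops together; comparing `System.currentTimeMillis()` before and after the sleep is the simpler check for that case.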

score 0 · Answer 3 · edited May 23 '17 at 12:15

You should absolutely and positively not do this. There is no such thing as a connection in RMI, ergo you are testing for a condition that does not exist. You are also interfering with RMI's connection pooling. The correct way to accomplish what you're attempting is via the remote session pattern and the Unreferenced interface. RMI can already tell you when a client loses connectivity, without all this overhead. 'Still connected' has no meaning in RMI.
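
The answer does not include code, so the following is only a rough sketch of what a session object using `java.rmi.server.Unreferenced` might look like. `Session`, `SessionImpl`, `open`, and the `activeSessions` map are hypothetical names, not taken from the original application. The assumption is that the server hands each client its own exported session object at login (for example via `UnicastRemoteObject.exportObject(session, 0)`) and does not bind it in the registry, so the client holds the only remote reference:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.Unreferenced;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-client remote interface; one instance is returned at login. */
interface Session extends Remote {
    void logout() throws RemoteException;
}

/** Hypothetical session object that cleans up when no client references remain. */
class SessionImpl implements Session, Unreferenced {
    private final String userId;
    private final Map<String, SessionImpl> activeSessions;

    private SessionImpl(String userId, Map<String, SessionImpl> activeSessions) {
        this.userId = userId;
        this.activeSessions = activeSessions;
    }

    /** Creates a session and registers it; exporting it is left to the caller. */
    static SessionImpl open(String userId, Map<String, SessionImpl> activeSessions) {
        SessionImpl session = new SessionImpl(userId, activeSessions);
        activeSessions.put(userId, session);
        return session;
    }

    /** Explicit logout requested by the client. */
    @Override
    public void logout() {
        cleanUp();
    }

    /** Invoked by the RMI runtime once no remote references to this exported object remain. */
    @Override
    public void unreferenced() {
        cleanUp();
    }

    private void cleanUp() {
        activeSessions.remove(userId);
    }

    public static void main(String[] args) {
        Map<String, SessionImpl> sessions = new ConcurrentHashMap<>();
        SessionImpl session = open("demo-user", sessions);
        session.unreferenced(); // simulate the runtime callback without exporting
        System.out.println("active sessions: " + sessions.size());
    }
}
```

With this pattern the client no longer needs a polling thread. When a client vanishes without logging out, its distributed garbage collection lease eventually expires and the runtime calls `unreferenced()`. The lease duration is controlled by the `java.rmi.dgc.leaseValue` system property (10 minutes by default), so detection is not immediate unless that value is lowered.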
