The "user time" or "wall-clock time" spent in the "wait(timeout)" call is usually the timeout value plus the time until the thread is rescheduled and actually runs again.
See the Javadoc for the Object.wait(long timeout) method:
The thread T is then [...] re-enabled for thread scheduling. It then competes in the usual manner with other threads for the right to synchronize on the object;
So there is no guarantee of "real-time" behavior; it is more of a "best effort", depending on the current system load and possibly on other locking dependencies in your application. Therefore, if the system is under heavy load, or your application runs many threads, the wait might take considerably longer than the timeout.
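To see how far the real wait drifts from the requested timeout, you can simply measure it. This is a minimal sketch; the class and method names (WaitTiming, timedWaitMillis) are made up for illustration, and the measured value includes the time needed to re-acquire the lock.

```java
public class WaitTiming {

    /**
     * Waits on 'lock' for up to timeoutMs and returns the wall-clock time
     * (in ms) that actually passed, including re-acquiring the monitor.
     * Hypothetical helper, not taken from the question's code.
     */
    public static long timedWaitMillis(Object lock, long timeoutMs) {
        long start = System.nanoTime();
        synchronized (lock) {
            try {
                lock.wait(timeoutMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and return what we measured so far.
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = timedWaitMillis(new Object(), 200);
        System.out.println("requested 200 ms, measured " + elapsed + " ms");
    }
}
```

On an idle machine the measured value should be close to the timeout; under load or lock contention it can be much larger. Note that a spurious wakeup can also end the wait early.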
PS
The quote @nathan-hughes mentioned in his comment to your question is probably the key sentence in the Javadoc of the "wait" method: "The specified amount of real time has elapsed, more or less."
PPS
Based on your question edit with additional context information ('very complex software', 'high traffic', 'huge overwaits'): you have to find all usages of your obj object as a lock, and determine how those usages interact.
This can get really complex. Here is an attempt to sketch a "simple" scenario of what might go wrong, with only two plain threads, e.g. like this:
// thread 1
synchronized (obj) {
    // wait 1000ms
    obj.wait(1000);
}
// check for overwait

// thread 2, after, let's say 500 ms
synchronized (obj) {
    obj.notify();
}
Easy scenario, everything is fine, the execution order is roughly:
- 0ms: T1 acquires the lock on 'obj'
- 0ms: T1 registers itself as waiting for 'obj', and gets excluded from thread scheduling. While it is excluded from thread scheduling, the lock on 'obj' is released again (!)
- 500ms: T2 acquires the lock on 'obj', notifies one thread waiting for notification (which one is chosen is up to the JVM implementation), and releases the lock on 'obj'
- 500ms + X: T1 is re-enabled for thread scheduling, waits until it re-acquires the lock on 'obj' (!), then finishes its block and releases the lock on 'obj'.
These are only 2 simple threads and synchronized blocks. Let's make this more complex, with poorly written code. What if the 2nd thread looked something like this:
// bad variant of thread 2, after, let's say 500 ms
synchronized (obj) {
    obj.notify();
    // do complex operation, taking more than few ms,
    // maybe a heavy SQL query/update...
}
In this case, even though T1 has been notified (or maybe has timed out), it has to wait until it regains the lock on 'obj', which is still held by T2 as long as the complex operation runs (step 3 in the previous list)! This might indeed take up to ... seconds or more.
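The bad variant is easy to reproduce. The following is a minimal sketch, not your code: the class name NotifyThenHoldDemo, the method measureWaitMillis and the concrete delays are assumptions, and Thread.sleep inside the synchronized block merely stands in for the "complex operation" (sleep does not release the monitor).

```java
import java.util.concurrent.atomic.AtomicLong;

public class NotifyThenHoldDemo {

    /**
     * T1 calls obj.wait(1000). The calling thread plays T2: it notifies after
     * notifyDelayMs and then keeps the lock for another holdMs.
     * Returns the wall-clock time (ms) T1 spent from start until it left
     * its synchronized block.
     */
    public static long measureWaitMillis(long notifyDelayMs, long holdMs) {
        final Object obj = new Object();
        final AtomicLong elapsedMs = new AtomicLong(-1);

        Thread t1 = new Thread(() -> {
            long start = System.nanoTime();
            synchronized (obj) {
                try {
                    obj.wait(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
            elapsedMs.set((System.nanoTime() - start) / 1_000_000);
        });
        t1.start();

        try {
            Thread.sleep(notifyDelayMs);
            synchronized (obj) {
                obj.notify();
                // Simulated heavy work while still holding the lock on obj.
                Thread.sleep(holdMs);
            }
            t1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return elapsedMs.get();
    }

    public static void main(String[] args) {
        System.out.println("T1 waited " + measureWaitMillis(200, 1500) + " ms");
    }
}
```

With these values T1 is notified after roughly 200 ms, but it can only return from wait(1000) once T2 leaves its synchronized block, so the measured time exceeds the 1000 ms timeout.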
Even more complexity: we return to our initial simple threads T1 and T2, but add a 3rd thread:
// thread 3, after, let's say also 500 ms
synchronized (obj) {
    // do complex operation, taking more than few ms,
    // maybe a heavy SQL query/update...
}
The execution order could become, roughly:
- 0ms: T1 acquires the lock on 'obj'
- 0ms: T1 registers itself as waiting for 'obj', and gets excluded from thread scheduling. While it is excluded from thread scheduling, the lock on 'obj' is released again (!)
- 500ms: T2 acquires the lock on 'obj', notifies one thread waiting for notification (which one is chosen is up to the JVM implementation), and releases the lock on 'obj'
- 500ms + X: T1 is re-enabled for thread scheduling, but does not get the lock on 'obj', because
- 500ms + X: T3 is scheduled by the thread scheduler before T1, acquires the lock on 'obj' (!), and starts doing its complex operation. T1 can't do anything but wait!
- 500ms + MANY: T3 releases the lock on 'obj'.
- 500ms + MANY: T1 re-acquires the lock on 'obj' (!), then exits its synchronized block and releases the lock on 'obj'.
This is only scratching the surface of what might happen in your 'very complex software', with 'high traffic'. Add more threads, maybe poorly coded (e.g. doing too much in the 'synchronized' blocks), high traffic, and you might easily get the overwaits you mentioned.
OPTIONS
How to solve this... depends on the purpose and complexity of your software, there is no simple plan. More can't be said based on the available information.
Maybe reanalysing the code with pen and paper is enough, maybe profiling it could help you find the locks, maybe you can get the needed information about the current locks via JMX or a thread dump (via signal, jconsole, jcmd, jvisualvm), or by monitoring with Java Mission Control and Java Flight Recorder (features available since ... JDK 7u40 I think).
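The same lock information a thread dump shows can also be read from inside the JVM through the standard java.lang.management API. A minimal sketch, with a made-up class name (LockReport) and output format:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class LockReport {

    /**
     * Lists every thread that is currently blocked on, or waiting for, some
     * lock, together with the lock's owner (null if nobody owns it, e.g.
     * while a thread sits in Object.wait()).
     */
    public static String describeBlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info.getLockName() != null) {
                sb.append(info.getThreadName())
                  .append(" [").append(info.getThreadState()).append("]")
                  .append(" waits on ").append(info.getLockName())
                  .append(", owner: ").append(info.getLockOwnerName())
                  .append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describeBlockedThreads());
    }
}
```

Logging this report whenever an overwait is detected might show which thread holds 'obj' at that moment. Collecting the dump is not free, so it is probably better suited to diagnosis than to permanent use on a hot path.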
You've asked in a comment if Thread.sleep(timeout) would help: can't be said without more info. Maybe it would help. Or maybe reentrant locks, or other locking options (see packages java.util.concurrent, java.util.concurrent.atomic, java.util.concurrent.locks) would be more appropriate. It depends on your code, your use case and on the Java version you're using.
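For illustration, here is what a timed wait could look like with java.util.concurrent.locks instead of wait/notify. This is only a sketch under assumptions: the class DeadlineWait, its 'ready' flag and the method names are invented, and whether it fits depends on your code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlineWait {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private boolean ready; // guarded by lock

    /** Sets the flag and wakes up all waiting threads. */
    public void signalReady() {
        lock.lock();
        try {
            ready = true;
            changed.signalAll();
        } finally {
            lock.unlock(); // keep the critical section as short as possible
        }
    }

    /** Returns true if 'ready' was set within timeoutMs, false on timeout or interrupt. */
    public boolean awaitReady(long timeoutMs) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        lock.lock();
        try {
            // Loop to guard against spurious wakeups; awaitNanos returns
            // an estimate of the remaining wait time.
            while (!ready) {
                if (nanos <= 0L) {
                    return false;
                }
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

Keep in mind that awaitNanos also has to re-acquire the lock before it returns, so a thread that holds the lock for a long time causes the same kind of overwait. The gain is mostly in structure (explicit conditions, tryLock with a timeout), not in timing guarantees.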
If GC is not an issue (see below), you have analyzed the code and it "looks fine", and you think the high traffic is the cause, you might also consider enabling biased locking and/or spin locking. See the Java 7 JVM options for more details (article contains links to Java 8 JVM options too).
GARBAGE COLLECTION
By the way, 'high traffic' should have made me ask this earlier: have you monitored the garbage collection? If not properly configured/tuned, GC might also often cause very significant pauses! (I had such a case this week, 15-30 seconds for full GC...)
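A quick way to check whether GC lines up with the overwaits is to sample the accumulated collection time before and after the wait. A minimal sketch using the standard GarbageCollectorMXBean API; the class name GcStats and the method totalGcTimeMillis are made up:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {

    /**
     * Sums the approximate accumulated collection time (ms) over all
     * collectors of this JVM. Collectors that report -1 (undefined)
     * are skipped.
     */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long before = totalGcTimeMillis();
        // ... the wait(timeout) under investigation would go here ...
        long after = totalGcTimeMillis();
        System.out.println("GC time during the wait: " + (after - before) + " ms");
    }
}
```

If the difference is in the same range as the overwait, GC is a likely suspect. The value is only approximate and, with concurrent collectors, not identical to the pause time, so GC logs remain the more reliable source.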