JMX: CPU + Blocked + Wait time = Wall time?

Question

I'm trying to use JMX for measuring how long a method call took and how much of that was CPU time, how much was the thread being blocked and how much was waiting. Ideally, I would expect that CPU time + Block time + Wait time = Wall time, but I've noticed that it's not the case - and it's not just slightly inaccurate timer. For example:

Wall time:  5657.305 ms
CPU time:   4060.000 ms (71.77%)
User time:  3840.000 ms (67.88%)
Block time: 0.000 ms (0.00%)
Wait time:  0.000 ms (0.00%)

So, the question is ... is my presumption that the sum of these times (not User time, that's included in CPU time) should give Wall time wrong? Am I missing something?

Some more details:

Wall time: difference of System.currentTimeMillis() at the method entry and exit
CPU time: difference of ManagementFactory.getThreadMXBean().getCurrentThreadCpuTime() at the method entry and exit
Block and Wait time: analogically to CPU, with ManagementFactory.getThreadMXBean().getThreadInfo(Thread.currentThread().getId()).getBlockedTime() and getWaitedTime()
Yes, I do know that these methods return time in different units (ns/ms), I take that into account.
The application is highly hyperthreaded (4000+ threads), but I store all the info per thread, so there shouldn't be any interference between calls from different threads.

score 2 · Accepted Answer · edited May 23 '17 at 10:28

CPU time tells you how much time the thread took utilizing the CPU, i.e. actually executing code. Block or wait times increase if the thread had been suspended because of e.g. waiting for I/O or entering a monitor.

However, it's up to the JVM and OS to assign CPU time to threads and processes. They may choose to suspend a thread or process and resume another one at any time, if necessary. So the thread may be in a state where it is neither blocked nor waiting but just not executing. This will add to wall time but not blocked/wait time.

The CPU time value comes from the operating system. I checked OpenJDK 7 for Linux systems, so it might be different on other platforms. Either fast_thread_cpu_time() or slow_thread_cpu_time() is called to calculate CPU time for threads. Both functions are located inhotspot/src/os/linux/vm/os_linux.cpp of the OpenJDK 7 source.

Since OpenJDK uses pthreads on Linux, each Java thread is implemented as a lightweight OS thread. The fast method supported on modern kernels calls pthread_getcpuclockid to retrieve the clock_id for specific thread and then call clock_gettime to retrieve the CPU time for the thread. The slow method will lookup the CPU time values from /proc/self/task/<tid>/stat. Both methods tell you how much time a thread spent consuming CPU in user and system/kernel space.

References:

I thought it might be this case ... can you (or anyone else) link any source that can support this? (and by "this" I mean "thread is neither running, blocked nor waiting when it can run but is not given time window on a processor core to run on") — Sethiel, Jun 26 '13 at 20:54
@Sethiel: I edited the answer to include some implementation details which might help. You should be able to proof that by creating threads that consume CPU but don't use synchronization or other block/wait functionality. The total count of threads must be greater than the number of CPU cores of the system. Assign the lowest priority to one of the threads. This thread's CPU time should obviously increase much slower than the others' because it was scheduled less often. However, all threads should be actually in the `RUNNING` state all the time. — nif, Jun 27 '13 at 09:54

score 1 · Answer 2 · answered Jun 26 '13 at 15:12

Another way you might be able to improve the precision of your measurements is:

Do your calculations in nano-seconds, so elapsed time (using System.nanoTime() and CPU time are natively in nanos. Blocked and Waiting times should be converted since they are in millis.
Implement an approximation of how long it takes to simply capture elapsed, cpu, blocked and waiting times. (Call this OVERHEAD.) You might do this in a static block somewhere. Do a JVM warmup loop first, keeping in mind that the JIT kicks in after 15000 method calls [by default ? depends on your JVM....], so a warmup loop of at least that size would be good.
The run the post-warmup loop, take the total elapsed divided by the loop count and you will have your OVERHEAD value.
Subtract OVERHEAD from your calculated method elapsed times to get a better idea of the code execution time of the method, or....
Add OVERHEAD to your addition of CPU, Blocked and Waited to get closer to the actual wall time.
Optionally recalculate OVERHEAD from time to time.

It's not perfect, and it is certainly not exacting, but it might give you slightly better numbers.

Here's some test code I ran to calculate my average OVERHEAD (Java 7, Windows 7, 64 bit). I tried to make sure no methods were being ellided, but your mileage may vary.

public class Overhead {
    static final ThreadMXBean tmx = ManagementFactory.getThreadMXBean();
    public static void main(String[] args) {

        tmx.setThreadContentionMonitoringEnabled(true);
        tmx.setThreadCpuTimeEnabled(true);
        int loops = 15000;
        long sum = -1;
        long start = System.nanoTime();
        for(int i  = 0; i < loops; i++) {
            sum = measure();
        }
        long elapsed = System.nanoTime()-start;
        log("Warmup completed in [" + elapsed + "] ns. ");
        log("Sum:" + sum);

        start = System.nanoTime();
        loops = loops * 2;
        for(int i  = 0; i < loops; i++) {
            sum = measure();
        }
        elapsed = System.nanoTime()-start;

        long avg = (elapsed/loops);

        log("Test completed in [" + elapsed + "] ns. OVERHEAD: [" + avg + "] ns.");
        log("Sum:" + sum);
    }

    protected static long measure() {
        long s1 = System.nanoTime();
        long bt = tmx.getCurrentThreadCpuTime();
        ThreadInfo ti = tmx.getThreadInfo(Thread.currentThread().getId());
        long blocked = ti.getBlockedTime();
        long waited = ti.getWaitedTime();
        long s2 = System.nanoTime();
        return ((s2 - s1) + blocked + waited + bt);
    }


    public static void log(Object msg) {
        System.out.println(msg);
    }
}

My output was as follows:

Overhead test
Warmup completed in [43176164] ns. 
Sum:109201929
Test completed in [38482368] ns. OVERHEAD: [1282] ns.
Sum:156002228

score 0 · Answer 3 · answered Jun 26 '16 at 02:09

From a code execution perspective, a thread can be running, blocked on a monitor, or waiting for io. A running thread, however, must compete with other running threads to be assigned to a cpu to run - until it is assigned to a cpu, it is effectively idling, not taking cpu time but taking wall time. If you you have 1000's of threads and but a few cpu cores, the idling can be significant. If that is the case, you may see high user cpu AND high context switch with 'vmstat'.

JMX: CPU + Blocked + Wait time = Wall time?

3 Answers3