
A thread dump from a Java-based application is easy to get but hard to analyze!

Still, there are interesting things we can see in a thread dump.

Suppose we are running a heavily loaded Java web app. I often take 10 or 15 thread dumps during peak time (under high load) to get a wide data set. First, there is no doubt that we need to tune the code whose threads show up as BLOCKED or waiting on a monitor. But I can't dig any deeper into the remaining RUNNABLE threads.
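(For reference, besides jstack or kill -3, a dump with the same information can be captured in-process through the standard java.lang.management API. A minimal sketch; the class name is mine, the API calls are standard JDK:)

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // Minimal sketch: capture one in-process "thread dump" via JMX.
    public class DumpThreads {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            // lockedMonitors=true, lockedSynchronizers=true: include
            // lock information, similar to a full jstack dump
            for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
                // note: ThreadInfo.toString() may truncate very deep stacks
                System.out.print(info);
            }
        }
    }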

So, if a method appears in the thread dumps many times, can we say it is slower than the others on a heavily loaded server? Of course, we could use proper profiling tools to check, but the thread dumps may give us the same useful information, especially when we are in a production environment.

Thank you in advance!

Vance

    Wait.. What?! What exactly is your question? – Mike Dinescu Mar 21 '11 at 16:50
  • My question is that we may sometimes "ignore" the RUNNABLE threads in a thread dump. What do you think they really mean? Are they just "normal" running threads, or can we spot performance issues from them? Sorry for the confusing question. – Vance Mar 21 '11 at 16:55
  • I suggest you focus on the threads where the stack trace changes. For those with a static stack trace, its likely they are not doing anything. – Peter Lawrey Mar 21 '11 at 16:57
  • Ideally you would use a profiler in production or a production-like environment. This will take a lot of the guesswork out of the problem. – Peter Lawrey Mar 21 '11 at 16:58
  • Thank you! Since a thread dump has lower overhead than a profiler, it is my first choice when investigating performance issues. – Vance Mar 21 '11 at 17:07
  • If a method (or more specifically, a line of code) appears often in thread dumps, it is responsible for that fraction of time. If you need it, then you need it. But if you can replace it with something more efficient, Great! You've made a speedup! Forget measuring execution time or how many times things are called, or sampling overhead. Those things are almost irrelevant. "Fraction of time on stack" is where the gold is, and it doesn't have to be precise. – Mike Dunlavey Mar 21 '11 at 19:33

2 Answers


I would look carefully at the call stack of the thread(s) in each dump, regardless of the thread's state, and ask "What exactly is it doing or waiting for at that point in time, and why?"

You don't just look at the functions on the call stack; you look at the lines of code where the functions are called. That tells you the local reason for each call. Combining the local reasons for the calls on the stack gives you the complete reason (the "why chain") for what the thread was doing at that time.

What you're looking for is bad reasons that appear in more than one snapshot. (It only takes one unnecessary call on the stack to make that whole sample improvable, so the deeper the stack, the better the hunting.) Since they're bad, they can be fixed, and you will get improved performance. The size of the performance improvement is roughly the fraction of snapshots that show them. That's the essence of this technique.
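A crude way to put a number on that "fraction of snapshots" is to tally, for each saved dump file, which stack frames appear at all. Here is a minimal sketch under some assumptions: the file names are placeholders, and the parsing expects classic jstack output where frame lines start with "at ":

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.stream.Stream;

    public class StackFractions {
        public static void main(String[] args) throws IOException {
            Path[] dumps = { Path.of("dump1.txt"), Path.of("dump2.txt") };
            Map<String, Integer> dumpsContaining = new HashMap<>();
            for (Path dump : dumps) {
                try (Stream<String> lines = Files.lines(dump)) {
                    lines.map(String::trim)
                         .filter(l -> l.startsWith("at ")) // stack frame lines
                         .distinct() // count each frame once per dump
                         .forEach(f -> dumpsContaining.merge(f, 1, Integer::sum));
                }
            }
            // print the frames present in the most snapshots
            dumpsContaining.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(20)
                .forEach(e -> System.out.printf("%2d/%d  %s%n",
                        e.getValue(), dumps.length, e.getKey()));
        }
    }

The distinct() is the important design choice here: you want the fraction of snapshots in which a frame shows up, not its total occurrence count across threads.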

Mike Dunlavey

I'd say that if a method appears very often in thread dumps, you'd have to either

  1. optimize that method since it is called many times or
  2. check whether that method is called too often

If you see that the threads spend lots of time in a particular method, there might also be a bug involved (like one we had, where a special regex suffered from a bug in the regex engine). So you'd need to investigate that.
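As a cheap sanity check before reaching for a profiler, you can wrap the suspect call with a counter and a timer to tell those two cases apart. A minimal sketch; the names (CallStats, suspectMethod) are made up for illustration:

    import java.util.concurrent.atomic.AtomicLong;

    public class CallStats {
        private static final AtomicLong calls = new AtomicLong();
        private static final AtomicLong nanos = new AtomicLong();

        // wrapper around the suspect call
        static void timedSuspect() {
            long t0 = System.nanoTime();
            try {
                suspectMethod();
            } finally {
                calls.incrementAndGet();
                nanos.addAndGet(System.nanoTime() - t0);
            }
        }

        static void suspectMethod() { /* the method seen in the dumps */ }

        public static void main(String[] args) {
            for (int i = 0; i < 1_000; i++) timedSuspect();
            long c = calls.get();
            System.out.printf("calls=%d, avg=%.1f us, total=%.1f ms%n",
                    c, nanos.get() / 1e3 / c, nanos.get() / 1e6);
            // many calls with a small average -> reduce the call count;
            // few calls with a large average  -> optimize the method itself
        }
    }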

Thomas