
I use tools like VisualVM to profile my code and find slow methods to rework and make faster, but I can't seem to find any profiler that will time individual loops or code blocks inside a method.

Right now I have to resort to manual timing:

long sTime = System.currentTimeMillis();
//my Code here
System.out.println("time=" + ((System.currentTimeMillis() - sTime)));

but this is really tedious, as I have to manually add and remove the timers for every loop/if/code block to get fine-grained timing data. I was wondering if there are any tools which can profile code blocks, or some kind of Eclipse plugin that can automate the process of adding and removing timers. I've used code templates, but it's far from perfect.
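For what it's worth, the boilerplate can be reduced a little by wrapping the start/stop/print logic in a helper used with try-with-resources (just a sketch; `BlockTimer` is a made-up class, not an existing tool):

```java
// Hypothetical helper: wraps the start/stop/print boilerplate in one object
// so any block can be timed with a single try-with-resources line.
final class BlockTimer implements AutoCloseable {
    private final String label;
    private final long start = System.nanoTime();

    BlockTimer(String label) { this.label = label; }

    @Override
    public void close() {
        long elapsed = System.nanoTime() - start;
        System.out.println(label + " took " + (elapsed / 1_000_000) + " ms");
    }
}

public class BlockTimerDemo {
    public static void main(String[] args) {
        try (BlockTimer t = new BlockTimer("my loop")) {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            System.out.println("sum=" + sum);
        } // timer prints automatically when the block exits, even on exception
    }
}
```

This still requires touching the code, but adding and removing a timer is one line instead of two, and the print can't be forgotten on an early return.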

I'm sure I'm not the only one that would like a more elegant solution to fine grained profiling.

daedalus

2 Answers


For writing microbenchmarks, you should use a proper harness such as JMH. Using the right tool for benchmarking also helps you avoid the common pitfalls.
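JMH itself needs a build dependency and annotation processing, so as a rough illustration of the pitfalls such a harness handles for you (JIT warm-up, dead-code elimination, averaging over many iterations), here is a hand-rolled sketch. It is not a substitute for JMH, just a picture of why the harness matters; `workload` is a made-up example method:

```java
// Rough illustration of what a harness like JMH automates: JIT warm-up,
// many timed iterations, averaging, and consuming every result so the JIT
// cannot eliminate the benchmarked code as dead.
public class NaiveHarness {
    static long blackhole; // results accumulate here to defeat dead-code elimination

    static long workload() {          // the code block under test (example only)
        long sum = 0;
        for (int i = 0; i < 10_000; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // Warm-up: let the JIT compile and optimize the workload first.
        for (int i = 0; i < 10_000; i++) blackhole += workload();

        int iters = 100_000;
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) blackhole += workload();
        long elapsed = System.nanoTime() - start;

        System.out.printf("avg %.1f ns/op%n", (double) elapsed / iters);
    }
}
```

JMH does all of this (and more, such as forking fresh JVMs and reporting error bounds) from a single `@Benchmark`-annotated method.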

Mick Mnemonic
  • I repeat all my timing tests to get an average: I do thousands of iterations and divide the result by the number of iterations. JMH is OK, but it's far too laborious and time-intensive to keep changing all the time, and I don't think it can benchmark code blocks, only methods. – daedalus Aug 03 '15 at 03:21

You say (in effect) you want to find out how to make your code run faster. That's what most people want to do. There's a simple method that requires no profiler at all.

Your code can only run faster on the same hardware if it's doing something it doesn't need to do and you get rid of that thing. There's no other way.
Removing it (let's call it K) will save some fraction of time. Let's suppose it's 30%.

That means if you have a way to get a stack sample at a random time, for example with jstack or by simply pausing the program in the debugger and displaying the stack (and maybe some data), that sample has at least a 30% chance of occurring during K. "At least" because the program may actually spend 50% of its time there while the fix only saves 30%.
If you do this 10 times, you can expect to see K about 3 times (or more).

So do it 10 times.
Each stack sample will show you every function and line of code on the stack, and if you read it you will have a full understanding of what it is doing, and why, at that moment in time.
Look in there for anything, at any level of the stack, that could be done better.
If you see a particular thing that could be done better, and you see it on more than one sample, you've found K.
In fact, the bigger its percent is, the fewer samples you need to see it more than once.
Fix it, and use your timer to see the result.
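The pause-and-look procedure above can even be done in-process. Here is a sketch (my own, using only `Thread.getStackTrace`; the sample count and pause interval are made-up constants) that snapshots a worker thread a few times while it runs:

```java
// Sketch of in-process "random pausing": a sampler grabs a stack snapshot
// of the worker thread several times while it runs. Lines that appear in
// more than one snapshot are where the time is going.
public class StackSampler {
    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(StackSampler::busyWork, "worker");
        worker.start();

        for (int sample = 0; sample < 10; sample++) {
            Thread.sleep(50);                      // roughly random pause points
            StackTraceElement[] stack = worker.getStackTrace();
            System.out.println("--- sample " + sample + " ---");
            for (StackTraceElement frame : stack)
                System.out.println("  " + frame);  // method, file, and line number
            // (empty output means the worker already finished)
        }
        worker.join();
    }

    static void busyWork() {                        // stand-in for the real program
        double x = 0;
        for (int i = 0; i < 300_000_000; i++) x += Math.sqrt(i);
        if (x < 0) System.out.println(x);           // keep the loop from being optimized away
    }
}
```

For real use, jstack against the running JVM or the debugger's pause button is simpler and gives you all threads at once; the point is only that each snapshot shows the exact lines on the stack at that moment.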

Then you can do it all over again, to get the next speedup.
When you can't find any more, your code is basically optimal.

It might be surprising, but this can find any speedup any profiler can find, and some that they can't.
It's called random pausing, and a lot of people rely on it.

Some people say this is just what profilers do, only better, but they should think it through. The problem with even good profilers (ones that sample the stack with line-level precision, on wall-clock time) is that they have a back end that summarizes: hot path, call graphs, flame graphs, "total time", "self time", and so on. You are encouraged to look at those summaries rather than at the actual samples themselves.
This post shows how easy it is for speedups to hide from those summaries, and why that's a Bad Thing.
The samples are where the insight is that tells you what's happening that could be done better, and if it takes enough time to be worth fixing, a large number of samples is not needed to see it.
If you're wondering why, the math is here.


Responding to comment:

You're on the right track. You know it works on an infinite loop or near-infinite loop, because such a problem takes 99.99% of the time, so your pause is certain to land in it. It also works if the problem takes a lesser percentage of time, like 30% or even 10%.

If the problem takes fraction F of the time, then you will see it, on average, once every 1/F samples. For example, if F is 0.3 (30%), you will see it on average every 1/0.3 = 3.33 samples. However, seeing some suspicious code only once doesn't mean it's a problem. If you see it twice (or more), then you know for sure it is an opportunity for speedup. The average number of samples it takes to see a problem twice is 2/F, or 6.67 samples if F is 30%. So just keep taking samples until you see some suspicious code twice.
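A quick simulation (my own sketch, with a fixed seed for repeatability) confirms the 2/F figure:

```java
import java.util.Random;

// Small simulation of the claim above: if a problem takes fraction F of the
// time, each random-time sample lands in it with probability F, and the
// average number of samples needed to see it TWICE is 2/F.
public class TwoHitsSimulation {
    public static void main(String[] args) {
        double f = 0.30;                       // problem takes 30% of run time
        Random rng = new Random(42);           // fixed seed for repeatability
        int trials = 100_000;
        long totalSamples = 0;

        for (int t = 0; t < trials; t++) {
            int hits = 0, samples = 0;
            while (hits < 2) {                 // sample until the problem is seen twice
                samples++;
                if (rng.nextDouble() < f) hits++;
            }
            totalSamples += samples;
        }
        System.out.printf("average samples to see it twice: %.2f (2/F = %.2f)%n",
                (double) totalSamples / trials, 2 / f);
    }
}
```

The printed average comes out very close to 2/0.3 = 6.67.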

If your code runs too fast, no problem. Just put a long-running loop around it. Suppose, as you say, it takes 3 microseconds. All you have to do is loop it 10^7 times and it will take 30 seconds. And if there's a speedup opportunity in there taking 30% of the time, every pause will have a 30% chance of landing in it, no matter how fast the code is. You don't need to worry about code being fast.

You want to save about 70% of time. It will probably take multiple fixes to do this. Suppose there are potential speedups taking 30%, 20%, 10%, and 10%. Remove the 30% one, and that expands the others by 1.43 to 29%, 14% and 14%. Remove the 29% one, and now they are 20% and 20%. Remove the next one, and the last one is 25%. That's how you get big speedups - by peeling-away.
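The peel-away arithmetic above, spelled out as a worked sketch (the four fractions are the example numbers from the paragraph):

```java
// Worked version of the peel-away arithmetic above: each fix removes a chunk
// of work, which rescales the remaining chunks by 1/(1 - fraction removed).
public class PeelAway {
    public static void main(String[] args) {
        double[] savings = {0.30, 0.20, 0.10, 0.10}; // fractions of the ORIGINAL time
        double time = 1.0;                            // normalized original running time

        for (double s : savings) {
            // What this chunk looks like when measured against the CURRENT time:
            System.out.printf("fixing a chunk worth %.0f%% of current time%n",
                    100 * s / time);
            time -= s;                                // remove that work
        }
        System.out.printf("final time %.2f, overall speedup %.2fx%n",
                time, 1.0 / time);                    // 0.30 and 3.33x
    }
}
```

The four prints show 30%, 29%, 20%, and 25%, matching the sequence in the paragraph, and the final speedup is 1/0.3 = 3.33x.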

Now here's the rock-bottom most important point: you can see from this how essential it is not to miss a single one. Fixing just three out of four and missing one will not cut the mustard. Speedup opportunities can easily hide from profilers' summaries, but they can't hide from you when you look at the stack samples yourself. Forget the fancy profiler output; you need speed, and that's the bottom line.

Mike Dunlavey
  • Interesting, I actually do this sometimes in Eclipse, but only when things are running so slowly I think it's stuck in a loop and hit pause. The issue I would have is getting stack traces programmatically in a tiny time period, as the code in my golden method only takes 3100 nanoseconds to run on average (but it's called 212K times), so I need to get it down to 1000 nanoseconds so all the calls complete in my target of 200-300 milliseconds. At this level it might be certain ops slowing things down, but I don't know which, e.g. bitshifts/ifs/array indexing/bitmasks. I will see how I can get the stack from the API in Java – daedalus Aug 05 '15 at 14:51