Which factors affect the speed of cpu tracing?

Question

When I use YJP to run a CPU tracing profile on our own product, it is really slow.

The product runs on a 16-core machine with an 8 GB heap, and I use Grinder to run a small load test (e.g. 10 Grinder threads) that has about 7 to 10 steps during profiling. I have a script that starts the product with the profiler, starts profiling (using the controller API), and then starts Grinder to emulate user operations. When all the operations finish, the script tells the profiler to stop profiling and save a snapshot.

During profiling, each step in the Grinder test takes more than 1 million ms to finish. The whole run often takes more than 10 hours with just 10 Grinder threads, each running the test 10 times. Without the profiler, it finishes within 500 ms.

So... besides the problems with the product to be profiled, is there anything else that affects the performance of the cpu tracing process itself?

Update: in YJP 2013, the cpu-tracing takes just 5 minutes for a simple test, which used to take 2 hours with the same test in 12.0.5. — Dante WWWW, Nov 08 '13 at 06:09

score 1 · Accepted Answer · answered May 13 '13 at 08:54

The last time I used YourKit (v7.5.11, which is pretty old; the current version is 12), it had two CPU profiling settings: sampling and tracing, the former being much faster and less accurate. Since tracing is supposed to be more accurate, I used it myself and also observed a huge slowdown, in spite of the statement that the slowdown was "average". Yet it was far less than your results: from 2 seconds to 10 minutes. My code is a fragment of a calculation engine with virtually no I/O and no waits of any kind; it just reads an input, calculates, and prints the result to the console. So the whole slowdown comes from the profiler, with no external influences.

Back to your question: the option mentioned, sampling vs. tracing, will affect performance, so you may want to try sampling.

Now that I think of it: YourKit can be set up to do things automatically, such as taking snapshots periodically or on low memory, profiling memory usage, or recording object allocations. Each of these measures will make profiling slower. Perhaps you should run an online session instead of a script-controlled one, to see what it really does.

answered May 13 '13 at 08:54

Tomasz Stanczak

Well, I can also do sampling, but the information tracing collects is far more detailed than what sampling gives (that's obvious from the size of the snapshot files). I checked the YJP documentation. Two options may affect the speed: 1. adaptive tracing mode, which is enabled by default to optimize the speed; 2. filters. I can use filters, but I also need a profile snapshot without any filters. From YJP 11 to 12, YourKit claims CPU tracing has been sped up by 40%; however, as our product is also in active development, profiling is much slower than in the YJP 11 days. – Dante WWWW May 13 '13 at 09:13
I'm afraid you have to choose between speed (or rather, less slowness) and richness of detail. My solution to such problems was to isolate the concern in a smaller test case and profile only that instead of the whole application, since profiling the whole thing was slow to the point of being unusable. The numbers in my answer are also from a sample, a single test from an extensive regression test suite, whereas the full regression run never actually finished under the profiler. This way I was able to tune enough to achieve the required SLA. – Tomasz Stanczak May 13 '13 at 13:22
First of all, I can't divide my application into smaller pieces. I can split the test cases, and the minimal test actually includes just 3 steps: open the homepage, log in, and then log out. Even then, the app responds only after a very long time. The overall performance of the app is measured by regular Grinder tests, and the profile is used by the core devs as a reference for their optimization. They can use filtered profiles, but they also need a full-path profile... Perhaps there is no solution to this kind of slowness. – Dante WWWW May 13 '13 at 15:39

score 0 · Answer 2 · edited May 23 '17 at 12:31

According to the YourKit docs:

Although tracing provides more information, it has its drawbacks. First, it may noticeably slow down the profiled application, because the profiler executes special code on each enter to and exit from the methods being profiled. The greater the number of method invocations in the profiled application, the lower its speed when tracing is turned on.

The second drawback is that, since this mode affects the execution speed of the profiled application, the CPU times recorded in this mode may be less adequate than times recorded with sampling. Please use this mode only if you really need method invocation counts.
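To make the "special code on each enter and exit" point concrete, here is a minimal sketch. It is not YourKit's actual instrumentation; the `onEnter`/`onExit` hooks and the class name are made up for illustration. It only shows that the number of hook executions grows with the number of method invocations, which is why code with many tiny calls suffers most under tracing.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; this is NOT how YourKit instruments code.
class TracingCostSketch {
    // Counts how often the simulated profiler hooks run.
    static final AtomicLong hookCalls = new AtomicLong();

    // Stand-ins for the code a tracing profiler runs on method entry and exit.
    static void onEnter() { hookCalls.incrementAndGet(); }
    static void onExit()  { hookCalls.incrementAndGet(); }

    // A tiny method as it might look after being wrapped with the hooks.
    static int tracedAdd(int a, int b) {
        onEnter();
        try {
            return a + b;
        } finally {
            onExit();
        }
    }

    // Calls the traced method `invocations` times and returns the
    // number of hook executions that this caused.
    static long countHookCalls(int invocations) {
        hookCalls.set(0);
        int sum = 0;
        for (int i = 0; i < invocations; i++) {
            sum = tracedAdd(sum, 1);
        }
        if (sum != invocations) {
            throw new IllegalStateException("unexpected sum " + sum);
        }
        return hookCalls.get();
    }

    public static void main(String[] args) {
        // Two hook executions per call: one on entry, one on exit.
        System.out.println(countHookCalls(1_000_000)); // prints 2000000
    }
}
```

The work done by `tracedAdd` itself is a single addition, so in a sketch like this the hooks dominate the cost. A real profiler's hooks do more than bump a counter, which makes the ratio worse, not better.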

Also:

When sampling is used, the profiler periodically queries stacks of running threads to estimate the slowest parts of the code. No method invocation counts are available, only CPU time.

Sampling is typically the best option when your goal is to locate and discover performance bottlenecks. With sampling, the profiler adds virtually no overhead to the profiled application.
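For comparison, the idea of "periodically querying stacks" can be sketched with nothing but the standard `Thread.getStackTrace()` call. This is a toy under stated assumptions, not YourKit's implementation: the class, the `busyLoop` workload and the interval are all invented for the example. Note that the cost depends on the number of samples taken, not on how many methods the workload calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of stack sampling; not how YourKit is implemented.
class SamplingSketch {
    static volatile boolean running;
    static volatile long sink; // keeps the busy loop from being optimized away

    // Stand-in workload that burns CPU until `running` is cleared.
    static void busyLoop() {
        long x = 0;
        while (running) {
            x += System.nanoTime() % 7;
        }
        sink = x;
    }

    // Takes up to `samples` snapshots of the target's stack, `intervalMillis`
    // apart, and counts which method was on top of the stack each time.
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    // Starts the workload, samples it, stops it, and returns the hit counts.
    static Map<String, Integer> demo(int samples, long intervalMillis) {
        running = true;
        Thread worker = new Thread(SamplingSketch::busyLoop, "worker");
        worker.setDaemon(true);
        worker.start();
        try {
            return sample(worker, samples, intervalMillis);
        } finally {
            running = false;
        }
    }

    public static void main(String[] args) {
        // Prints a map from "Class.method" to the number of samples it topped.
        System.out.println(demo(50, 10));
    }
}
```

No invocation counts come out of this, only a rough picture of where the thread spends its time, which matches what the quoted documentation says about sampling.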

Also, it's a little confusing what the doc means by "CPU time", because it also talks about "wall-clock time". If you are doing any I/O, waits, sleeps, or any other kind of blocking, it is important to get samples on wall-clock time, not CPU-only time, because it's dangerous to assume that blocked time is either insignificant or unavoidable. Fortunately, that appears to be the default (though it's still a little unclear):

The default configuration for CPU sampling is to measure wall time for I/O methods and CPU time for all other methods.
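To illustrate the difference between the two clocks, here is a small sketch using the standard `ThreadMXBean` API. It has nothing YourKit-specific in it, and the class and method names are made up. A thread that sleeps accumulates wall-clock time but almost no CPU time, so a CPU-only measurement would make the blocked time invisible.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical illustration of wall-clock time vs. CPU time for a blocked thread.
class WallVsCpuSketch {
    // Sleeps for `millis` and returns {wallNanos, cpuNanos} for the current thread.
    // Assumes the JVM supports thread CPU time; getCurrentThreadCpuTime() throws
    // UnsupportedOperationException if it does not, and returns -1 if measurement
    // is disabled.
    static long[] measureSleep(long millis) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        try {
            Thread.sleep(millis); // blocked: the clock runs, the CPU does not
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        long wall = System.nanoTime() - wallStart;
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] { wall, cpu };
    }

    public static void main(String[] args) {
        long[] t = measureSleep(500);
        System.out.println("wall ms: " + t[0] / 1_000_000);
        System.out.println("cpu  ms: " + t[1] / 1_000_000);
    }
}
```

In a run like this the wall-clock figure should be around the sleep duration while the CPU figure stays close to zero; the same gap appears for I/O and lock waits.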

"Use Preconfigured Settings..." allows to choose this and other presents. (sic)

If your goal is to make the code as fast as possible, don't be concerned with invocation counts and measurement "accuracy"; do find out which lines of code are on the stack a large fraction of the time, and why. More on all that.
