I think there's a fundamental flaw in your reasoning: you're assuming that, because the function takes 68% of execution time in the optimized build versus just 9% in the unoptimized one, the unoptimized version performs better.
Keep in mind that gprof percentages are relative to the program's total run time. I'm quite sure the -O3 version actually performs better in absolute terms, but the optimizer did a much better job on the other functions, so, in proportion to the rest of the optimized code, the given subroutine looks slower, while in absolute time it is at least as fast as the unoptimized version.
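To see why, try some made-up numbers (the totals here are hypothetical; plug in your own measurements):

    -O0 build: total run time 100 s ->  9% = 9.0 s spent in the function
    -O3 build: total run time  10 s -> 68% = 6.8 s spent in the function

So even though its share of the profile grew, the function itself got faster.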
Still, to check the differences in the emitted code directly, you can use the -S switch. Also, to test whether my idea is correct, you can roughly compute the CPU time taken by the function in -O0 vs -O3 by multiplying each gprof percentage by the user time that a command like time reports for your program.
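For the -S comparison, something like this, assuming gcc and a source file named foo.c (both placeholders):

    # emit assembly instead of an object file, once per optimization level
    gcc -O0 -S foo.c -o foo_O0.s
    gcc -O3 -S foo.c -o foo_O3.s
    # compare what the optimizer changed
    diff foo_O0.s foo_O3.s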
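And for the rough absolute-time estimate (same hypothetical names):

    # build both versions and take the "user" figure from time's output
    gcc -O0 foo.c -o foo_O0 && time ./foo_O0
    gcc -O3 foo.c -o foo_O3 && time ./foo_O3

Then multiply each user time by the percentage gprof reported for that build.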
(Also, I'm quite sure you can get a measure of the absolute time spent in a subroutine from gprof itself; IIRC it's even in the default flat-profile output, in the "self seconds" column.)
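A minimal sketch of that workflow, again with placeholder names:

    # build with profiling instrumentation; running the program writes gmon.out
    gcc -O3 -pg foo.c -o foo
    ./foo
    # the flat profile's "self seconds" column is the absolute time per function
    gprof ./foo gmon.out

Note that -O3 may inline the subroutine away, in which case it won't show up as a separate entry in the profile at all.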