
I am using C on Linux, and would like to time the performance of my code for different input sizes. The code implements a numerical method for solving PDEs, and I have a solid basis from which to predict its execution time.
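The timing itself is done roughly like the sketch below (run_solver here is only a stand-in for the real solver entry point, with a dummy body so the sketch compiles on its own):

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the real solver; dummy workload so the sketch is self-contained. */
    void run_solver(int n)
    {
        volatile double s = 0.0;
        for (int i = 0; i < n * n; i++)
            s += i * 0.5;
    }

    int main(void)
    {
        int n = 256;
        clock_t start = clock();
        run_solver(n);
        clock_t end = clock();
        /* clock() reports CPU time used by this process, in units of CLOCKS_PER_SEC */
        printf("%dx%d: %.2f s\n", n, n, (double)(end - start) / CLOCKS_PER_SEC);
        return 0;
    }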

The issue arises when recording timing results for one specific problem. For every problem except this one, using clock() to take a start and end time gives timings that scale as I expect: if an input of size N takes time T, then an input of size 4N takes roughly time 4T. For this one problem that is not the case (with or without optimisation flags turned on). To try to figure out what is going on, I ran the code under a profiler (gprof), and the execution time the profiler reports is very far from the clock() time. The results below show the time reported by gprof, and the clock() time obtained when the same code is recompiled without -pg. As a sanity check, all code was compiled with no optimisation flags set.

   Input size   |   Time (clock())   |   Time (gprof)
     256x256    |       122.77       |       32.14
     512x512    |       538.94       |      128.37
    1024x1024   |      2712.67       |      511.22

The scaling I get with gprof is what I expect, but when I execute the code without any profiling it takes just as long (in wall-clock time) as when profiling is turned on (which is very strange), and the scaling is no longer the expected factor of 4. Is there any explanation for why this should be the case?

The only thing I can think of that is different for the problem producing the results above is that I make extensive use of the pow() function with fractional powers, which from a bit of research is said to run very slowly. However, I would expect that slowdown to be uniform, so the scaling of the timings should still be uniform. Note that I am doing no file I/O and very little output to the console, so there should be nothing that could make the program hang for long periods.
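As a point of reference, a quick stand-alone check of the cost of fractional exponents in pow() can be done with something like this (the exponents and iteration count are arbitrary illustrative choices, not taken from my code):

    /* link with -lm */
    #include <stdio.h>
    #include <math.h>
    #include <time.h>

    /* Time a loop of pow() calls with a given exponent. */
    static double bench(double e)
    {
        volatile double s = 0.0;   /* volatile keeps the loop from being optimised away */
        clock_t t0 = clock();
        for (int i = 1; i <= 10000000; i++)
            s += pow((double)i, e);
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        printf("pow(x, 2.0): %.2f s\n", bench(2.0));   /* integer exponent */
        printf("pow(x, 1.5): %.2f s\n", bench(1.5));   /* fractional exponent */
        return 0;
    }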

Keeran Brabazon
  • If it is a multithreaded application, it may be this: http://stackoverflow.com/a/25625385/1756405 – IdeaHat Sep 02 '14 at 14:48
  • @MadScienceDreams No. This is a single threaded serial code. Everything calculated in a nice predictable manner (at least I thought it was predictable) – Keeran Brabazon Sep 02 '14 at 15:01
  • 1
    try with another profiler and paste the results here – Nachiket Kate Sep 03 '14 at 05:09
  • Your real problem is that clock() is using the 'actual' RTC clock which will show elapsed time from start to finish while the profiler will show only the time spent in the application being profiled. I.E. the OS takes up the remaining time. – user3629249 Sep 03 '14 at 23:21
  • 1
    @user3629249 I thought that on Linux clock() counts the CPU ticks for the current application rather than the CPU ticks elapsed since the start. Also, for other problems (where the code is 90% the same) the clock() time is much closer to the time that the profiler says, so that I am happy that only a little extra time is being taken by the OS – Keeran Brabazon Sep 04 '14 at 09:25

1 Answer


What gprof does is sample the program counter 100 times per second, and each time see if it can figure out which function the program counter is in. (It also keeps track of how many times any function calls any other function.) But it only does this for functions compiled with -pg.

Suppose the program counter is in a math library because that's what you called, but the math library was not compiled with -pg. gprof will basically throw away the sample.

Look, gprof is a venerable tool, but stackoverflow is full of questions trying to make sense of it. Don't expect anything like accurate measurements out of it, except for programs that don't call any library functions, don't do recursion, and don't do any I/O - i.e. toy programs.

The best way to get measurements of execution time is just the way you're doing it - simple overall timing. If you want more digits of precision, just repeat the execution N times and divide by N.
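A minimal sketch of that approach, where run_once() is a hypothetical stand-in for one complete execution of the program under test:

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for one complete execution; dummy body so the sketch is self-contained. */
    static void run_once(void)
    {
        volatile double s = 0.0;
        for (int i = 0; i < 1000000; i++)
            s += i * 0.5;
    }

    int main(void)
    {
        const int reps = 100;                 /* N repetitions */
        clock_t start = clock();
        for (int i = 0; i < reps; i++)
            run_once();
        double total = (double)(clock() - start) / CLOCKS_PER_SEC;
        printf("average per run: %.6f s\n", total / reps);
        return 0;
    }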

If on the other hand, you want to see what fraction of time is spent by functions or lines of code, both program-counter-only (exclusive or self time) and including callees (inclusive time) you need to get stack samples. (Because inclusive fraction = fraction of time on stack, and exclusive fraction = fraction of time at end of stack.) Some profilers do that. Many people just do it manually if they're content with low precision. If it's possible that the execution includes calls to blocking functions like sleep, mutex-wait, or I/O, and you don't want that excluded, the samples need to be taken on wall-clock time. Very few profilers do that, but the manual method does it, obviously.

Mike Dunlavey
  • Thanks for the answer. I will add to this a little bit. I found that the issue with this is the 'pow' function in c which is known to be slow ([see here](http://entropymine.com/imageworsener/slowpow/)). It makes sense, therefore, that gprof does not count the time spent on 'pow', but in this case that is kind of what I am interested in. I want to know exclusively how good my code is rather than how well it runs with other people's (possibly slow) code around it – Keeran Brabazon Sep 04 '14 at 13:17
  • @Keeran: Manual sampling showed me pow, exp, and log taking a large fraction of time (solving ODEs in pharmacometrics). It also showed me that often the arguments were the same as previously. So I memoized them (made a wrapper function that remembered previous results). Very nice speedup. – Mike Dunlavey Sep 04 '14 at 14:21
  • This could be an idea, but unfortunately my values are not the same, and I am interested in a high level of accuracy, so I can't histogram the values either. Due to a previous poster I also discovered the joys of manual sampling with valgrind - fantastic! I have switched to the icc compiler, which is a lot better for some preliminary small runs, and hopefully will be good for the longer runs as well – Keeran Brabazon Sep 04 '14 at 16:24
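For reference, the memoisation Mike Dunlavey describes in his comment could look roughly like this (the cache size and replacement policy here are arbitrary choices, not his actual implementation):

    #include <math.h>

    #define POW_CACHE_SIZE 4

    /* Wrapper around pow() that remembers the last few argument/result pairs,
       so repeated calls with exactly the same arguments skip the libm call. */
    double cached_pow(double x, double y)
    {
        static double xs[POW_CACHE_SIZE], ys[POW_CACHE_SIZE], rs[POW_CACHE_SIZE];
        static int filled = 0, next = 0;

        for (int i = 0; i < filled; i++)
            if (xs[i] == x && ys[i] == y)
                return rs[i];                    /* cache hit: exact argument match */

        double r = pow(x, y);
        xs[next] = x;  ys[next] = y;  rs[next] = r;
        next = (next + 1) % POW_CACHE_SIZE;
        if (filled < POW_CACHE_SIZE)
            filled++;
        return r;
    }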