I am using C on Linux, and would like to time the performance of my code for different input sizes. The code implements a numerical method for solving PDEs, and I have a solid basis for predicting how its execution time should scale.
The issue arises when recording timing results for one specific problem. For every problem except this one, using clock() to get start and end times gives timings that scale as I expect: if an input of size N takes time T, then an input of size 4N takes roughly time 4T. For this one problem that is not the case (with or without optimisation flags turned on). To try to figure out what is going on, I ran the code under a profiler (gprof), and the profiler reports an execution time that is very far from the clock() time. The results below show the time reported by gprof, and the clock() time when the same code is recompiled without -pg. As a sanity check, all code was compiled without any optimisation flags set.
Input size | Time with clock() (s) | Time with gprof (s)
256x256    |  122.77               |  32.14
512x512    |  538.94               | 128.37
1024x1024  | 2712.67               | 511.22
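For context, the clock() timings are recorded roughly as in the sketch below. The solve() function and the grid size are placeholders standing in for the real solver, which is not shown here:

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical solver entry point; the real code is not shown. */
    extern void solve(int n);

    int main(void)
    {
        int n = 1024;                /* grid dimension, e.g. 1024 for 1024x1024 */

        clock_t start = clock();     /* CPU time used by the process so far */
        solve(n);
        clock_t end = clock();

        double elapsed = (double)(end - start) / CLOCKS_PER_SEC;
        printf("clock() time for %dx%d: %.2f s\n", n, n, elapsed);
        return 0;
    }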
The scaling I get with gprof is what I expect, but when I execute the code without any profiling it takes just as long in wall-clock terms as it does with profiling turned on (which is strange in itself), and the clock() timings no longer scale by the expected factor of 4. Is there any explanation for why this should be the case?
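To make the CPU-time versus wall-clock comparison concrete, the cross-check I have in mind looks roughly like this (not part of the original measurements; solve() is again a placeholder, and clock_gettime() may need -lrt on older glibc):

    #include <stdio.h>
    #include <time.h>

    extern void solve(int n);        /* hypothetical solver entry point */

    int main(void)
    {
        int n = 1024;
        struct timespec t0, t1;

        clock_t c0 = clock();
        clock_gettime(CLOCK_MONOTONIC, &t0);   /* wall-clock start */
        solve(n);
        clock_gettime(CLOCK_MONOTONIC, &t1);   /* wall-clock end */
        clock_t c1 = clock();

        double cpu  = (double)(c1 - c0) / CLOCKS_PER_SEC;
        double wall = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("CPU time: %.2f s, wall time: %.2f s\n", cpu, wall);
        return 0;
    }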
The only thing I can think of that is different about this problem is that I make extensive use of the pow() function with fractional exponents, which from a bit of research is said to run very slowly. However, I would expect that slowdown to be uniform, so the scaling of the timings should still be uniform. Note that I do no file I/O and write very little output to the console, so there should be nothing that makes the program hang for large amounts of time.
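By "extensive use of pow() with fractional powers" I mean something along the lines of the sketch below; the array, grid size, and exponent are placeholders rather than the actual solver code:

    #include <math.h>

    /* Rough illustration of the pow() usage in the update loop;
     * u, n, and the exponent 1.5 are placeholders, not the real solver. */
    void update(double *u, int n)
    {
        for (int i = 0; i < n * n; i++) {
            u[i] = pow(u[i], 1.5);   /* fractional exponent on every grid point */
        }
    }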