
I am using unsigned long to collect performance data while profiling an application, as follows:

  1. unsigned long time_taken = (unsigned long) ((finish_time_in_nano_sec - start_time_in_nano_sec)/1000);

  2. Write time_taken to a file. Do this for every function that gets called in my app.

  3. Now, after the app finishes, have a separate program read the above file and sum up the time taken for each function, along with call counts, producing lines such as

func_name aggregate_time call_counts

The problem I have is that for some functions, the aggregate_time field is a 20-digit value, i.e. near the maximum value an unsigned long can hold. This cannot be right, because I measure time in microseconds and don't run my app for more than about 20 seconds. How can it then be a 20-digit value?

Do you see mistakes in steps 1, 2 and 3?

Regards, Krishna

EDIT:

1) Time measurement:

clock_gettime(CLOCK_REALTIME, &start_time);
clock_gettime(CLOCK_REALTIME, &finish_time);
unsigned long time_taken = (unsigned long)((finish_time.tv_nsec - start_time.tv_nsec)/1000);

2) File write: fwrite(&time_taken, sizeof(unsigned long), 1, datafile);

3) File read: fread(&time_taken, sizeof(long), 1, datafile);

Krishna
  • 1
    Isn't there a tool out there already which does this for you? http://stackoverflow.com/questions/67554/whats-the-best-free-c-profiler-for-windows-if-there-are – ronag Dec 16 '10 at 23:39
  • 1
    Maybe you got finish and start times reversed. – Gabe Dec 16 '10 at 23:40
  • Is there some reason you can't use, for example, `gprof` (which does exactly this) or valgrind/callgrind (which is a better profiling tool)? – greyfade Dec 16 '10 at 23:40
  • Hi, Thanks. Yes. For some reason, I am unable to use them. There is even Pin probe, but I am asked to do it from first principles. – Krishna Dec 17 '10 at 00:04

3 Answers


The problem is in your time measurement code:

clock_gettime(CLOCK_REALTIME, &start_time);
clock_gettime(CLOCK_REALTIME, &finish_time);
unsigned long time_taken = (unsigned long)((finish_time.tv_nsec - start_time.tv_nsec)/1000);

If a 1-second clock boundary is crossed between start_time and finish_time, the subtraction yields a negative value, which is then converted to a very large positive unsigned value.

You need to take into account the tv_sec member of the times as well:

unsigned long time_taken;
time_taken = (unsigned long)((finish_time.tv_sec - start_time.tv_sec) * 1000000);
time_taken += (unsigned long)(finish_time.tv_nsec / 1000);
time_taken -= (unsigned long)(start_time.tv_nsec / 1000);

(e.g. a clock time of 23989032.452 seconds would be represented as a struct timespec with .tv_sec = 23989032 and .tv_nsec = 452000000)

caf
  • Oops... Brilliant observation!! Thanks for this... Really embarrassing that I missed this... How many votes do you want for this answer? :-) But why are you multiplying the tv_sec value by only 1000? It should be 1000000, right? – Krishna Dec 17 '10 at 00:28
  • @Kris Retro Virus: you should select this one as the answer (and doing it will even give you a few points). – kriss Dec 17 '10 at 00:37
  • Oh ok. I thought tv_sec was in seconds. – Krishna Dec 17 '10 at 00:46
  • 1
    @Kris Retro Virus: You're completely right, it should be multiplied by 1000000. – caf Dec 17 '10 at 00:52

The problem isn't in the code you show - though it would be interesting to know the types of the times in nanoseconds.

Therefore, the problem is in the code you don't show:

  • writing the data to file
  • reading the data from file
  • obtaining the times in nanoseconds
  • processing the times in microseconds
Jonathan Leffler
  • 1) Time measurement: clock_gettime(CLOCK_REALTIME, &start_time);clock_gettime(CLOCK_REALTIME, &finish_time); unsigned long time_taken = (unsigned long)((finish_time.tv_nsec - start_time.tv_nsec)/1000); 2) File write: fwrite(&time_taken, sizeof(unsigned long), 1, datafile); 3) File read: fread(&time_taken, sizeof(long), 1, datafile); – Krishna Dec 17 '10 at 00:02
  • Hi, Sorry about the above being jumbled up without indentation.... I don't know how to add formatting in this text box provided... – Krishna Dec 17 '10 at 00:06

Actually, it is possible in some cases to obtain negative times. Given that the time intervals are very small, they most probably come from the RDTSC instruction, which reads a counter that increments with every clock cycle of the CPU. On some multicore AMD processors each core has its own counter, not synchronized with the others, so the difference between two consecutive readings performed on different cores can be negative.

This can be solved easily by forcing the program to execute on a single core, for example with the Win32 API SetProcessAffinityMask.

EDIT: Be careful how you measure times. Most time APIs have horrible resolution, so measuring small intervals with them makes no sense. Try using RDTSC explicitly.

And most importantly: file I/O is so slow that it can completely destroy the useful information. For example, if you perform the measurements for both the caller and the callee, and the callee writes its result to the file, then the time measured for the caller will include the time spent in fwrite. Also monitor each write, so that you can see how much time is spent there (to avoid recursion, save the time taken by one fwrite and log it with the next fwrite).

ruslik
  • I have the same problem if I use gettimeofday() function also. In addition, I am not measuring time across functions, my start and finish times are the beginning and end of every function... So I guess chances of start time being executed in one CPU and finish time on the other are less or ruled out.. – Krishna Dec 17 '10 at 00:10
  • @Kris: you have to add one single API call in your `main()` to check it. – ruslik Dec 17 '10 at 00:11
  • I have no control over main... Mine is something like a shared object that "hijacks" OpenGL API calls made by my executable (I only run it as is... can't even recompile... just use LD_PRELOAD to link the executable to my shared object and run!!) – Krishna Dec 17 '10 at 00:13
  • You can use process explorer to change affinity. – ruslik Dec 17 '10 at 00:16
  • Hi Ruslik, I find your "Be careful how you measure times. Most time APIs have a horrible resolution, so measuring small intervals using them has no sense. Try using RDTSC explicitly." comment close to something that I suspect too... But I don't know if there is a way out... And RDTSC is fraught with its own problems... Kind of clueless as to what the real issue is... whether it is usage of wrong APIs to measure time, or a mistake in writing/reading the file??? – Krishna Dec 17 '10 at 00:21
  • File I/O is not an issue here as I am using sophisticated ways such as buffering, child process to write data to file via pipes etc... The problem can only be with the timing API or the way I read and write to file.... – Krishna Dec 17 '10 at 00:26
  • @Kris: unless you take IDA, you will never know for sure how each API is implemented. So why not write it carefully and take care of everything? Use rdtsc and lock the process somehow to a single core. Is it at least AMD? – ruslik Dec 17 '10 at 00:26
  • No it is Intel. Though I appreciate your time I am somehow not convinced by your reply... anyways let me try your suggestion as I don't have a means to disprove your ideas – Krishna Dec 17 '10 at 00:34
  • @Kris: I agree with @caf. Occam's razor.. :) – ruslik Dec 17 '10 at 00:40