
I have code in C and in assembly (x86 on Linux), and I would like to compare their speed, but I don't know how to do it.

In C, the `time.h` library lets me measure the program's execution time, but I don't know how to do that in assembly.
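
For reference, a minimal sketch of timing one call from C with `clock_gettime(CLOCK_MONOTONIC, ...)` from `time.h`. It assumes the assembly routine is linked as an ordinary C-callable function (e.g. a hypothetical `extern void my_asm_routine(void);`), so both versions can be timed by the same C harness. `sample_workload` is a hypothetical stand-in for the code under test.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime() under strict -std=c99/c11 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static volatile uint64_t sample_sink;

/* Hypothetical stand-in for the routine being measured. An assembly routine
   declared as e.g. `extern void my_asm_routine(void);` could be passed the same way. */
static void sample_workload(void) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 1000000; i++)
        acc = acc * 31 + (i ^ (acc >> 5));
    sample_sink = acc;   /* volatile store so the loop is not optimized away */
}

/* Nanoseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ns(struct timespec a, struct timespec b) {
    return (double)(b.tv_sec - a.tv_sec) * 1e9 + (double)(b.tv_nsec - a.tv_nsec);
}

/* Wall-clock time of a single call to fn. */
static double time_call_ns(void (*fn)(void)) {
    assert(fn != NULL);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return elapsed_ns(t0, t1);
}
```

Usage would be `time_call_ns(sample_workload)` versus `time_call_ns(my_asm_routine)`. A single call is still noisy, so in practice it would be repeated several times.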

I have found the `rdtsc` instruction, which lets me count the clock cycles between two points in the code. But the returned values seem very noisy (maybe because of whatever else is running on the PC?), so I don't see how to compare the speed of these two programs. The time shown in the command prompt is apparently not a reliable reference either...
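
A minimal sketch of reading the TSC around a call, assuming GCC or Clang on x86-64 (`__rdtsc()` and `_mm_lfence()` come from `<x86intrin.h>`). The LFENCE on each side keeps out-of-order execution from moving the measured work across the reads. As noted in the comments, the result is in fixed-frequency TSC reference ticks, not core clock cycles. `tsc_workload` is a hypothetical stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_lfence (GCC/Clang, x86-64 assumed) */

static volatile uint64_t tsc_sink;

/* Hypothetical stand-in for the C or asm routine under test. */
static void tsc_workload(void) {
    uint64_t acc = 0;
    for (uint64_t i = 0; i < 100000; i++)
        acc += i ^ (acc >> 3);
    tsc_sink = acc;   /* volatile store keeps the loop from being optimized away */
}

/* Read the TSC with LFENCE before and after, so earlier work has finished
   before the read and later work does not start before it. */
static inline uint64_t fenced_rdtsc(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* TSC reference ticks spent in one call to fn. */
static uint64_t tsc_ticks(void (*fn)(void)) {
    assert(fn != NULL);
    uint64_t start = fenced_rdtsc();
    fn();
    uint64_t end = fenced_rdtsc();
    return end - start;
}
```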

How should I proceed? Thanks.

I have tried subtracting the values I got from timing empty code from the values I got with the assembly program, in order to get an average value, but the values are still inconsistent.
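
One way to make the empty-code subtraction more stable is to take the minimum over many runs of both the empty region and the real code before subtracting, instead of averaging. Interrupts and context switches only ever add time, so the fastest run is the least disturbed one. This is a minimal sketch using `clock_gettime`; the same idea applies to `rdtsc` readings. `overhead_workload` and `empty_region` are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds. */
static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Empty region: measures only the timing overhead itself. */
static void empty_region(void) { }

static volatile uint64_t overhead_sink;

/* Hypothetical stand-in for the code under test. */
static void overhead_workload(void) {
    uint64_t acc = 1;
    for (uint64_t i = 0; i < 200000; i++)
        acc = acc * 6364136223846793005ULL + i;
    overhead_sink = acc;
}

/* Shortest duration over `runs` calls: noise only ever makes a run slower. */
static uint64_t min_ns(void (*fn)(void), size_t runs) {
    assert(fn != NULL && runs > 0);
    uint64_t best = UINT64_MAX;
    for (size_t r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        fn();
        uint64_t t1 = now_ns();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Estimated cost of fn with the empty-region overhead subtracted. */
static uint64_t net_ns(void (*fn)(void), size_t runs) {
    uint64_t base = min_ns(empty_region, runs);
    uint64_t total = min_ns(fn, runs);
    return total > base ? total - base : 0;
}
```

Taking the median instead of the minimum is a reasonable alternative if you want to discard outliers in both directions.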

Marco Bonelli
wenzio
  • You can still call the functions in `time.h` from assembly. But maybe you don't need either program to time itself: you can print the total run time by putting `time` in front of the command when you run it (except on Windows). – user253751 Feb 01 '23 at 19:20
  • In my experience `rdtsc` actually works reasonably well for this. Interrupts and thread context switches are rare, relatively speaking. Just do, like, 10 runs and throw away the outliers. – 500 - Internal Server Error Feb 01 '23 at 22:28
  • @500-InternalServerError: Of course, you still need to do some warm-up iterations to get the CPU up to max turbo, since `rdtsc` counts at a fixed frequency, not core clock cycles. For that you'd need to use `rdpmc` after programming one of the PMUs. See [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/a/51907627) for more about RDTSC. (It's usually even synchronized across cores on modern systems, especially single-socket, so very short times after migrating between cores won't happen.) – Peter Cordes Feb 01 '23 at 22:38
  • To time small amounts of code, normally a repeat loop is the best bet, so you can run it for a measurable amount of time, much longer than measurement overhead and the CPU's out-of-order exec window. (Then `perf stat` is extremely good.) This is hard in compiled languages (much easier in asm), because you need to compile with optimization, but stop the compiler from hoisting work out of the loop. [Idiomatic way of performance evaluation?](https://stackoverflow.com/q/60291987) and stuff like Google Benchmark's `DoNotOptimize` `asm()` wrapper that forces a value to be materialized. – Peter Cordes Feb 02 '23 at 03:39
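
The repeat-loop approach from the comments can be sketched as follows, assuming GCC or Clang (for the `__asm__` extension). `escape_u64` is a hypothetical helper modeled on the idea behind Google Benchmark's `DoNotOptimize`, not its actual implementation, and `tiny_kernel` stands in for the code under test. Warm-up iterations run first so the CPU can reach its steady clock speed.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical compiler barrier (GCC/Clang): the empty asm claims to read and
   modify *v in a register, so the value must be materialized each iteration
   and the work producing it cannot be folded or hoisted out of the loop. */
static inline void escape_u64(uint64_t *v) {
    __asm__ volatile("" : "+r"(*v));
}

/* Hypothetical code under test: far too short to time on its own. */
static inline uint64_t tiny_kernel(uint64_t x) {
    return (x * 2654435761u) ^ (x >> 7);
}

static uint64_t bench_now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Average ns per tiny_kernel call over `iters` timed repetitions,
   after `warmup` untimed ones. */
static double ns_per_iter(uint64_t iters, uint64_t warmup) {
    assert(iters > 0);
    uint64_t x = 12345;
    for (uint64_t i = 0; i < warmup; i++) {
        x = tiny_kernel(x);
        escape_u64(&x);
    }
    uint64_t t0 = bench_now_ns();
    for (uint64_t i = 0; i < iters; i++) {
        x = tiny_kernel(x);   /* each result feeds the next call: measures latency */
        escape_u64(&x);
    }
    uint64_t t1 = bench_now_ns();
    escape_u64(&x);
    return (double)(t1 - t0) / (double)iters;
}
```

Compiled with optimization (e.g. `gcc -O2`) and wrapped in a small `main`, the same binary (hypothetically `./bench`) could be run under `perf stat -e cycles,instructions ./bench` to get core-cycle counts rather than wall time.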

0 Answers