Performance comparison of a program: C / Assembly

Question

I have a program written in both C and assembly (x86 on Linux) and I would like to compare the speed of the two versions, but I don't know how to do it.

In C, the functions in `time.h` let me measure the execution time of the program, but I don't know how to do the same in assembly.

I have found the `rdtsc` instruction, which lets me count the number of clock cycles elapsed between two points in the code. But the returned values seem to be very noisy (maybe because of whatever else is running on the PC?), so I don't see how to use them to compare the speed of the two programs. The time observed in the command prompt is apparently not a reliable reference either.

How should I proceed? Thanks.

I have tried subtracting the values I got from an empty piece of code from the values I got with the assembly program, in order to get an average value, but the results are still inconsistent.
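For reference, the subtraction approach described above could be sketched in C roughly as follows. This is only a sketch under assumptions: `code_under_test` and `empty_code` are hypothetical stand-ins (the real routine could be an assembly function declared `extern`), `__rdtsc()` is the GCC/Clang intrinsic from `<x86intrin.h>`, and the minimum over several runs is used instead of an average so that runs disturbed by interrupts are ignored. On non-x86 builds the sketch falls back to `clock_gettime`, which returns nanoseconds rather than TSC ticks.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h> /* GCC/Clang: declares the __rdtsc() intrinsic */
static inline uint64_t ticks(void) { return __rdtsc(); }
#else
/* Fallback so the sketch still builds elsewhere: nanoseconds, not TSC ticks. */
static inline uint64_t ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical code under test; the volatile sink keeps the loop from
 * being optimized away. */
void code_under_test(void)
{
    volatile uint32_t sink = 0;
    for (uint32_t i = 0; i < 10000; i++)
        sink += i;
}

/* Hypothetical "empty code" used to estimate the measurement overhead. */
void empty_code(void) { }

/* Smallest tick count observed over `runs` timings of fn.
 * Returns UINT64_MAX if runs is 0. */
uint64_t min_ticks(void (*fn)(void), unsigned runs)
{
    uint64_t best = UINT64_MAX;
    for (unsigned r = 0; r < runs; r++) {
        uint64_t t0 = ticks();
        fn();
        uint64_t t1 = ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Best time of fn minus the best time of the empty measurement. */
uint64_t net_ticks(void (*fn)(void), unsigned runs)
{
    uint64_t overhead = min_ticks(empty_code, runs);
    uint64_t total = min_ticks(fn, runs);
    return total > overhead ? total - overhead : 0;
}
```

Calling `net_ticks(code_under_test, 20)` gives one overhead-corrected figure. Note that `rdtsc` is not a serializing instruction, so for very short code sequences the result can still be distorted by out-of-order execution.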

You can still call the functions in time.h. But maybe you don't need either program to time itself as you can print the total timing by putting `time` in front of the command when you run it. (Except on Windows) — user253751, Feb 01 '23 at 19:20
In my experience `rdtsc` actually works reasonably well for this. Interrupts and thread context switches are rare, relatively speaking. Just do, like, 10 runs and throw away the outliers. — 500 - Internal Server Error, Feb 01 '23 at 22:28
@500-InternalServerError: Of course, you still need to do some warm-up iterations to get the CPU up to max turbo, since `rdtsc` counts at a fixed frequency, not core clock cycles. For that you'd need to use `rdpmc` after programming one of the PMUs. See [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/a/51907627) for more about RDTSC. (It's usually even synchronized across cores on modern systems, especially single-socket, so very short times after migrating between cores won't happen.) — Peter Cordes, Feb 01 '23 at 22:38
To time small amounts of code, normally a repeat loop is the best bet, so you can run it for a measurable amount of time, much longer than measurement overhead and the CPU's out-of-order exec window. (Then `perf stat` is extremely good.) This is hard in compiled languages (much easier in asm), because you need to compile with optimization, but stop the compiler from hoisting work out of the loop. [Idiomatic way of performance evaluation?](https://stackoverflow.com/q/60291987) and stuff like Google Benchmark's `DoNotOptimize` `asm()` wrapper that forces a value to be materialized. — Peter Cordes, Feb 02 '23 at 03:39
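The advice in the comments above (warm-up iterations, a repeat loop much longer than the measurement overhead, several runs with outliers discarded, and a barrier so the compiler cannot hoist the work) might be combined into a small C harness like the one below. It is a sketch, not a definitive implementation: `work` is a hypothetical stand-in for the routine being compared (an assembly version could be linked in with the same signature), `keep` is a hand-written GCC/Clang inline-asm barrier in the spirit of `DoNotOptimize`, and the median is used as a simple way to ignore outliers.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

enum { MAX_RUNS = 64 };

/* Monotonic time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical routine under test. */
uint32_t work(uint32_t x)
{
    return x * 2654435761u + 1u;
}

/* Compiler barrier (GCC/Clang extension): forces v to be materialized in a
 * register, so the optimizer cannot delete or hoist the computation. */
static inline void keep(uint32_t v)
{
    __asm__ volatile("" : : "r"(v) : "memory");
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n samples (sorts in place; upper middle element for even n). */
uint64_t median_u64(uint64_t *s, size_t n)
{
    qsort(s, n, sizeof *s, cmp_u64);
    return s[n / 2];
}

/* Times `iters` chained calls of fn, `runs` times over, and returns the
 * median time of one run in nanoseconds. Each call depends on the previous
 * result, so this measures latency rather than throughput. */
uint64_t bench(uint32_t (*fn)(uint32_t), unsigned iters, unsigned runs)
{
    uint64_t samples[MAX_RUNS];
    uint32_t x = 1;

    if (runs == 0)
        runs = 1;
    if (runs > MAX_RUNS)
        runs = MAX_RUNS;

    /* Warm-up pass: lets caches fill and the CPU ramp up its clock. */
    for (unsigned i = 0; i < iters; i++) {
        x = fn(x);
        keep(x);
    }

    for (unsigned r = 0; r < runs; r++) {
        uint64_t t0 = now_ns();
        for (unsigned i = 0; i < iters; i++) {
            x = fn(x);
            keep(x);
        }
        samples[r] = now_ns() - t0;
    }
    return median_u64(samples, runs);
}
```

For example, `bench(work, 1000000, 11)` returns the median time in nanoseconds for one million chained calls; dividing by the iteration count gives an approximate per-call figure. The C and assembly versions can then be compared by passing each to `bench` with the same arguments.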
