
I'm trying to measure the latency of `sqrtsd` with the following benchmark function:

```cpp
double microbenchmark_get_sqrt_latency()
{
    myInt64 start, end;
    list<double> cyclesList;
    int num_runs = 40;
    double cycles = 0.;
    double multiplier = 1.;
    double x = 500;

    // Repeat the measurement 1000 times
    for (size_t i = 0; i < 1000; i++)
    {
        // Measuring...
        start = start_tsc();
        for (size_t j = 0; j < num_runs; ++j)
        {
            sqrtsd(x);
        }
        // Could this be executed before the loop has finished?
        end = stop_tsc(start);

        // This does not give the number of cycles I expect
        cycles = ((double)end) / num_runs;
        cyclesList.push_back(cycles);
    }
    // Return the median of the per-iteration cycle counts
    cyclesList.sort();
    auto it = cyclesList.begin();
    std::advance(it, cyclesList.size() / 2);
    return *it;
}
```

The problem is that `end`, which holds the number of cycles elapsed since the first `rdtsc` instruction, is always 22-24, even when `num_runs` is increased to 10000. I have no explanation for this, except that maybe the end-of-measurement instruction somehow gets moved to right after the first iteration of the `for` loop.
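If the timed loop is being optimized away, 22-24 ticks should be roughly what an empty timed region costs. A minimal sketch of that check, assuming GCC or Clang on x86-64 and using the `__rdtsc()` built-in from `<x86intrin.h>` (mentioned in the comments) instead of the homework's `start_tsc()`/`stop_tsc()`; `min_empty_region_ticks` is a hypothetical helper name. Because it leaves out `cpuid`, its count will not necessarily match 22-24 exactly.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <x86intrin.h> // __rdtsc (GCC/Clang, x86)

// Smallest tick count seen for an empty timed region, i.e. two TSC reads
// with nothing between them. Taking the minimum discards samples that were
// inflated by an interrupt or a context switch.
std::uint64_t min_empty_region_ticks(int samples)
{
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < samples; ++i)
    {
        std::uint64_t start = __rdtsc();
        std::uint64_t end = __rdtsc(); // nothing timed in between
        best = std::min(best, end - start);
    }
    return best;
}
```

If this minimum is close to what the benchmark reports for every `num_runs`, then nothing measurable is executing between the two reads, which points at the loop being removed rather than at the timer.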

The compiler flags I'm using are: `-O3 -fno-tree-vectorize -march=skylake -std=c++17`

Here are the `RDTSC`/`CPUID` macros and the implementations of `start_tsc()` and `stop_tsc()`:

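```cpp
// tsc_counter, COUNTER_VAL, ASM and VOLATILE are defined in code not shown here
#define RDTSC(cpu_c)     \
    ASM VOLATILE("rdtsc" \
                 : "=a"((cpu_c).int32.lo), "=d"((cpu_c).int32.hi))
#define CPUID()           \
    ASM VOLATILE("cpuid"  \
                 :        \
                 : "a"(0) \
                 : "bx", "cx", "dx")

// Serialize with cpuid, then read the time-stamp counter
unsigned long long start_tsc(void)
{
    tsc_counter start;
    CPUID();
    RDTSC(start);
    return COUNTER_VAL(start);
}

// Read the time-stamp counter, then serialize; returns ticks since start
unsigned long long stop_tsc(unsigned long long start)
{
    tsc_counter end;
    RDTSC(end);
    CPUID();
    return COUNTER_VAL(end) - start;
}
```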
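Following Ted Lyngmo's suggestion in the comments, here is a sketch of the same two functions without type punning: the EDX:EAX halves are read into two `std::uint32_t` and combined with a shift and `OR`. It assumes GCC/Clang extended asm on x86-64; `combine_tsc`, `serialize_cpuid` and `read_tsc` are hypothetical helper names. It also declares EAX as an in/out operand (`"+a"`), because `cpuid` overwrites EAX and GCC does not allow an asm statement to modify an input-only operand such as the `"a"(0)` in the `CPUID()` macro.

```cpp
#include <cstdint>

// Combine the EDX:EAX halves written by rdtsc into one 64-bit count,
// instead of type-punning through a union.
inline std::uint64_t combine_tsc(std::uint32_t hi, std::uint32_t lo)
{
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// cpuid (leaf 0) used only as a serializing instruction. It overwrites
// EAX, EBX, ECX and EDX, so EAX is in/out ("+a") and the rest are clobbers.
inline void serialize_cpuid()
{
    std::uint32_t leaf = 0;
    asm volatile("cpuid" : "+a"(leaf) : : "rbx", "rcx", "rdx", "memory");
}

// Read the time-stamp counter into two 32-bit halves.
inline std::uint64_t read_tsc()
{
    std::uint32_t lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return combine_tsc(hi, lo);
}

// Same order as the question: cpuid, then rdtsc.
std::uint64_t start_tsc()
{
    serialize_cpuid();
    return read_tsc();
}

// Same order as the question: rdtsc, then cpuid.
std::uint64_t stop_tsc(std::uint64_t start)
{
    std::uint64_t end = read_tsc();
    serialize_cpuid();
    return end - start;
}
```

Keeping the question's instruction order means this version should behave like the original apart from the operand fix; on its own it does not make the `sqrtsd` loop show up in the count.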

What is wrong with this code? I expect `end` to be proportional to `num_runs`, but it isn't. Any ideas?

truvaking
  • I'm not an assembly guy but type punning (which is how you use `tsc_counter`) isn't legal in C++ so the compiler might do something you wouldn't expect. Can't you supply two `std::uint32_t`'s to `rdtsc` instead and combine the result (by `OR`ing) when you return? Also, make the functions return `std::uint64_t` instead of `unsigned long long` (even though it's probably the same thing). Is there a need for `myInt64`? You use it in calculations with `unsigned long long` so make that a `std::uint64_t` too. – Ted Lyngmo Mar 10 '20 at 22:47
  • There is no need for `myInt64`; I forgot to change it in the code. I'll make these changes now and see if it's any better. – truvaking Mar 11 '20 at 07:58
  • Does `sqrtsd(x)` have side effects? If not, perhaps the compiler simply optimized it out. Since you aren't doing anything with the return value, assuming `sqrtsd` has no side effects, calling it 1000 times, calling it 1 time, and calling it zero times give the same behavior. Compiler could just set `j = num_runs` and continue. I'd suggest looking at a disassembly of the function, and maybe [this question](https://stackoverflow.com/q/40122141). – Hasturkun Mar 11 '20 at 10:12
  • [This answer](https://stackoverflow.com/a/51907425/7582247) suggests using the built-in `__rdtsc()` instead. I tested it with `g++` and `clang++` and it works. The `include`s in the answer suggests that it works with MSVC too. – Ted Lyngmo Mar 11 '20 at 10:55
  • `sqrtsd` is defined as follows: `static double sqrtsd(double x) { double r; __asm__("sqrtsd %1, %0" : "=x"(r) : "x"(x)); return r; }` – truvaking Mar 11 '20 at 12:05
  • @truvaking Did you try the built in `__rdtsc()`? `gcc` recommends _not_ using ASM for this but to use the built-ins. – Ted Lyngmo Mar 11 '20 at 12:18
  • Since this is for a homework assignment, I believe I'm not supposed to change the definitions of these functions; nonetheless, I'll give it a try. – truvaking Mar 12 '20 at 09:07
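Hasturkun's explanation matches what GCC's documentation says about non-`volatile` asm: the optimizers may discard it when its outputs are not needed, and may move it out of a loop when its inputs never change. Both apply to `sqrtsd(x)` in the question. A minimal sketch that keeps the homework's `sqrtsd` unchanged but feeds each result into the next call, so the results are needed and the calls form a dependency chain (which is also what a latency measurement wants). It uses the `__rdtsc()` built-in Ted Lyngmo mentions and assumes GCC or Clang on x86-64; `sqrt_latency_sketch` and `median` are hypothetical names. It also assumes the compiler keeps the chain between the two reads, which is worth confirming in the disassembly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <x86intrin.h> // __rdtsc (GCC/Clang, x86)

// The homework's helper, unchanged (as quoted in the comments).
static double sqrtsd(double x)
{
    double r;
    __asm__("sqrtsd %1, %0" : "=x"(r) : "x"(x));
    return r;
}

// Median of the samples; reorders the vector. Picks the upper middle for even sizes.
double median(std::vector<double>& samples)
{
    assert(!samples.empty());
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Each sqrtsd takes the previous result as input, so every result is used
// (the asm cannot be discarded), the input changes (it cannot be hoisted),
// and the CPU must finish one sqrtsd before starting the next.
double sqrt_latency_sketch(std::size_t num_runs, std::size_t repeats)
{
    assert(num_runs > 0 && repeats > 0);
    volatile double sink = 0.0; // consumes each chain's final result
    std::vector<double> samples;
    samples.reserve(repeats);
    for (std::size_t i = 0; i < repeats; ++i)
    {
        double x = 500.0;
        std::uint64_t start = __rdtsc();
        for (std::size_t j = 0; j < num_runs; ++j)
            x = sqrtsd(x); // dependency chain through x
        std::uint64_t end = __rdtsc();
        sink = x;
        samples.push_back(static_cast<double>(end - start) / num_runs);
    }
    return median(samples);
}
```

`sqrt_latency_sketch(1000, 1000)` mirrors the question's 1000 repeats. The per-iteration figure still includes the timing overhead divided by `num_runs`, and the TSC ticks at a fixed reference rate, so when the core runs faster or slower than that rate the result is in reference ticks rather than core cycles. `std::nth_element` replaces sorting a `std::list`, since only the middle element is needed.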
