
I've been given a task to measure the branch misprediction penalty (in ticks), so I wrote this code:

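#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

unsigned long long rdtsc(void); /* defined elsewhere: returns the current tick count */

int main (int argc, char ** argv) {
    unsigned long long start, end;
    unsigned long long min;
    long long int k = 0;
    if (argc < 2) return 1;                  /* the upper bound n is required */
    int n = atoi(argv[1]);
    FILE *f = fopen("output", "w");
    if (f == NULL) return 1;
    for (int i = 1; i <= n + 40; i++) {      /* i is the period of the branch */
        min = ULLONG_MAX;
        for (int r = 0; r < 1000; r++) {     /* repeat and keep the fastest run */
            start = rdtsc();
            for (long long int j = 0; j < 100000; j++) {
                if (j % i == 0) {
                    k++;
                }
            }
            end = rdtsc();
            if (min > end - start) min = end - start;
        }
        fprintf(f, "%d %llu\n", i, min);     /* min is unsigned, so %llu */
    }
    fclose(f);
    return 0;
}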

(rdtsc is a function that measures time in ticks)

The idea is that the code takes the branch (if (j % i == 0)) periodically, with a period equal to i, so at some value of i the predictor starts to mispredict. The rest of the code just repeats the measurement many times so that I get more precise results.

Tests show that branch mispredictions start to happen around i = 47, but I do not know how to count the exact number of mispredictions so that I can compute the exact number of ticks per misprediction. Can anyone explain how to do this without using external tools like VTune?

Gregpack
  • don't forget to up vote answers which are helpful and accept the one which solved your question! – Jay Dec 12 '18 at 03:42

1 Answer


It depends on the processor you're using. In general, cpuid can be used to obtain a lot of information about the processor, and what cpuid does not provide is typically accessible via SMBIOS or other regions of memory.
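As a minimal sketch of reading processor information with cpuid, the following assumes GCC or Clang on x86 and uses the `__get_cpuid` helper from `<cpuid.h>`; the function name `cpu_vendor` is made up for illustration. Note that cpuid identifies the processor but does not report misprediction counts, so this only tells you which manual to consult.

```c
#include <cpuid.h>   /* __get_cpuid(); assumes GCC or Clang on x86 */
#include <string.h>

/* Hypothetical helper: copy the 12-character vendor string from cpuid
 * leaf 0 into out. Returns 1 on success, 0 if cpuid is not supported. */
static int cpu_vendor(char out[13])
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    /* Leaf 0 returns the vendor string in EBX, EDX, ECX (in that order). */
    memcpy(out, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return 1;
}
```

Calling `cpu_vendor(buf)` with a 13-byte buffer fills it with the vendor identifier, which can then be used to pick the right vendor documentation.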

Doing this in code at a general level, without the processor's support functions and manual, will not tell you as much as you want with any great certainty, but it may be useful as an estimate, depending on what you're looking for and how your code is compiled (e.g. the flags you use during compilation).
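One way to get such an estimate purely in code, sketched here under stated assumptions, is to time the same loop over a predictable branch pattern and over a shuffled copy of it, then divide the difference by the expected number of mispredictions. This assumes GCC or Clang on x86 (for `__rdtsc()` from `<x86intrin.h>`), that the alternating pattern is predicted almost perfectly, and that the shuffled pattern is mispredicted about half the time. All function names below are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC or Clang on x86 */

/* Predictable pattern 0,1,0,1,...: exactly n/2 taken branches for even n. */
static void fill_alternating(unsigned char *p, long n)
{
    for (long j = 0; j < n; j++) {
        p[j] = (unsigned char)(j & 1);
    }
}

/* Fisher-Yates shuffle: keeps the number of taken branches, but makes
 * their order unpredictable. Modulo bias of rand() is ignored here. */
static void shuffle(unsigned char *p, long n)
{
    for (long j = n - 1; j > 0; j--) {
        long other = rand() % (j + 1);
        unsigned char tmp = p[j];
        p[j] = p[other];
        p[other] = tmp;
    }
}

/* Time n data-dependent branches. The volatile sink keeps the compiler
 * from deleting the loop or turning the branch into unconditional code. */
static uint64_t time_branches(const unsigned char *p, long n,
                              volatile long *sink)
{
    uint64_t start = __rdtsc();
    for (long j = 0; j < n; j++) {
        if (p[j]) {
            (*sink)++;
        }
    }
    return __rdtsc() - start;
}

/* Repeat the measurement and keep the fastest run, as in the question. */
static uint64_t min_ticks(const unsigned char *p, long n, int repeats)
{
    volatile long sink = 0;
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < repeats; r++) {
        uint64_t t = time_branches(p, n, &sink);
        if (t < best) {
            best = t;
        }
    }
    return best;
}

/* Estimated ticks per misprediction, ASSUMING about n/2 mispredictions
 * on the shuffled pattern and about none on the predictable one. */
static double estimate_penalty(uint64_t t_random, uint64_t t_predictable,
                               long n)
{
    return ((double)t_random - (double)t_predictable) / (n / 2.0);
}
```

To use it, fill two arrays of n bytes with `fill_alternating`, call `shuffle` on one of them, measure each with `min_ticks`, and pass the two results to `estimate_penalty`. Both patterns contain the same number of taken branches, so the difference in ticks should come mostly from mispredictions rather than from extra work. The result is still only an estimate, since the real misprediction rate on the shuffled data is assumed, not counted.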

In general, this is referred to as speculative execution. Programs typically do not observe it directly: instructions that enter the pipeline on a path that turns out not to be taken are simply discarded.

Depending on how you use specific instructions in your program, you may be able to make use of the stale cache state that discarded speculative work leaves behind, for better or worse, but the logic involved would vary greatly depending on the CPU in use.

See also Spectre for an interesting example of exploiting such techniques to read privileged data. (RowHammer is often mentioned alongside it, but it exploits DRAM disturbance rather than speculation.)

See the comments below for links to code that uses cpuid as well as rdrand, rdseed, rdtsc and a few others.

It's perhaps not completely clear what you're looking for, but this should get you started and provide some useful examples.

See also Branch mispredictions

Jay
  • The problem is that I need to get this data by running tests, not by using software – Gregpack Dec 08 '18 at 16:10
  • You will need to use software to execute the instructions required both to obtain the information you need and to interpret the values; this changes a bit depending on the model of the CPU... See also: https://github.com/juliusfriedman/net7mma/blob/38015e9796475f2c48f4afe36a9d46ed2b1ceb9b/Concepts/Classes/CentralProcessingUnit.cs Once you have a way to get the information you want, you would typically insert a function with the measurement logic before / after the call you're profiling to obtain the results... there are a lot of factors, including cache size etc. – Jay Dec 08 '18 at 19:37
  • See also: https://github.com/juliusfriedman/net7mma/blob/38015e9796475f2c48f4afe36a9d46ed2b1ceb9b/Concepts/Classes/Clock.cs – Jay Dec 08 '18 at 19:38