
I must be doing something stupid or using `perf` incorrectly.

#include <iostream>

int main()
{
        return 0;
}

Compile command (using g++ 9.2.1):

g++ -std=c++17 -Wall -Wextra -pedantic -O3 Source.cpp -o prog

Following the tutorial

stat Run a command and gather performance counter statistics

I attempted

perf stat ./prog

And in the output

       560,957      branches                  #  303.607 M/sec
        16,181      branch-misses             #    2.88% of all branches

The question is: why? Should I "clean" the registers before running this command? Is this normal?

Tony Tannous
  • Why do you think a branch prediction accuracy of more than 97% is "so many branch misprediction"?! – David Schwartz Dec 02 '20 at 07:41
  • @DavidSchwartz why should there be 16K branch mispredictions for a program without a single `switch` or `if`? – Tony Tannous Dec 02 '20 at 07:42
  • @DavidSchwartz, 16,181 misses for `return 0;` seems like a lot to me too. – Aganju Dec 02 '20 at 07:43
  • Then you're asking the wrong question. If your question is why such a simple program has 560,957 branches, then ask that. But if you think 97% branch prediction accuracy is not good branch prediction, then I don't know what to tell you. – David Schwartz Dec 02 '20 at 07:43
  • While the code you've made is simple, there's also a lot of *other* code in your program. Code that is needed to set up the environment before the `main` function is called, and code to clean up once `main` returns. – Some programmer dude Dec 02 '20 at 07:45
  • It probably takes a lot to start up and shut down `std::cin`/`std::cout`/`std::cerr` and other global stuff. – Ken Y-N Dec 02 '20 at 07:46
  • @DavidSchwartz: The OP is looking at the total count for branch misses, not the rate. Yes, the next logical step is to ask why there are 560k total branches, and where they are. But you seem to be accusing Tony of asking something other than the actual question. And BTW, 97% branch prediction accuracy is not that amazing on modern CPUs. – Peter Cordes Dec 02 '20 at 07:46
  • @TonyTannous: It's not a duplicate of [GCC C++ "Hello World" program -> .exe is 500kb big when compiled on Windows. How can I reduce its size?](https://stackoverflow.com/q/1042773), but [Number of executed Instructions different for Hello World program Nasm Assembly and C](https://stackoverflow.com/a/35210404) discusses how much code runs before you get to `main` in a normal dynamically-linked executable, vs. a statically linked hand-written asm program. – Peter Cordes Dec 02 '20 at 07:58
  • @PeterCordes I am aware it's not. I am looking more at https://stackoverflow.com/questions/54355631/how-do-i-determine-the-number-of-x86-machine-instructions-executed-in-a-c-progra and will also look at the 2nd link in your comment. Thanks a lot! – Tony Tannous Dec 02 '20 at 07:59
  • I was just confirming that your (now deleted) comment about that first dup suggestion was correct; as someone who *does* already know the answer to your question, it didn't look like a good choice of dup to me either. It's not an exact duplicate of the questions I chose either, but the answers there explain what you really need to know to understand this (for total instructions, leaving branch-misses out of it.) – Peter Cordes Dec 02 '20 at 08:03

1 Answer


About 80% of the branching comes from dynamic linking. Files need to be opened and then the dynamic libraries need to be parsed. This requires a lot of decision making as the contents of the file have to be tested to see what their format is, what sections they have, and so on.

Most of the remaining 20% is precisely that same kind of logic operating on the executable. It has a complex format and code has to parse that format to figure out what sections it has, find the endings of each section, and decide how to lay them out in memory before the program begins executing.
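
To get a feel for how much the dynamic loader has done before `main` runs, here is a minimal sketch that lists the objects mapped into the current process. It assumes Linux with glibc, where `dl_iterate_phdr` is declared in `<link.h>` (g++ defines `_GNU_SOURCE` by default, which exposes it). The names `LoadedObject`, `record_object`, and `list_loaded_objects` are made up for this illustration.

```cpp
#include <link.h>   // dl_iterate_phdr, struct dl_phdr_info (glibc on Linux)
#include <cstddef>
#include <string>
#include <vector>

// Summary of one object that was mapped into this process.
struct LoadedObject {
    std::string name;   // usually empty for the main executable
    int segment_count;  // number of program headers in the object
};

// Callback invoked by dl_iterate_phdr once per loaded object.
static int record_object(struct dl_phdr_info* info, std::size_t, void* data) {
    auto* out = static_cast<std::vector<LoadedObject>*>(data);
    out->push_back({info->dlpi_name ? info->dlpi_name : "", info->dlpi_phnum});
    return 0;  // returning 0 continues the iteration
}

// Collect every object currently mapped into the process.
std::vector<LoadedObject> list_loaded_objects() {
    std::vector<LoadedObject> objects;
    dl_iterate_phdr(record_object, &objects);
    return objects;
}
```

Calling `list_loaded_objects()` from `main` in a dynamically linked build will typically report the executable itself plus several shared libraries (the C and C++ runtimes and the loader, for example). Each of those had to be located, opened, and parsed before `main` started, which is where much of the branching goes.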

David Schwartz
  • Also keep in mind that `perf stat` will be counting user and kernel code (syscalls and interrupt handlers), if `/proc/sys/kernel/perf_event_paranoid` is set low enough to allow non-root to get event counts for kernel code. `perf stat --all-user` or `perf stat -e cycles:u,instructions:u,branches:u,branch-misses:u ...` will count only user-space events. (But yeah, unless you make a static executable that just makes an exit syscall, you'll have lots of user-space instructions. [How do I determine the number of x86 machine instructions executed in a C program?](https://stackoverflow.com/q/54355631)) – Peter Cordes Dec 02 '20 at 07:51
  • @PeterCordes Looking only at user space for a statically-linked version, fewer than 4% of the branches remain. This seems to be setting up some additional memory mappings, initializing some library objects, and some architecture-specific capability investigating and configuring. – David Schwartz Dec 02 '20 at 07:55
  • @DavidSchwartz and @Tony: yup, glibc init code will allocate space for stdio buffers, pick a charset localization table for stuff like `isalpha`, and so on. The OP didn't include `<iostream>` so its constructors won't run. Not surprised that kernel system calls for dynamic linking were costing a huge amount of branches (and branch misses), especially with Spectre mitigation enabled that may clear branch-prediction history on every entry into the kernel. – Peter Cordes Dec 02 '20 at 08:02
  • @PeterCordes I actually did include iostream. I just dropped it and the number of branches was reduced to 100K (with `perf stat ./prog`). – Tony Tannous Dec 02 '20 at 08:05
  • @TonyTannous: Oops, you're right. Thanks for testing the difference. (Note that total instructions is maybe a more interesting metric for how much startup code runs. Branch-misses don't necessarily cost much time if there are also cache misses in flight at the same time, for example, which can continue while the branch miss is sorted out on [modern CPUs with fast branch recovery](https://stackoverflow.com/questions/50984007/what-exactly-happens-when-a-skylake-cpu-mispredicts-a-branch). They're certainly not great, but only counting branches is weird.) – Peter Cordes Dec 02 '20 at 08:10