I have created two new functions, shown below, without any return statement.

According to "int function not having return statement":

The return value in this case, depending on the exact platform, will likely be whatever random value happened to be left in the return register (e.g. EAX on x86) at the assembly level. Not explicitly returning a value is allowed, but gives an undefined value.

int test()
{
}

int caller()
{
    test();
}

Command: gcc test.c -o test -c -O2 -fno-tree-vectorize && objdump -d test -r -M intel

0000000000000000 <test>:
   0:   f3 c3                   repz ret
   2:   66 66 66 66 66 2e 0f    data32 data32 data32 data32 nop WORD PTR cs:[rax+rax*1+0x0]
   9:   1f 84 00 00 00 00 00

0000000000000010 <caller>:
  10:   f3 c3                   repz ret

Now, why is there a repz ret instruction in the test function? I have already gone through "What does `rep ret` mean?"

But what kind of branch prediction issue is there here that makes repz ret necessary?

Sreeraj Chundayil
  • you're invoking undefined behavior by not returning anything from the function. The compiler is allowed to do anything in that case – phuclv Apr 28 '20 at 14:45
  • @phuclv It's only UB if you actually try to use the return value, so this particular example should be perfectly legal C code. C99 Section 6.9.1: "If the } that terminates a function is reached, **and the value of the function call is used by the caller**, the behavior is undefined." – Felix G Apr 28 '20 at 15:06
  • I don't believe this has anything to do with the absence of a return value. I also don't believe there is any need for the `rep`. But it's probably easier for the compiler to unconditionally prefix `ret` than to try to figure out whether it might matter on the particular target CPU. – rici Apr 28 '20 at 15:18
  • The `repz` is not needed for correct function at all. It's an optimization, because on some x86 microarchitectures, branching to a branch-instruction (like `call`ing a `ret`, or `jcc`ing to another `jcc`) causes very poor performance (probably a pipeline flush or something like that, I forget the exact problem), and branching to an ignored prefix mitigates the problem. You could try explicitly compiling for various x86 microarchitectures to see which your compiler believes to be affected. – EOF Apr 28 '20 at 16:54
  • @FelixG: IIRC, C++ is different and it is straight up UB to fall off the end of a non-`void` function. That's why gcc or clang can compile such code paths to `ud2` (illegal instruction) or literally no asm instructions (so execution just falls through to the next function in the binary) when compiling as C++. – Peter Cordes Apr 28 '20 at 23:13
  • @PeterCordes: The linked question added when closing the current question is the same one I already mentioned in my question. – Sreeraj Chundayil Apr 29 '20 at 01:08
  • The first sentence of my answer there is *AMD K8 and K10 have a problem with branch prediction when `ret` is a branch target, or follows a conditional branch.* I forget the micro-architectural details of why that is, but it's something to do with possibly having multiple branches in 2 bytes and the BPU not being able to store that. – Peter Cordes Apr 29 '20 at 01:21
  • Found another Q&A with more microarchitectural details about why exactly K8 and Barcelona have trouble correctly predicting the `ret` if it's only 1 byte long and was reached by another branch, updated the duplicate list. – Peter Cordes Apr 29 '20 at 01:40
  • @PeterCordes: My processor is an Intel Xeon, so why would AMD-related workarounds show up in gcc's output here? – Sreeraj Chundayil Apr 29 '20 at 02:13
  • Because the default is `-mtune=generic`. Compile with `-march=native` if you want to compile for your own CPU instead of a binary that will run decently anywhere (e.g. so you could distribute it as a binary package.) – Peter Cordes Apr 29 '20 at 02:14
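
For comparison with the undefined-value discussion above, here is a minimal sketch of the same two functions with explicit return values, so that using the result is well defined in both C and C++. The names `test_fixed`, `caller_fixed` and `self_check` are hypothetical and not part of the question; whether the compiler emits `repz ret` or a plain `ret` for them is unrelated to the return value and depends on the compiler version and the `-mtune` setting.

```c
#include <assert.h>   /* only needed for the self-check at the bottom */

/* Hypothetical variant of test(): returns an explicit value. */
int test_fixed(void)
{
    return 0;
}

/* Hypothetical variant of caller(): it is safe to use the result here,
   because test_fixed() always reaches a return statement. */
int caller_fixed(void)
{
    return test_fixed();
}

/* Small self-check; call it from main() if you link this into a program. */
void self_check(void)
{
    assert(test_fixed() == 0);
    assert(caller_fixed() == 0);
}
```

This can be compiled with the same command as in the question to compare the disassembly; adding `-march=native`, as suggested in the last comment, tunes the output for the local CPU instead of the generic default.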
