
updated

Changed the 2nd line of assembly to the mnemonic actually being used (mflr) and added more info at the bottom.


I ran across some code (using gcc) resembling the following (paraphrased):

#define SOME_MACRO( someVar ) \
do {                          \
  __asm__ (                   \
    "    b 0f\n"              \
    "0:  mflr %0\n"           \
    : "=r"( someVar )         \
  );                          \
} while(0)

... where the b instruction (PowerPC) is a short jump and mflr reads the contents of the link register, which is similar to the program counter in some respects. I've seen this sort of thing in Intel code as well (cf. the accepted answer in this question).

The branch acts as a no-op, so my question is: what purpose does it serve?

I'm guessing it has something to do with branch prediction stuff, but so far I've only found people's code using this idiom while searching.


It looks like I was wrong on the branch prediction guess. mflr grabs the contents of the link register.

So, my question boils down to: why is the branch necessary?

Brian Vandenberg
  • Could it be flushing the instruction prefetch queue? – Michael Apr 04 '13 at 05:02
  • Some of the purpose is in the `somethingelse`. Such code can be used for _instrumentation_ (debugging / runtime tracing). In x86, for example, `asm("call 0f\n0: pop %0\n" : "=r"(pc))` is a way to retrieve the program counter, `IP` (beware this isn't safe in 64bit mode). On ARM (and 64bit x86), the method is also often used to _embed constants_ within the code, for use with PC-relative loads. Whether there's an impact on branch prediction / pipelining depends on the CPU, so one cannot generally state "that's what it's for". – FrankH. Apr 04 '13 at 08:33
  • @FrankH. I spent some more time looking at the code; this is a great example of me not seeing the forest for the trees. You're spot on with your analysis; `somethingelse` is getting the program counter and storing it in %0. – Brian Vandenberg Apr 04 '13 at 16:03
  • If you wouldn't mind posting your response as an answer, I'll accept it. – Brian Vandenberg Apr 04 '13 at 16:33

1 Answer


The interesting bits of code like this tend to happen in somethingelse. Some known purposes of such code are:

  • Runtime state retrieval. On x86, for example,
    __asm__("call 0f\n0: pop %0\n" : "=r"(pc))
    is a way to retrieve the program counter. The IP register is hidden and not directly accessible, so the code relies on call pushing the return address onto the stack and pops it straight back off.
    Beware that this isn't safe to use in leaf functions in 64-bit mode because of the red zone; see "Inline assembly that clobbers the red zone". The correct way to do it on x86_64 is
    asm("lea 0f(%%rip), %0\n0:\n" : "=r"(pc))
    which exploits the fact that PC-relative addressing is possible in 64-bit mode.
  • instrumentation (debugging / runtime tracing), e.g. by putting tracing code / NOP slots in there that tracing utilities at runtime can modify to dynamically hook into the code. Solaris DTrace uses such techniques.
  • On ARM (and 64bit x86), the method is also used to embed constants within the code, for use with PC-relative loads.

Whether unconditional branches like this cause branch misprediction penalties or other kinds of stalls is very CPU-dependent.
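To make the program-counter retrieval above concrete, here is a minimal sketch that wraps the per-architecture variants in one helper. It assumes GCC or Clang; `current_pc` is a hypothetical name, not something from the original code. The PowerPC branch is an assumption about what the macro in the question was probably meant to do: it uses `bl`, which writes the return address into the link register, whereas a plain `b` leaves LR untouched. The final fallback only returns the function's own address, so it is a rough stand-in rather than a true program counter.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): returns an address
 * at or near the point where the inline assembly executes. */
static inline uintptr_t current_pc(void)
{
    uintptr_t pc;
#if defined(__x86_64__)
    /* RIP-relative lea: no push/pop, so the red zone is left alone. */
    __asm__ volatile ("lea 0f(%%rip), %0\n0:\n" : "=r"(pc));
#elif defined(__i386__)
    /* call pushes the address of label 0; pop moves it into the output. */
    __asm__ volatile ("call 0f\n0: pop %0\n" : "=r"(pc));
#elif defined(__powerpc__)
    /* Assumed fix for the idiom in the question: bl (unlike b) sets LR.
     * LR is listed as clobbered so the compiler saves it if needed. */
    __asm__ volatile ("bl 0f\n0: mflr %0\n" : "=r"(pc) : : "lr");
#else
    /* Fallback for other targets: the function's entry address only. */
    pc = (uintptr_t)&current_pc;
#endif
    return pc;
}
```

Calling `current_pc()` from different call sites may or may not give different values, depending on whether the compiler inlines the helper, so the result is best treated as "somewhere in this code" rather than an exact location.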

FrankH.
  • Would you like to speculate on why the branch would be necessary? – Brian Vandenberg Apr 04 '13 at 17:57
  • I dug through the ppc instruction set ... `b` doesn't alter the link register, but `bl` would. So, there's three possible explanations I have right now: 1) there's some non-obvious side effect the branch produces, 2) they want the PC for code executed prior to `b 0f` -- which begs the question, why do `b 0f` at all ... and 3) the writers of this code expected `b 0f` to update LR, so the code isn't doing what they expect. – Brian Vandenberg Apr 04 '13 at 18:09