Xcode assembler: callq %rax vs callq *(%rax)

Question

Long ago, `callq %rax` was used to execute the subroutine at the address in rax. But the current Xcode compiler no longer accepts this instruction: `callq` accepts only the indirect call `callq *(%rax)`, where rax is a pointer to the subroutine.

How do I tell the assembler to recognise the `callq %rax` direct call?

`callq *%rax` is `call rax` in Intel syntax, and `callq *(%rax)` is Intel `call qword ptr [rax]`. Both are indirect calls: the first is register-indirect, the second is memory-indirect. They are just different addressing modes for the same opcode. Some assemblers (like GAS) will accept `call %rax` and warn you about the missing `*`, but clang simply rejects it. — Peter Cordes, Jun 24 '23 at 13:01
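A minimal C sketch (not from the thread) of where the two addressing modes typically come from: calling through a function pointer held in a register versus one stored in memory. The function and type names are hypothetical, and the instructions in the comments are only what x86-64 compilers commonly emit with optimization on; the exact output depends on the compiler and flags, so check it with something like `clang -O2 -S`.

```c
/* Hypothetical function-pointer type and target; only the call sites matter. */
typedef int (*binop_fn)(int, int);

int add(int a, int b) { return a + b; }

/* The pointer arrives in a register, so the call is typically
   register-indirect, something like `callq *%rax` (Intel: `call rax`).
   The `+ 1` keeps the compiler from turning it into a tail-call `jmp`. */
int call_via_register(binop_fn f, int a, int b) {
    return f(a, b) + 1;
}

/* The pointer lives in memory, so the compiler may fold the load into
   the call, something like `callq *(%rax)` (Intel: `call qword ptr [rax]`). */
int call_via_memory(binop_fn *slot, int a, int b) {
    return (*slot)(a, b) + 1;
}
```

Both functions compute the same result; they differ only in where the call target is read from.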
But `callq *(%rax)` needs to load the address from memory first and then go to the subroutine, while `callq %rax` goes directly to the routine. So the first is slower than the second, no? — Karl Bergeron, Jun 24 '23 at 19:28
`callq %rax` isn't an instruction; that's why it doesn't work in Xcode clang. I assume you meant `callq *%rax` in your last comment. Yes, according to https://uops.info/, Sandybridge-family CPUs run `call r64` as 2 fused-domain uops for the front-end and ROB. Same for AMD Zen family. But `call [m64]` is 3 front-end uops on Intel, and 5, 5, 6, or 4 on Zen 1, 2, 3, and 4 respectively. So on AMD, and on Alder Lake E-cores, it could actually be worth doing `mov (%rax), %rax` / `callq *%rax` instead of `callq *(%rax)`. But on SnB-family it's the same total uops. — Peter Cordes, Jun 24 '23 at 19:54
If front-end throughput isn't the bottleneck, what really matters is that it predicts correctly, and that the prediction can be checked soon enough (executing the branch uop in the back-end) to not stall later instructions. And/or to detect mispredicts promptly so it doesn't waste too many cycles on the wrong path. (Both are indirect calls.) So yes, if you can keep a function pointer in a register throughout a loop, it can be worth an extra instruction to load it outside the loop. Or if tuning for AMD or Alder-Lake E-cores, maybe load "manually" at the cost of less code density. — Peter Cordes, Jun 24 '23 at 19:58
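A rough C illustration of keeping a function pointer in a register throughout a loop. The `struct ctx`, `step_fn`, and `twice` names are hypothetical. In `run_reload` the compiler generally cannot prove the callee leaves `c->step` unchanged, so it usually reloads it each iteration (a memory-indirect call); copying it to a local once lets it live in a callee-saved register (a register-indirect call). The assembly in the comments is a typical outcome, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical callback type and context struct, for illustration only. */
typedef long (*step_fn)(long);

struct ctx {
    step_fn step;
};

long twice(long x) { return 2 * x; }

/* c->step is re-read on every iteration: the callee might modify *c,
   so the compiler generally has to reload it, typically producing a
   memory-indirect call such as `callq *(%rbx)`. */
long run_reload(struct ctx *c, long x, size_t n) {
    for (size_t i = 0; i < n; i++)
        x = c->step(x);
    return x;
}

/* The pointer is loaded once, outside the loop, so the compiler can keep
   it in a callee-saved register, typically `callq *%r14` or similar. */
long run_hoisted(const struct ctx *c, long x, size_t n) {
    step_fn step = c->step;   /* single load, hoisted out of the loop */
    for (size_t i = 0; i < n; i++)
        x = step(x);
    return x;
}
```

The two versions only behave the same if the callback never replaces `c->step` mid-loop; hoisting is a valid optimization only when that is guaranteed.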
> I assume you meant `callq *%rax` in your last comment
No. When I compiled it in 2008, `callq %rax` was accepted by Xcode, and the old application still works today, but the recent compiler no longer accepts this instruction. Back in 2008 I never suspected that `callq %rax` was in fact an indirect call. Thank you for your invaluable in-depth information. — Karl Bergeron, Jun 25 '23 at 11:34
Maybe you're unclear on the definition of "indirect" in this context. It means the new RIP value can't be calculated from the machine code alone, it comes from some other architectural state (from a register or memory location). The only direct call is `call some_func`. If you wrote `call %rax`, obviously you wanted to set RIP = RAX, which means you wanted an indirect call. (This is separate from the *addressing mode* used to access the new RIP, like `%rax` register-direct vs. `(%rax)` register-indirect.) — Peter Cordes, Jun 25 '23 at 12:09
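A small C sketch of the direct vs. indirect distinction (the `square`, `negate`, `direct`, and `indirect` names are hypothetical). A call to a named function can be encoded with its target in the instruction itself; a call whose target is picked at run time from a table of pointers cannot, so its new RIP has to come from a register or memory. The instructions mentioned in the comments are typical x86-64 output, and a compiler may inline or otherwise transform either call.

```c
/* Hypothetical functions used only to contrast the two kinds of call. */
int square(int x) { return x * x; }
int negate(int x) { return -x; }

/* Direct call: the target is fixed in the machine code
   (typically `callq square`, a rel32 displacement), unless inlined. */
int direct(int x) {
    return square(x) + 1;
}

/* Indirect call: the target is chosen at run time from a table, so the
   new RIP comes from a register or memory, not from the instruction alone. */
int indirect(int which, int x) {
    static int (*const table[])(int) = { square, negate };
    return table[which & 1](x) + 1;
}
```

Whether the compiler emits `callq *%reg` or `callq *(...)` for `indirect` is just a choice of addressing mode; either way it is an indirect call.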
I really appreciate your clarification. Everything is clear now. Thank you for everything. — Karl Bergeron, Jun 27 '23 at 11:47