Intrigued by this post about UB, I've decided to start reading Jonathan Bartlett's Programming from the Ground Up in order to play around with C++ UB and see what the assembly looks like.

But while trying things out, I've found something strange in a pretty simple case. Consider this code:

int foo(int * p) {
    int y = 7;
    if (p)
        ++y;
    return y;
}

Its assembly is

foo(int*):
        cmpq    $1, %rdi
        movl    $7, %eax
        sbbl    $-1, %eax
        ret

(Compiler Explorer)

Now I understand that movl $7, %eax puts the value 7 into the eax register, the one that's going to be returned to the caller by ret. I also understand that sbbl $-1, %eax is the instruction that subtracts -1 from the content of eax and stores the result back into eax, and that this subtraction takes effect only if p is not null. Which leads me to assume that sbbl makes use of a hidden boolean value computed by earlier lines. The only candidate, even by name, is cmpq $1, %rdi.

But what is that doing? From the aforementioned book I've understood that function arguments are passed from caller to callee via the stack: the caller pushes arguments on the stack, and the callee extracts those values. But there's no such thing here.

So what is %rdi? The register holding the first (and in this case only) argument of the function? Why is that so? Are there other registers for further arguments? How many? And besides, what is a good source of information on this topic?

Peter Cordes
Enlico
  • RDI holds the first integer/pointer arg in the x86-64 System V calling convention. The book you're reading uses 32-bit x86 assembly, where the standard calling convention is much older and less efficient, only using stack args. If you use `gcc -O3 -m32 -mregparm=3`, you'll get 32-bit code using register args. If you use `gcc -O3 -m32`, you'll get more familiar code. – Peter Cordes Nov 21 '22 at 19:38
  • https://en.wikipedia.org/wiki/X86_calling_conventions – bolov Nov 21 '22 at 19:39
  • As for how GCC is arranging to return 7 or 8 according to the `if(p)` condition, yes, `cmp $1, %rdi` sets CF if RDI was zero, otherwise clears it. So the later SBB will add 0 (`EAX -= -1 + CF=1`) or `1` (`EAX -= -1 + CF=0`). – Peter Cordes Nov 21 '22 at 19:41
  • [Explanation for GCC compiler optimisation's adverse performance effect?](https://stackoverflow.com/a/64279197) has an answer that explains the cmp/sbb trick, but it's not primarily about just that. I assume you've read the manual for `sbb`, https://www.felixcloutier.com/x86/sbb – Peter Cordes Nov 21 '22 at 19:51
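
The cmp/sbb trick described in the comments above can be modelled in plain C++. This is only a sketch: `foo_emulated` and its `rdi` parameter are names made up here for illustration, the carry flag is represented by an ordinary `bool`, and it assumes the AT&T operand order, in which `cmpq $1, %rdi` computes `rdi - 1`.

```cpp
#include <cstdint>

// Hypothetical helper (not from the question): mirrors the three instructions
// GCC emitted for foo(), one statement per instruction. `rdi` stands in for
// the bit pattern of the pointer argument p.
constexpr int foo_emulated(std::uint64_t rdi) {
    // cmpq $1, %rdi : computes rdi - 1 and sets CF on unsigned borrow,
    // which can only happen when rdi == 0, i.e. when p is null.
    const bool cf = rdi < 1;

    // movl $7, %eax : mov leaves the flags untouched, so CF survives.
    std::int32_t eax = 7;

    // sbbl $-1, %eax : eax = eax - (-1 + CF)
    //   CF = 1 (p null)     -> eax - 0    = 7
    //   CF = 0 (p non-null) -> eax - (-1) = 8
    eax -= (-1 + static_cast<std::int32_t>(cf));

    return eax;
}

// Compile-time checks of both branches.
static_assert(foo_emulated(0) == 7, "null pointer leaves y at 7");
static_assert(foo_emulated(0x1000) == 8, "non-null pointer increments y");
```

So sbb is executed unconditionally; the condition is carried entirely by CF, which is why no branch appears in the generated code.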

1 Answer

%rdi is a reference to the register rdi; the % prefix is how AT&T syntax marks register names.

In this case, it appears that the compiler is passing the first parameter in a register instead of on the stack.

Parameter passing is basically a convention: as long as the compiler is consistent in how it passes parameters, a compiler can switch from passing parameters one way (e.g., always on the stack) to another (some in registers) almost any time it sees fit (new version of the compiler, or even just passing some switch on the compiler command line).

Depending on when and where you look, it's pretty routine for a single compiler to support multiple calling conventions. For example, for quite a while Microsoft's 32-bit compiler supported four: cdecl, fastcall, stdcall, and thiscall (the last used only for C++ member functions). Of those, cdecl and stdcall were purely stack based, and fastcall and thiscall both used registers for some arguments.

Jerry Coffin
  • Programming From the Ground Up uses i386 assembly with AT&T syntax on Linux; it's a free book written before x86-64 was widely established (but is quite good from what I've skimmed of it). Terms like "cdecl" don't exist in the i386 System V or x86-64 System V ABIs. (The only way to get 32-bit code using anything except stack args is GCC's `__attribute__((regparm(3)))`, with the default being `regparm(0)` unless you use a command line option.) – Peter Cordes Nov 21 '22 at 19:45
  • C compilers can't switch calling conventions on a whim in a new version, that would break binary compatibility with existing libraries. (In theory they could for `static` functions, but this function isn't private in any way.) – Peter Cordes Nov 21 '22 at 19:48
  • @PeterCordes: Yes they can, and yes they have. I suppose "on a whim" is overstating things a bit, but they've certainly changed from one version to the next at times. – Jerry Coffin Nov 22 '22 at 02:45
  • GCC for x86-64 (which the OP is using) has never changed its C ABI in backwards-incompat ways, not that I know of, and is highly unlikely to in the future. Yes, you can change it with command line args, e.g. `-mpreferred-stack-boundary=3` to make it only maintain 2**3 = 8-byte stack alignment. GCC for i386 accidentally(?) started requiring 16-byte stack alignment with SSE code-gen, eventually changing the ABI to require it once the problem was discovered with libraries compiled that way existing in the wild. (And you can change calling convention there with `-mregparm=3`.) – Peter Cordes Nov 22 '22 at 05:00
  • I think some other ISAs that are more often used in embedded systems have changed ABIs, maybe because they're more often used in systems where everything can be rebuilt from source without much trouble. But I don't know about the details of those. Given your discussion of 32-bit Windows calling convention names, are you talking about MSVC changing defaults, or something? – Peter Cordes Nov 22 '22 at 05:02
  • @PeterCordes: MSVC is one example, but hardly the only one. – Jerry Coffin Nov 22 '22 at 08:54
  • `%rdi` is a 64-bit register (it once stood for "destination index" and the `r` means 64-bit register). A function written in C typically assumes the first parameter to be in this register (or a pointer to it, if larger than 64 bits.) In the past, it was often used to hold a pointer to some array you were writing to, or the destination of C's `memcpy()` or `memset()` functions, since the x86 hardware has special commands for repeatedly writing to the memory pointed to by `%rdi`. – puppydrum64 Nov 23 '22 at 15:38
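
As for whether further arguments get registers too: under the x86-64 System V calling convention mentioned in the comments on the question, the first six integer or pointer arguments are passed in rdi, rsi, rdx, rcx, r8 and r9, and any beyond that go on the stack. The sketch below annotates a hypothetical function, `sum7`, invented here purely to show the mapping. It assumes a System V target such as Linux; Windows x64 uses a different set of registers.

```cpp
#include <cassert>

// Hypothetical seven-argument function. The comments give where each argument
// lives on entry under the x86-64 System V calling convention.
long sum7(long a,  // rdi
          long b,  // rsi
          long c,  // rdx
          long d,  // rcx
          long e,  // r8
          long f,  // r9
          long g)  // seventh integer arg: on the stack, at 8(%rsp) on entry
{
    // The result goes back to the caller in rax.
    return a + b + c + d + e + f + g;
}

// Small usage check; the caller is the one that loads the registers above.
void check_sum7() {
    assert(sum7(1, 2, 3, 4, 5, 6, 7) == 28);
}
```

Compiling this with something like `g++ -O2 -S` and reading the body of `sum7` should show the six registers being added together along with one load from the stack, though the exact instruction order is up to the compiler.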