
Please consider the below program:

#include <stdio.h>

void my_f(int);

int main()
{
    int i = 15;
    my_f(i);
}

void my_f(int i)
{
    int j[2] = {99, 100};
    printf("%d\n", j[-2]);
}

My understanding is that the activation record (aka stack frame) for my_f() should look like this:

    ------------
    |     i    |    15
    ------------
    | Saved PC |    Address of next instruction in caller function
    ------------
    |   j[0]   |    99
    ------------
    |   j[1]   |    100
    ------------

I expected j[-2] to print 15, but it prints 0. Could someone please explain what I am missing here? I am using GCC 4.0.1 on OS X 10.5.8 (yes, I live under a rock, but that's beside the point here).

Michael Petch
babon
  • If you are on 64-bit, the address is 8 bytes, and if int is 4 bytes then `i` is at `-3`. This is theoretical; in practice it may differ. Better to use gdb and print the stack memory to be more sure. – Rohan May 15 '16 at 06:23
  • Accessing an array out of bounds is [undefined behaviour](http://stackoverflow.com/a/4105123/1505939). You should not go in with any expectations about the output of this program. – M.M May 15 '16 at 06:31
  • @M.M is correct. The C standard does not state anything about the structure of activation records, and there is no way for a program to examine them without undefined behaviour. – James Youngman May 15 '16 at 07:33
  • The order of `j[0]` and `j[1]` on the stack is likely not as you expect. C guarantees that `&j[0] < &j[1]`, and the stack grows downwards. If anything, the out-of-bounds access you "want" is at `j[4]`, but as others have already noted, this is undefined anyway. – EOF May 15 '16 at 08:30
  • @EOF. I've put a copy of the Apple GCC 4.0.1 generated code [here](http://www.capp-sysware.com/misc/stackoverflow/37235115/program-o0-16b.s). This is a default 32-bit build (default -O0 optimization level). If you include the saved register area (EBX in this case) and the padding that was added before the array, the value he wants would be at `j[6]`. As you say, this is undefined behaviour, and relying on the stack layout of any particular compiler being the same is asking for trouble. – Michael Petch May 15 '16 at 13:39
  • Are you attempting to rely on this code for a specific programming purpose, or is this just a case of "I want to learn what a stack frame may look like in 32-bit OSX code"? There is a good description of the IA-32 Function Calling Conventions in the [Apple documentation](https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/130-IA-32_Function_Calling_Conventions/IA32.html). – Michael Petch May 15 '16 at 17:08

2 Answers


If you ever actually want the address of your stack frame in GNU C, use __builtin_frame_address(0) (non-zero args attempt to backtrace up the stack to parent stack frames). This is the address of the first thing pushed by the function, i.e. a saved ebp or rbp if you compiled with -fno-omit-frame-pointer. If you want to modify the return address on the stack, you might be able to do that with an offset from __builtin_frame_address(0), but to just read it reliably, use __builtin_return_address(0).


GCC keeps the stack 16byte-aligned in the usual x86 ABIs. There could easily be a gap between the return address and j[1]. In theory, it could put j[] as far down as it wanted, or optimize it away (or to a read-only static constant, since nothing writes it).

If you compiled with optimization, i probably isn't stored anywhere, and my_f(int i) is inlined into main.

Also, like @EOF said, j[-2] is two spots below the bottom of your diagram. (Low addresses are at the bottom, because the stack grows down). Also note that the diagram on wikipedia (from the link I edited into the question) is drawn with low addresses at the top. The ASCII diagram in my answer has low addresses at the bottom.

If you compiled with -O0, then there's some hope. In 64bit code (the default target for 64bit builds of gcc and clang), the calling convention passes the first 6 args in registers, so the only i in memory will be in main's stack frame.

Also, in AMD64 code, j[3] might be the upper half of the return address (or the saved %rbp), if j[] is placed below one of those with no gap. (pointers are 64bit, int is still 32 bits.) j[2], the first out-of-bounds element, would alias onto the low 32bits (aka low dword in Intel terminology, where a "word" is 16 bits.)


The best hope for this to work is in un-optimized 32bit code, using a calling convention with no register args (e.g. the x86 32bit SysV ABI; see also the tag wiki).

In that case, your stack will look like:

# 32bit stack-args calling convention, unoptimized code

  higher addresses
^^^^^^^^^^^^
| argv     |
------------
| argc     |
-------------------
| main's ret addr |
-------------------
|   ...    |
|  main()'s local variables and stuff, layout decided by the compiler
|   ...    |
------------
|     i    |    # function arg
------------ <--   16B-aligned boundary for the first arg, as required in the ABI
| ret addr |
------------ <--- esp pointer on entry to the function
|saved ebp |  # because gcc -m32 -O0 uses -fno-omit-frame-pointer
------------ <--- ebp after  mov ebp, esp  (part of no-omit-frame-pointer)
  unpredictable amount of padding, up to the compiler.  (likely 0 bytes in this case)
  but actually not: clang 3.5, for example, makes a copy of its arg (`i`) here and puts j[] right below that, so j[2] (the copy) or j[5] (the original arg) would read 15
------------
|  j[1]    |
------------
|  j[0]    |
------------
|          |
vvvvvvvvvvvv   Lower addresses.  (The wikipedia diagram is upside-down, IMO: it has low addresses at the top).

It's somewhat likely that the 8 byte j array will be placed right below the value written by push ebp, with no gap. That would make j[0] 16B-aligned, although there's no requirement or guarantee that local arrays have any particular alignment. (Except that C99 variable-length arrays are 16B-aligned, in the AMD64 SysV ABI. I don't remember there being a guarantee for non-variable length arrays, but I didn't check.)

If the function saved any other call-preserved registers (like ebx) so it could use them, those saved registers would be before or after the saved ebp, above space used for locals.

j[4] might work in 32bit code, like @EOF suggested. I assume he arrived at 4 by the same reasoning I did, but forgot to mention that it only applies to 32bit code.


Looking at the asm:

Of course, looking at what really happens is much better than all this guessing and hand-waving.

I put your function on the Godbolt compiler explorer, with the index changed to j[4], using the oldest gcc version it has (4.4.7) and -xc -O0 -Wall -fverbose-asm -m32. (-xc is to compile as C, not C++.)

my_f:
    push    ebp     #
    mov     ebp, esp  #,
    sub     esp, 40   #,              # no idea why it reserves 40 bytes.  clang 3.5 only reserves 24
    mov     DWORD PTR [ebp-16], 99    # j[0]
    mov     DWORD PTR [ebp-12], 100   # j[1]
    mov     edx, DWORD PTR [ebp+0]    ######   This is the j[4] load
    mov     eax, OFFSET FLAT:.LC0     # put the format string address into eax
    mov     DWORD PTR [esp+4], edx    # store j[4] on the stack, to become an arg for printf
    mov     DWORD PTR [esp], eax      # store the format string
    call    printf  #
    leave
    ret

So gcc puts j at ebp-16, not the ebp-8 that I guessed. j[4] gets the saved ebp. i is at j[6], 8 more bytes up the stack.

Remember, all we've learned here is what gcc 4.4 happens to do at -O0. There's no rule that says j[6] will refer to a location that holds a copy of i on any other setup, or with different surrounding code.

If you want to learn asm from compiler output, look at the asm from -Og or -O1 at least. -O0 stores everything to memory after every statement, so it's very noisy / bloated, which makes it harder to follow. Depending on what you want to learn, -O3 is good. Obviously you have to write functions that do something with input parameters instead of compile-time constants, so they don't optimize away. See How to remove "noise" from GCC/clang assembly output? (especially the link to Matt Godbolt's CppCon2017 talk), and other links in the tag wiki.


clang 3.5.

As noted in the diagram, clang copies i from the arg slot to a local. When it calls printf, though, it copies from the arg slot again, not from the copy inside its own stack frame.

Peter Cordes
  • The Apple IA-32 convention (OP is using 32-bit OSX code) is documented on [Apple's site](https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/LowLevelABI/130-IA-32_Function_Calling_Conventions/IA32.html). If you wish to see the code generated by his compile, I've provided the output for "-O0" for viewing [here](http://www.capp-sysware.com/misc/stackoverflow/37235115/program-o0-16b.s). In this case (as you say) `j[6]` would have the value `15`. – Michael Petch May 15 '16 at 13:09
  • @MichaelPetch: Thanks, added to the x86 tag wiki. – Peter Cordes May 15 '16 at 13:15
  • Nice answer. Minor nitpick: On a little-endian machine like x86 in 64-bit mode, wouldn't `j[3]` be the *lower* (least significant) part of the return address if anything? – EOF May 15 '16 at 15:21
  • @EOF: I forget what stack layout I had in mind. Probably one with no frame pointer and no padding, just `j0|j1|ret-addr`. But at `-O0`, I think even 64bit code uses a frame pointer, so `j[3]` would be the high dword of the caller's `rbp` *if* `j[]` was placed right below it with no padding. I said "might be", so I'm going to let myself off on a technicality :P – Peter Cordes May 15 '16 at 15:30
  • @PeterCordes: But it would still be the *lower* dword of `%rbp` in that case, wouldn't it? The *highest* element of the array is adjacent to the *lowest* byte of the following object, which in little-endian is the least significant part. – EOF May 15 '16 at 15:33
  • @EOF: Right, and `j[2]`, the 3rd "array element", is the low dword of `%rbp`. `j[3]` is the 4th "array element", aliasing the high dword. – Peter Cordes May 15 '16 at 15:36
  • @PeterCordes: Ah, I forgot `j[2]` is the first out-of-bounds element already. Carry on, don't mind me. – EOF May 15 '16 at 15:38

In theory you are right, but in practice it depends on a lot of factors, e.g. the calling convention, the operating system type and version, and the compiler type and version. You can only tell for sure by looking at the final disassembly of your code.

Bernhard
  • In *theory* he is NOT right: in theory his program is undefined, and the "theoretical" C programming language standard says so. In practice, for some version of some compiler, he *might* be right. He'd be better off simply printing the address of the various variables in my_f; that will help him understand how his particular compiler has placed the variables in memory (if indeed it has: the standard doesn't require it to do that for any particular variable). – Ira Baxter May 15 '16 at 08:47
  • In a classroom teaching C and function calls on a whiteboard, you would explain a stack diagram exactly like this; that's what I mean by "in theory". But as I said, you won't observe this in the real world. – Bernhard May 15 '16 at 09:10
  • There's nothing wrong with a theoretical description of activation records. His program accesses j[-2]; that's simply undefined. – Ira Baxter May 15 '16 at 09:38