From what I understand, a function uses the stack to store all the local variables it declares.

I also understand that the bottom of the stack corresponds to the largest address, and the top to the smallest.

So, let's say I have this C program:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]){
    FILE *file1 = fopen("~/file.txt", "rt");
    char buffer[10];
    printf(argv[1]);
    fclose(file1);
    return 0;
}

Where would the pointer named "file1" be on the stack relative to the pointer named "buffer"? Would it be higher up the stack (smaller address) or lower down (larger address)?

Also, I know that printf(), when given format specifiers (like %d or %s), will read from the stack, but in this example, where will it start reading?

JohnnyH
  • It is totally implementation-defined. The variables don't even have to be on a stack at all. The C standard has no notion of a stack. – Eugene Sh. Jan 11 '17 at 17:16
  • @EugeneSh. So where are the variables mentioned by the OP allocated? – nbro Jan 11 '17 at 17:18
  • JohnnyH, by the way, what's your problem with me editing your question? My edits are only meant to improve the question, not to customise it. – nbro Jan 11 '17 at 17:19
  • @nbro we don't know, we don't even know if they are allocated. – user3528438 Jan 11 '17 at 17:19
  • @nbro Anywhere. In a register, in memory, on a sticky note on your fridge. Or even nowhere, as some of the variables do nothing and get optimized away. – Eugene Sh. Jan 11 '17 at 17:19
  • @nbro, I think your edit got accidentally clobbered by a larger edit. – ikegami Jan 11 '17 at 17:20
  • The way to understand what your implementation is doing would be to look at an assembler output. – Weather Vane Jan 11 '17 at 17:21
  • @EugeneSh. So, are you saying there's no difference between allocating memory with malloc and declaring any other variable on the body of a function? – nbro Jan 11 '17 at 17:22
  • @nbro on some PICs, for example, the hardware stack is a call stack only. Other processors have two stacks: a call stack and a user stack. And so on. As to your recent question, local variables have the lifetime of the function, but memory from `malloc` persists. – Weather Vane Jan 11 '17 at 17:24
  • @nbro, no, he's saying that *the standard doesn't specify* and therefore *implementations may and do differ*. – John Bollinger Jan 11 '17 at 17:25
  • Essentially a duplicate of this one http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap?rq=1 – Malcolm McLean Jan 11 '17 at 17:26
  • @nbro Am I? No. But yes, it is possible for a conforming compiler to allocate local variables the same way `malloc` does. Moreover, I've encountered a compiler that allocates VLAs with `malloc` (it gave me some headaches, as the platform did not have a proper `malloc` implementation)... – Eugene Sh. Jan 11 '17 at 17:26
  • @MalcolmMcLean Why should this question be a duplicate of the one you're linking to? – nbro Jan 11 '17 at 17:27
  • You just confuse newbies by saying an implementation doesn't have to have a stack. The stack is a small region of memory on which local variables and (usually) function return addresses are pushed and popped with every call. – Malcolm McLean Jan 11 '17 at 17:28
  • @MalcolmMcLean What? What's your point in explaining what a stack is? The point of this question is the memory model of the C programming language, not what a stack or a heap is, but the OP doesn't know that. – nbro Jan 11 '17 at 17:32
  • @Malcolm OK, can you answer this question, such that it will be always true? I don't think so, as any correct answer will contain "it depends on implementation". – Eugene Sh. Jan 11 '17 at 17:33
  • I guess this question can receive a meaningful answer if it's not tagged as `C` language but a (and only one) popular implementation. Like "How does the stack work in GCC's (or clang's or MSVC's) (current) C implementation (on x86-64 platforms)?". – user3528438 Jan 11 '17 at 17:33
  • @EugeneSh. My point was that in C there's a difference between allocating memory which will be freed once the stack frame of a function is destroyed (because the function returns) and allocating memory with `malloc`. Now, if the memory allocated with `malloc` is on the heap or stack of the current thread, I don't know. – nbro Jan 11 '17 at 17:37
  • [This](https://gcc.godbolt.org/) may help – Pavel Jan 11 '17 at 17:37
  • @nbro the lifetime and scope of a local variable have nothing to do with any stack, but with C itself. A newbie is better served by knowing that a local variable has the scope and lifetime only of the function, regardless of the implementation's storage method, since getting that wrong is one of the most common faults in newbie code. – Weather Vane Jan 11 '17 at 17:41
  • So what do you guys have to say regarding these slides: http://www.cs.cornell.edu/courses/cs2022/2011sp/lectures/lect06.pdf? – nbro Jan 11 '17 at 17:45
  • @WeatherVane By the way, having worked with PintOS, I was accustomed to think that local variables of a function are really allocated on a stack, or at least in a LIFO data structure. Now, if that isn't always the case, could you please point me to the official documentation where it's said that C has no notion of stack and heap? – nbro Jan 11 '17 at 17:48
  • @nbro no, the standard does not say "C does/not use a heap/stack". In fact the word "stack" does not appear. No mention of greenfly or toads either. It only dictates *what* will happen, and not *how*. – Weather Vane Jan 11 '17 at 17:54
  • @nbro http://port70.net/~nsz/c/c11/n1570.html you can do a ctrl+f and find nothing about stack or heap – user3528438 Jan 11 '17 at 17:54
  • Could someone write a post covering all the considerations you've discussed, or maybe give an answer for a typical case, like running this program on Ubuntu 14? – JohnnyH Jan 11 '17 at 17:59
  • @JohnnyH that's not my system, but comments say it is a function of the compiler you are using, on that system. – Weather Vane Jan 11 '17 at 18:00
  • @nbro, with respect to what your slide deck has to say about stack and heap: it describes characteristics of *one possible way* to implement C. To the extent that it implies that all C implementations must use a stack as it describes, it is both wrong in theory and inconsistent with some real, conforming implementations. Many implementations are structured similarly to the slides' description, but it is common for the details to vary. – John Bollinger Jan 11 '17 at 18:33
  • @JohnBollinger It's not my slide deck, I found it by googling. Could you please point me to an implementation of C which doesn't use stacks? I'm curious to know which other concepts and tools can be used to implement something that really is asking for a stack. – nbro Jan 11 '17 at 18:51
  • By the way guys, [this](http://stackoverflow.com/a/79936/3924118) answer (with more than 1000 upvotes) to the post someone in this comment section has mentioned says: "In C, variables on the _heap_ must be destroyed manually and never fall out of scope. The data is freed with delete, delete[], or free". It seems to me that a lot of people don't know what they are doing. – nbro Jan 11 '17 at 19:27
  • @nbro As I said previously, I am working with `armcc` compiler, and at some point I wanted to use VLA, assuming it will use the stack to allocate. And I was really surprised it was attempting to use `malloc` internally instead. As a consequence the same implementation would perform `free` at the function exit.. – Eugene Sh. Jan 11 '17 at 19:36
  • So, in today's world, there is ~1% of compilers and architectures which won't store local variables on stack and registers. But it's implementation dependent, so there is no point in explaining to the OP how the remaining 99% compilers operate. – Lou Jan 11 '17 at 19:39
  • @Lousy Even if 100% are operating with stack, the specific stack usage is varying much more than the above distribution. – Eugene Sh. Jan 11 '17 at 19:41
  • @EugeneSh.: I am only saying that, IMHO, understanding how stack works and looking at exact emitted assembly of different compilers has only helped me understand each architecture *better*. And there is a practical difference in malloc'ing a huge array on the heap, or simply creating a huge local temporary array and risk running out of stack. – Lou Jan 11 '17 at 19:49
  • @JohnBollinger I've used tiny C compilers for small embedded targets which don't use stacks. Local variables are simply mapped to a global space, and recursion is prohibited. But most hosted C implementations have a stack. – Malcolm McLean Jan 11 '17 at 21:20

3 Answers

Wiki article:

http://en.wikipedia.org/wiki/Stack_(abstract_data_type)

The wiki article makes an analogy to a stack of objects, where the top of the stack is the only object you can see (peek) or remove (pop), and where you would add (push) another object onto.

For a typical implementation of a stack, the stack starts at some address and the address decreases as elements are pushed onto the stack. A push typically decrements the stack pointer before storing an element onto the stack, and a pop typically loads an element from the stack and increments the stack pointer after.

However, a stack could also grow upwards, where a push stores an element then increments the stack pointer after, and a pop would decrement the stack pointer before, then load an element from the stack. This is a common way to implement a software stack using an array, where the stack pointer could be a pointer or an index.

Back to the original question: there's no rule on the ordering of local variables on a stack. Typically, the total size of all local variables is subtracted from the stack pointer, and the local variables are accessed as offsets from the stack pointer (or from a register copy of the stack pointer, such as bp, ebp, or rbp on an x86 processor).

rcgldr

The C language definition does not specify how objects are to be laid out in memory, nor does it specify how arguments are to be passed to functions (the words "stack" and "heap" don't appear anywhere in the language definition itself). That is entirely a function of the compiler and the underlying platform. The answer for x86 may be different from the answer for M68K which may be different from the answer for MIPS which may be different from the answer for SPARC which may be different from the answer for an embedded controller, etc.

All the language definition specifies is lifetime of objects (when storage for an object is allocated and how long it lasts) and the linkage and visibility of identifiers (linkage controls whether multiple instances of the same identifier refer to the same object, visibility controls whether that identifier is usable at a given point).

Having said all that, almost any desktop or server system you're likely to use will have a runtime stack. Also, C was initially developed on a system with a runtime stack, and much of its behavior certainly implies a stack model. A C compiler would be a bugger to implement on a system that didn't use a runtime stack.

I also understand that the bottom of the stack corresponds to the largest address, and the top to the smallest.

That doesn't have to be true at all. The top of the stack is simply the place something was most recently pushed. Stack elements don't even have to be consecutive in memory (such as when using a linked-list implementation of a stack). On x86, the runtime stack grows "downwards" (towards decreasing addresses), but don't assume that's universal.

Where would the pointer named "file1" be on the stack relative to the pointer named "buffer"? Would it be higher up the stack (smaller address) or lower down (larger address)?

First, the compiler is not required to lay out distinct objects in memory in the same order that they were declared; it may re-order those objects to minimize padding and alignment issues (struct members must be laid out in the order declared, but there may be unused "padding" bytes between members).

Secondly, only file1 is a pointer. buffer is an array, so space will only be allocated for the array elements themselves - no space is set aside for any pointer.

Also, I know that printf(), when given format specifiers (like %d or %s), will read from the stack, but in this example, where will it start reading?

It may not read arguments from the stack at all. For example, Linux on x86-64 uses the System V AMD64 ABI calling convention, which passes the first six integer or pointer arguments via registers.

If you're really curious how things look on a particular platform, you need to a) read up on that platform's calling conventions, and b) look at the generated machine code. Most compilers have an option to output an assembly listing. For example, we can take your program and compile it as

gcc -S file.c

which creates a file named file.s containing the following (lightly edited) output:

        .file   "file.c"
        .section        .rodata
.LC0:
        .string "rt"
.LC1:
        .string "~/file.txt"
        .text
.globl main
        .type   main, @function
main:
.LFB2:
        pushq   %rbp                 ;; save the current base (frame) pointer
.LCFI0:
        movq    %rsp, %rbp           ;; make the stack pointer the new base pointer
.LCFI1:
        subq    $48, %rsp            ;; allocate an additional 48 bytes on the stack
.LCFI2:
        movl    %edi, -36(%rbp)      ;; since we use the contents of the %rdi(%edi) and %rsi(esi) registers
        movq    %rsi, -48(%rbp)      ;; below, we need to preserve their contents on the stack frame before overwriting them
        movl    $.LC0, %esi          ;; Write the *second* argument of fopen to esi
        movl    $.LC1, %edi          ;; Write the *first* argument of fopen to edi
        call    fopen                ;; arguments to fopen are passed via register, not the stack
        movq    %rax, -8(%rbp)       ;; save the result of fopen to file1
        movq    $0, -32(%rbp)        ;; zero out the elements of buffer (I added
        movw    $0, -24(%rbp)        ;; an explicit initializer to your code)
        movq    -48(%rbp), %rax      ;; copy the pointer value stored in argv to rax
        addq    $8, %rax             ;; offset 8 bytes (giving us the address of argv[1])
        movq    (%rax), %rdi         ;; copy the value rax points to to rdi
        movl    $0, %eax             ;; printf is variadic: %al holds the number of vector registers used for arguments (none here)
        call    printf               ;; like with fopen, arguments to printf are passed via register, not the stack
        movq    -8(%rbp), %rdi       ;; copy file1 to rdi
        call    fclose               ;; again, arguments are passed via register
        movl    $0, %eax
        leave
        ret

Now, this is for my specific platform, which is Linux (SLES-10) on x86-64. It will not necessarily apply to other hardware/OS combinations.

EDIT

Just realized that I left out some important stuff.

The notation N(reg) means offset N bytes from the address stored in register reg (basically, reg acts as a pointer). %rbp is the base (frame) pointer - it basically acts as the "handle" for the current stack frame. Local variables and function arguments (assuming they are present on the stack) are accessed by offsetting from the address stored in %rbp. On x86, local variables typically have a negative offset from %rbp, while function arguments have a positive offset.

The memory for file1 starts at -8(%rbp) (pointers on x86-64 are 64 bits wide, so we need 8 bytes to store it). That's fairly easy to determine based on the lines

    call    fopen                
    movq    %rax, -8(%rbp)       

On x86, integer and pointer return values are written to %rax or %eax (%eax is the lower 32 bits of %rax). So the result of fopen is written to %rax, and we copy the contents of %rax to -8(%rbp).

The location for buffer is a little trickier to determine, since you don't do anything with it. I added an explicit initializer (char buffer[10] = {0};) just to generate some instructions that access it, and those are

        movq    $0, -32(%rbp)       
        movw    $0, -24(%rbp)       

From this, we can determine that buffer starts at -32(%rbp). Its 10 bytes end at -22(%rbp), so there are 14 bytes of unused "padding" space between the end of buffer and the beginning of file1 at -8(%rbp).

Again, this is how things play out on my specific system; you may see something different.

John Bode

The layout is very implementation-dependent, but the two variables will still be close to each other. In fact, this is crucial to setting up buffer-overflow-based attacks.

vpathak
  • Why is this answer here, given we have two comprehensive answers and a long list of comments providing much more information each? – Eugene Sh. Jan 11 '17 at 21:16
  • Both of the answers above are pedantic in nature, talking in terms of data structures and standards: information you can get for yourself from an undergraduate textbook. However, they ignore the fact that this organization of the stack is the main means of setting up a very important class of hacking attacks on modern systems. But if this aspect must be hidden, I will wait for a few more downvotes and remove it. Thanks. – vpathak Jan 12 '17 at 10:51