The C language definition does not specify how objects are to be laid out in memory, nor does it specify how arguments are to be passed to functions (the words "stack" and "heap" don't appear anywhere in the language definition itself). That is entirely a function of the compiler and the underlying platform. The answer for x86 may be different from the answer for M68K which may be different from the answer for MIPS which may be different from the answer for SPARC which may be different from the answer for an embedded controller, etc.
All the language definition specifies is the lifetime of objects (when storage for an object is allocated and how long it lasts) and the linkage and visibility of identifiers (linkage controls whether multiple declarations of the same identifier refer to the same object; visibility, which the standard calls scope, controls whether that identifier is usable at a given point).
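To make the lifetime/linkage distinction a bit more concrete, here is a minimal sketch. The function and variable names are made up for illustration; none of them come from your program.

```c
#include <assert.h>
#include <stdio.h>

/* File-scope object with internal linkage: usable only within this
   translation unit, and it exists for the whole run of the program. */
static int total_calls = 0;

/* The local 'id' has static storage duration, so it keeps its value
   between calls even though the name is only visible inside the function. */
int next_static_id(void)
{
    static int id = 0;
    total_calls++;
    return ++id;
}

/* The local 'id' has automatic storage duration: a fresh object is
   created on every call, so this always returns 1. */
int next_automatic_id(void)
{
    int id = 0;
    total_calls++;
    return ++id;
}

/* Hypothetical driver that exercises both functions. */
void demo_lifetimes(void)
{
    int first = next_static_id();
    int second = next_static_id();
    int fresh = next_automatic_id();

    assert(second == first + 1);
    assert(fresh == 1);
    printf("static: %d then %d; automatic: %d; calls so far: %d\n",
           first, second, fresh, total_calls);
}
```

Note that nothing here says where any of these objects live; the standard only promises how long they last and where their names can be used.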
Having said all that, almost any desktop or server system you're likely to use will have a runtime stack. Also, C was initially developed on a system with a runtime stack, and much of its behavior certainly implies a stack model. A C compiler would be a bugger to implement on a system that didn't use a runtime stack.
I also understood that the bottom of the stack correspond to the largest address, and the top to the smallest ones.
That doesn't have to be true at all. The top of the stack is simply the place something was most recently pushed. Stack elements don't even have to be consecutive in memory (such as when using a linked-list implementation of a stack). On x86, the runtime stack grows "downwards" (towards decreasing addresses), but don't assume that's universal.
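If you want to see which way the stack appears to grow on your own machine, something like the following rough sketch can give a hint. The function names are hypothetical, and the result is only suggestive: the compiler is free to inline the call or arrange locals however it likes, so treat the output as an observation about one build, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Compares the address of one of our own locals with the address of a
   local in the caller. Returns -1 if ours is lower, +1 if it is higher. */
static int compare_with_caller(uintptr_t caller_addr)
{
    volatile char callee_local = 0;
    uintptr_t callee_addr = (uintptr_t)&callee_local;
    return callee_addr < caller_addr ? -1 : 1;
}

/* -1 suggests the stack grows towards lower addresses, +1 towards
   higher addresses (assuming the call was not inlined). */
int observed_stack_direction(void)
{
    volatile char caller_local = 0;
    return compare_with_caller((uintptr_t)&caller_local);
}

/* Hypothetical helper that prints what was observed. */
void report_stack_direction(void)
{
    int d = observed_stack_direction();
    assert(d == -1 || d == 1);
    printf("callee locals sit at %s addresses than caller locals\n",
           d < 0 ? "lower" : "higher");
}
```

The addresses are converted to uintptr_t before comparing because relational comparison of pointers to unrelated objects is not defined by the language.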
Where would be pointer named "file1" in the stack compared to pointer named "buffer" ? would it be with upper in the stack (smaller address), or down (larger address) ?
First, the compiler is not required to lay out distinct objects in memory in the same order that they were declared; it may reorder those objects to minimize padding and alignment issues (struct members must be laid out in the order declared, but there may be unused "padding" bytes between members).
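The struct guarantee can be checked with offsetof. This is a minimal sketch; struct sample and the helper names are invented for the example, and the actual offsets and amount of padding depend on your platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct, chosen only to provoke padding on typical ABIs. */
struct sample {
    char tag;    /* 1 byte */
    long count;  /* usually needs 4- or 8-byte alignment */
    char flag;   /* 1 byte */
};

/* Number of bytes in struct sample that do not belong to any member. */
size_t sample_padding(void)
{
    return sizeof(struct sample)
         - (sizeof(char) + sizeof(long) + sizeof(char));
}

/* Prints the layout; the numbers are implementation-specific. */
void print_sample_layout(void)
{
    /* Declaration order is guaranteed, so these hold everywhere. */
    assert(offsetof(struct sample, tag) == 0);
    assert(offsetof(struct sample, tag) < offsetof(struct sample, count));
    assert(offsetof(struct sample, count) < offsetof(struct sample, flag));

    printf("tag at %zu, count at %zu, flag at %zu, size %zu, padding %zu\n",
           offsetof(struct sample, tag),
           offsetof(struct sample, count),
           offsetof(struct sample, flag),
           sizeof(struct sample),
           sample_padding());
}
```

No such guarantee exists for separate local variables, which is why you can't predict the relative positions of file1 and buffer from the source alone.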
Secondly, only file1 is a pointer. buffer is an array, so space will only be allocated for the array elements themselves - no space is set aside for any pointer.
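A quick way to convince yourself of the difference is to look at sizeof. This is a small sketch using the same variable names as your program; the helper function names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Size of the array object itself: just the 10 elements, no hidden pointer. */
size_t buffer_size(void)
{
    char buffer[10] = {0};
    return sizeof buffer;
}

/* Size of the pointer object; 8 on x86-64, but platform-dependent. */
size_t file_pointer_size(void)
{
    FILE *file1 = NULL;
    return sizeof file1;
}

/* In most expressions the array expression is converted to the address of
   its first element; that address is computed, not stored anywhere. */
int buffer_decays_to_first_element(void)
{
    char buffer[10] = {0};
    char *p = buffer;
    return p == &buffer[0];
}

/* Hypothetical helper that prints both sizes. */
void report_sizes(void)
{
    assert(buffer_size() == 10);
    printf("sizeof buffer = %zu, sizeof file1 = %zu\n",
           buffer_size(), file_pointer_size());
}
```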
Also, I know that printf() when giving format args (like %d, or %s) will read on the stack, but in this example where will it start to read ?
It may not read arguments from the stack at all. For example, Linux on x86-64 uses the System V AMD64 ABI calling convention, which passes the first six integer or pointer arguments via registers.
If you're really curious how things look on a particular platform, you need to a) read up on that platform's calling conventions, and b) look at the generated machine code. Most compilers have an option to output an assembly listing. For example, we can take your program and compile it as
gcc -S file.c
which creates a file named file.s containing the following (lightly edited) output:
.file "file.c"
.section .rodata
.LC0:
.string "rt"
.LC1:
.string "~/file.txt"
.text
.globl main
.type main, @function
main:
.LFB2:
pushq %rbp ;; save the current base (frame) pointer
.LCFI0:
movq %rsp, %rbp ;; make the stack pointer the new base pointer
.LCFI1:
subq $48, %rsp ;; allocate an additional 48 bytes on the stack
.LCFI2:
movl %edi, -36(%rbp) ;; save argc (%edi) and argv (%rsi) in the stack frame, since
movq %rsi, -48(%rbp) ;; both registers are about to be overwritten for the call to fopen
movl $.LC0, %esi ;; write the *second* argument of fopen to %esi
movl $.LC1, %edi ;; write the *first* argument of fopen to %edi
call fopen ;; arguments to fopen are passed via register, not the stack
movq %rax, -8(%rbp) ;; save the result of fopen to file1
movq $0, -32(%rbp) ;; zero out the elements of buffer (I added
movw $0, -24(%rbp) ;; an explicit initializer to your code)
movq -48(%rbp), %rax ;; copy the pointer value stored in argv to rax
addq $8, %rax ;; offset 8 bytes (giving us the address of argv[1])
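movq (%rax), %rdi ;; copy the value rax points to (argv[1]) to rdi, the first argument of printf
movl $0, %eax ;; variadic call: %al holds the number of vector registers used (none here)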
call printf ;; like with fopen, arguments to printf are passed via register, not the stack
movq -8(%rbp), %rdi ;; copy file1 to rdi
call fclose ;; again, arguments are passed via register
movl $0, %eax ;; return value of main (0)
leave
ret
Now, this is for my specific platform, which is Linux (SLES-10) on x86-64. Other hardware/OS combinations may well produce something different.
EDIT
Just realized that I left out some important stuff.
The notation N(reg) means offset N bytes from the address stored in register reg (basically, reg acts as a pointer). %rbp is the base (frame) pointer - it basically acts as the "handle" for the current stack frame. Local variables and function arguments (assuming they are present on the stack) are accessed by offsetting from the address stored in %rbp. On x86, local variables typically have a negative offset from %rbp, while function arguments have a positive offset.
The memory for file1 starts at -8(%rbp) (pointers on x86-64 are 64 bits wide, so we need 8 bytes to store it). That's fairly easy to determine based on the lines
call fopen
movq %rax, -8(%rbp)
On x86-64, integer and pointer return values are written to %rax or %eax (%eax is the lower 32 bits of %rax). So the result of fopen is written to %rax, and we copy the contents of %rax to -8(%rbp).
The location for buffer is a little trickier to determine, since you don't do anything with it. I added an explicit initializer (char buffer[10] = {0};) just to generate some instructions that access it, and those are
movq $0, -32(%rbp)
movw $0, -24(%rbp)
From this, we can determine that buffer starts at -32(%rbp). There are 14 bytes of unused "padding" space between the end of buffer and the beginning of file1.
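The arithmetic behind that 14 can be spelled out in C. This is only a sketch of the bookkeeping: the offsets are copied from the listing above, while the constant and function names are made up, and frame_slot merely mimics what N(reg) addressing does with a base pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Offsets relative to %rbp, taken from the listing above. */
enum {
    FILE1_OFFSET  = -8,   /* movq %rax, -8(%rbp) */
    BUFFER_OFFSET = -32,  /* movq $0, -32(%rbp)  */
    BUFFER_SIZE   = 10    /* char buffer[10]     */
};

/* Bytes between the end of buffer and the start of file1. */
int padding_between_buffer_and_file1(void)
{
    int buffer_end = BUFFER_OFFSET + BUFFER_SIZE;  /* -22 */
    return FILE1_OFFSET - buffer_end;              /* -8 - (-22) */
}

/* C analogue of N(reg): treat 'base' as the frame pointer and step
   'offset' bytes away from it. */
unsigned char *frame_slot(unsigned char *base, int offset)
{
    return base + offset;
}

/* Hypothetical demo: model the 48-byte frame as a plain byte array. */
void demo_frame(void)
{
    unsigned char frame[48] = {0};
    unsigned char *base = frame + sizeof frame;  /* plays the role of %rbp */

    assert(frame_slot(base, BUFFER_OFFSET) == frame + 16);
    assert(frame_slot(base, FILE1_OFFSET) == frame + 40);
    printf("padding between buffer and file1: %d bytes\n",
           padding_between_buffer_and_file1());
}
```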
Again, this is how things play out on my specific system; you may see something different.