29

Can anyone clearly explain, in terms of C, C++ and Java, what goes on the stack, what goes on the heap, and when the allocation is done?

As far as I know,

All local variables, whether primitives, pointers or reference variables, are placed in a new stack frame for each function call,

and anything created with new or malloc goes on the heap.

I am confused about a few things.

Are references/primitives that are members of an object created on the heap also stored on the heap?

And what about the local variables of a method that are created again in each frame during recursion? Are they all on the stack, and if so, is that stack memory allocated at runtime? Also, are literals part of the code segment? And what about globals in C, and static variables in C, C++ and Java?

Amogh Talpallikar
  • 3
    Local variables are not necessarily in the stack frame - they may exist only in registers, or may even be optimised away completely. – Paul R Jan 02 '12 at 12:00
  • 2
    Ideally, these are all implementation details. What implementations are you comparing. – trashgod Jan 02 '12 at 12:12
  • The standard implementations: for Java, the JDK that Sun/Oracle ships, and for C/C++ I am talking about gcc. – Amogh Talpallikar Jan 02 '12 at 12:15
  • http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap – Jeegar Patel Jan 02 '12 at 12:18
  • If you had simply typed your subject into the search box, you would have seen lots of existing answers to this question. In fact, those answers should have appeared *while* you were typing your question. – kdgregory Jan 02 '12 at 15:31
  • possible duplicate of [JVM - Heap and Stack](http://stackoverflow.com/questions/2826222/jvm-heap-and-stack) – kdgregory Jan 02 '12 at 15:31
  • Yes, I saw a lot of them, but most either dealt with how Java implements the stack and heap, or talked only about C/C++, and not clearly at that. I still had doubts, so I thought I would ask specific questions on the topic. – Amogh Talpallikar Jan 02 '12 at 21:13

6 Answers

44

Structure of a Program in Memory

The following is the typical structure of a program when loaded into memory.

 +--------------------------+
 |                          |
 |      command line        |
 |        arguments         |
 |    (argc and argv[])     |
 |                          |
 +--------------------------+
 | Stack                    |
 | (grows-downwards)        |
 |                          |
 |                          |
 |                          |
 |         F R E E          |
 |        S P A C E         |
 |                          |
 |                          |
 |                          |
 |                          |
 |     (grows upwards) Heap |
 +--------------------------+
 |                          |
 |    Initialized data      |
 |         segment          |
 |                          |
 +--------------------------+
 |                          |
 |     Initialized to       |
 |        Zero (BSS)        |
 |                          |
 +--------------------------+
 |                          |
 |      Program Code        |
 |                          |
 +--------------------------+

A few points to note:

  • Data Segment
    • Initialized data segment (set to the explicit initializers written by the programmer)
    • Uninitialized data segment (zero-initialized data segment, BSS [Block Started by Symbol])
  • Code Segment
  • Stack and Heap areas

Data Segment

The data segment contains the global and static data that the programmer has explicitly initialized, together with their initial values.

The other part of the data segment is called BSS (the name comes from an old IBM assembler directive, "Block Started by Symbol"). The OS initializes this block of memory to zeros when the program is loaded, which is how uninitialized global and static data get a default value of zero. This area has a fixed size.

The data area is split in two based on explicit initialization because variables with explicit initializers must have their values stored in the executable and loaded one by one, whereas variables without initializers need not be set to 0 individually. Instead, the job of zeroing them is left to the OS, which only needs to know the size of the block. This bulk initialization keeps the executable smaller and can greatly reduce the time required to load it.

The layout of the data segment is mostly under the control of the underlying OS, although some loaders give partial control to the user. This information may be useful in applications such as embedded systems.

This area can be addressed and accessed from the code using pointers. Auto variables carry overhead because they are initialized each time they are needed, and code is required to do that initialization. Variables in the data area have no such runtime overhead, because their initialization is done only once, at load time.

Code segment

The program code area holds the executable code. This area also has a fixed size. It can be reached only through function pointers, not through ordinary data pointers. Another important point is that the system may treat this area as read-only memory, so any attempt to write to it leads to undefined behavior.

Constant strings may be placed either in code or data area and that depends on the implementation.

For example (I'm going to give only C-based examples), the following code attempts to write to the code area and may result in a runtime error or even a crash.

#include <stdio.h>
#include <string.h>

/* Deliberately broken: demonstrates undefined behavior. Do not use. */
int main(void)
{
    static int i;
    strcpy((char *)main, "something");  /* writes into the code area */
    printf("%s", (char *)main);
    if (i++ == 0)
        main();
    return 0;
}

Stack and heap areas

For execution, the program uses two major areas, the stack and the heap. Stack frames for function calls are created on the stack, and the heap is used for dynamic memory allocation. Both are uninitialized areas, so whatever happens to be in memory becomes the initial (garbage) value of objects created there.

Let's look at a sample program that shows which variables are stored where:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int initToZero1;
static float initToZero2;
FILE * initToZero3;
// all are stored in initialized to zero segment(BSS)

double initialized1 = 20.0;
// stored in initialized data segment

int main(void)
{
    size_t (*fp)(const char *) = strlen;
    // fp is an auto variable that is allocated in stack
    // but it points to code area where code of strlen() is stored

    char *dynamic = (char *)malloc(100);
    // dynamic memory allocation, done in heap
    if (dynamic == NULL)
        return 1;

    int stringLength;
    // this is an auto variable that is allocated in stack

    static int initToZero4;
    // stored in BSS

    static int initialized2 = 10;
    // stored in initialized data segment

    strcpy(dynamic, "something");
    // function call, uses stack

    stringLength = (int)fp(dynamic);
    // again a function call

    free(dynamic);
    // release the heap block
    return 0;
}

Or consider a slightly more complex example:

#include <stdio.h>

// command line arguments may be stored in a separate area
int main(int numOfArgs, char *arguments[])
{
    static int i;
    // stored in BSS

    int (*fp)(int,char **) = main;
    // points to code segment

    static char *str[] = {"thisFileName","arg1", "arg2",0};
    // stored in initialized data segment

    while(*arguments)
        printf("\n %s",*arguments++);

    if(!i++)
        fp(3,str);
    // calls main() once more, through the function pointer

    return 0;
}

Hope this helps!

Abdul Rahman
Sangeeth Saravanaraj
  • When I mmap something into the program's address space, where is it mapped? +1 for such a nice explanation – Jeegar Patel Jan 02 '12 at 12:55
  • 1
    @Mr.32 Thanks for the +1. I'm not sure where `mmap()` belongs in the above layout. It's a good question, and I'm trying to explore the same! OTOH, I was trying to keep the description generic to all the above-mentioned programming languages. I should have covered `register` and `extern` variables also! – Sangeeth Saravanaraj Jan 02 '12 at 14:50
7

In C/C++: Local variables are allocated on the current stack frame (belonging to the current function). If you declare an object as a local variable (rather than creating it with new/malloc), the whole object is allocated on the stack, including all of its member variables. When using recursion, a new stack frame is created with each function call, and all local variables of that call are allocated in it. The stack usually has a fixed size, and this value is usually written into the executable binary header during compilation/linking. However, this is very OS and platform specific; some OSes may grow the stack dynamically when needed. Because the size of the stack is usually limited, you can run out of stack when you use deep recursion, or sometimes even without recursion when you declare large local objects.

The heap is usually treated as an unlimited space (limited only by the available physical/virtual memory), and you can allocate objects on the heap using malloc/new (and other heap-allocating functions). When an object is created on the heap, all of its member variables are created within it. You should see an object as a contiguous area of memory (this area contains the member variables and, for C++ classes with virtual functions, typically a pointer to a virtual method table), no matter where it is allocated.

Literals, constants and other "fixed" data are usually compiled/linked into the binary as a separate segment, so they are not really in the code segment. Usually you can't allocate or free anything in this segment at runtime. However, this is also platform specific and might work differently on different platforms (for example, iOS Obj-C code has a lot of constant references inserted directly into the code segment, between functions).

kuba
3

In C and C++, at least, this is all implementation-specific. The standards do not mention "stack" or "heap".

Oliver Charlesworth
2

In Java, local variables can be allocated on the stack (unless optimised away).

Primitives and references in an object are on the heap (as the object is on the heap).

A stack is pre-allocated when the thread is created. It doesn't use heap space. (However, creating a thread does result in a Thread Local Allocation Buffer being created, which decreases the free memory quite a bit.)

Unique string literals are added to the heap. Primitive literals may be somewhere in the code (if not optimised away). Whether a field is static or not makes no difference.

Peter Lawrey
  • If the local variable is an object, only the pointer is stored on the stack. The heap in Java is generally more organized. One JVM implementation maintains a "young" generation of (among others, short-lived) objects, somewhat mirroring the stack concept, but with administration. By contrast, C++ must sometimes copy stack objects. – Joop Eggen Jan 02 '12 at 12:20
  • 1
    @JoopEggen Since Java 6 Update 14 Oracle (Sun) JVM will also be able to do escape analysis and prevent objects from being created on the heap. See http://www.oracle.com/technetwork/java/javase/6u14-137039.html – Roger Lindsjö Jan 02 '12 at 13:41
  • Escape Analysis works well sometimes; however, in many situations where you might think it should work, it doesn't. ;) – Peter Lawrey Jan 02 '12 at 14:42
2

To answer the part of your question about C++ heap and stack:

I should say first that an object created without new is stored as a contiguous unit on the stack or, if it is global, in some global segment (platform specific).

For an object that is created on the heap using new, its member variables are stored as one contiguous block of memory on the heap. This is the case for member variables that are primitives and embedded objects. For member variables that are pointers or references, a primitive pointer value is stored within the object. What that value points to can be stored anywhere (heap, stack, global). Anything is possible.

As for the local variables within methods of an object, they are stored on the stack, not within the object's contiguous space on the heap. The stack is usually created with a fixed size at runtime. There is one per thread. The local variables may not even consume space on the stack, as they may be optimized out of it (as Paul said). The main point is that they are not on the heap just because they belong to member functions of an object on the heap. If they are local variables of a pointer type, they could be stored on the stack and point to something on the heap or the stack!

ScrollerBlaster
2

Section 3.5 of the Java Virtual Machine Specification describes runtime data areas (stacks and the heap).

Neither the C nor C++ language standards specify whether something should be stored on a stack or a heap. They only define object lifetimes, visibility, and modifiability; it's up to the implementation to map those requirements to a particular platform's memory layout.

Typically, anything allocated with the *alloc functions resides on the heap, while auto variables and function parameters reside on a stack. String literals may live "somewhere else" (they must be allocated and visible over the lifetime of the program, but attempting to modify them is undefined); some platforms use a separate, read-only memory segment to store them.

Just remember that there are some truly oddball platforms out there that may not conform to the common stack-heap model.

John Bode