1

I have been programming in C/C++ for quite some time, but some areas still elude me. Perhaps I haven't been reading well-written, authoritative material.

(1) In Linux/Unix, is there a limit on how large user programs can be? What is the maximum stack size a program can have? What is the maximum amount of heap memory a user program can use?

(2) I understand that a C executable has a data section, a code section and a stack section. If the program makes many recursive calls, it needs a large amount of stack. Is the stack of a predefined size, or does it grow as the recursion deepens? If it grows, must the address space of the program also be increased dynamically? If so, won't that slow down the program?

(3) Similarly, when the program calls malloc at runtime and memory is allocated from the heap, does that area of the heap need to be added to the address space of the program? If so, the page table of the program needs to be updated in this case too. Is my understanding correct?

(4) Why can't two files (which I intend to combine into a single executable) each have a global variable of the same name? It would help to throw some light on what the object files look like.

Addition:

I am reading ISO C99 standard from http://www.open-std.org/jtc1/sc22/wg...docs/n1256.pdf. It says on page 42:

6.2.2 Linkages of identifiers 1 An identifier declared in different scopes or in the same scope more than once can be made to refer to the same object or function by a process called linkage. There are three kinds of linkage: external, internal, and none.

2 In the set of translation units and libraries that constitutes an entire program, each declaration of a particular identifier with external linkage denotes the same object or function. Within one translation unit, each declaration of an identifier with internal linkage denotes the same object or function. Each declaration of an identifier with no linkage denotes a unique entity.

3 If the declaration of a file scope identifier for an object or a function contains the storage-class specifier static, the identifier has internal linkage.

4 For an identifier declared with the storage-class specifier extern in a scope in which a prior declaration of that identifier is visible, if the prior declaration specifies internal or external linkage, the linkage of the identifier at the later declaration is the same as the linkage specified at the prior declaration. If no prior declaration is visible, or if the prior declaration specifies no linkage, then the identifier has external linkage.

5 If the declaration of an identifier for a function has no storage-class specifier, its linkage is determined exactly as if it were declared with the storage-class specifier extern. If the declaration of an identifier for an object has file scope and no storage-class specifier, its linkage is external.

After reading this, it looks like if I declare a variable, say int a, in two source files, then both declarations have external linkage as per rules 5 and 4, and as per rule 2 both should refer to the same object. Why, then, does the compiler complain? Where does the standard hint that we can't declare a variable like this in two source files, and that doing so should cause a compilation error?

Thanks.

Aleksi Torhamo
xyz

2 Answers

3

In response to your questions-

  1. Most operating systems use virtual memory so that each program thinks it owns the whole address space. The limit on the size of a program is therefore set mainly by the size of the virtual address space, minus the parts reserved for the kernel and a small region kept unmapped to catch invalid (think NULL) pointers. The exact limit is platform-dependent: on 32-bit systems a program can usually address somewhere between 2GB and nearly 4GB, depending on how the kernel splits the address space, and on 64-bit systems much more than that. Of course, the memory a program actually touches has to be backed by physical RAM plus swap space on disk, which limits how much virtual memory can really be in use at once. In theory you could write a program so huge that you couldn't fit it into memory, but unless you're using an embedded device (where this really is a concern) I doubt this would ever happen.

  2. In most programming languages, including C and C++, the amount of stack a program uses is not fixed at compile-time; the stack starts small and grows as the program runs, up to a limit imposed by the OS. The way the stack grows usually makes this particularly cheap: to get more space, you just need to bump the stack pointer a bit. If this ever takes you into memory that currently isn't allocated for the program, the OS will usually allocate the memory for you by associating a page with the virtual address where the stack now lives, which is usually faster than doing a heap allocation. The cost of doing this is usually negligible in the long run, so don't be discouraged from using stack memory. Interestingly, some older programming languages, namely the first incarnation or so of FORTRAN, did not have dynamic stack space, and so recursion wasn't possible. Virtually all modern languages have eliminated these restrictions.

  3. You are correct: when more heap space is needed, the page table is often adjusted to grow the heap. Many memory allocators serve large requests from anonymous memory mappings (mmap) rather than by extending the traditional heap segment, but the principle is essentially the same: the page table is updated to make room for the new memory.

  4. If you have two global variables with the same name in different files that get linked together, then both of the object files will contain symbol references saying that they need a variable with that name, and both will contain definitions saying that they provide a symbol of this name. When you try linking them together, the linker will notice that the same symbol name has been defined in two places and will report an error, because it cannot tell which one it should use as "the" instance of that global variable. To counteract this, at least in C, you can mark global variables static to give them internal linkage. This makes the symbol not globally exported, and so the generated object file can either resolve the references internally or mangle the name so that it doesn't conflict with symbols from other files. C++ allows this too, and also offers unnamed (anonymous) namespaces to achieve the same effect.
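
As a rough illustration of point 2, the following C sketch measures how far the stack extends during recursion by comparing the address of a local in the outermost frame with one in the innermost frame. The function names deepest_local and stack_bytes_for are made up for this example, and converting pointers to uintptr_t is implementation-defined, so treat the numbers as indicative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Recurse `depth` times and return the address of the local variable
 * in the deepest call frame. (Hypothetical helper for illustration.) */
static uintptr_t deepest_local(unsigned depth)
{
    volatile char marker = 0;            /* lives in this call's frame */
    uintptr_t here = (uintptr_t)&marker;

    if (depth == 0)
        return here;

    uintptr_t below = deepest_local(depth - 1);
    marker = (char)(below & 1);          /* work after the call: not a tail call */
    return below;
}

/* Approximate number of stack bytes spanned by `depth` nested calls.
 * The absolute difference is used so the direction of growth does not matter. */
size_t stack_bytes_for(unsigned depth)
{
    volatile char top = 0;
    uintptr_t start = (uintptr_t)&top;
    uintptr_t end = deepest_local(depth);

    return (size_t)(start > end ? start - end : end - start);
}
```

Calling stack_bytes_for with increasing depths and printing the results should show the span growing with the recursion depth; the cost per frame depends on the compiler and its flags.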

Hope this helps! If anyone spots any errors or ambiguities here, let me know and I'd be glad to correct them.

templatetypedef
  • Ok, thanks for the pointer : "However, the way the stack grows usually makes this particularly cheap - to get more space, you just need to bump the stack pointer a bit." – xyz Jun 04 '11 at 10:19
2
  1. Yes, yes, and yes. See "help ulimit" in bash or man getrlimit.

  2. The maximum stack size is set when the program starts and generally cannot be increased afterwards. The address space reserved for the stack doesn't grow as you use more stack than before, but the memory actually used can grow.

    When using a "split stack" (as in Google's Go; work is being done to allow this in gcc and other compilers for other languages), additional memory is allocated that is not "on the stack" and the stack pointer is adjusted. This is managed dynamically as functions are called and return.

  3. The heap may grow as required. See man sbrk for a short overview of how this happens, or look at various malloc implementations. You seem to understand the gist of it.

  4. Because, at least for C and C++, global variables can only be defined once in the entire program. Two translation units (you can think of TUs as .o files) can use a global variable of the same name, but it can only be defined once and must be declared (with the correct type) in the other TUs. I don't think understanding the details of object files will help here, but understanding the details of what's called the One Definition Rule (ODR) in C++, or its equivalent in whatever language you're using, will likely be useful.
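
For point 1, the limits that ulimit reports can also be read from inside a program with getrlimit. The sketch below is a minimal example, assuming a POSIX system that defines RLIMIT_STACK, RLIMIT_DATA and RLIMIT_AS (Linux does); the helper names print_value, print_limit and print_memory_limits are invented for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Print one limit value; RLIM_INFINITY means "no limit". */
static void print_value(rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("unlimited");
    else
        printf("%llu bytes", (unsigned long long)value);
}

/* Print the soft and hard values of one resource limit.
 * Returns 0 on success, -1 if getrlimit() fails. */
int print_limit(const char *label, int resource)
{
    struct rlimit lim;

    if (getrlimit(resource, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    printf("%-6s soft: ", label);
    print_value(lim.rlim_cur);
    printf(", hard: ");
    print_value(lim.rlim_max);
    printf("\n");
    return 0;
}

/* Report the limits most relevant to stack, heap and total address space. */
int print_memory_limits(void)
{
    int rc = 0;

    rc |= print_limit("stack", RLIMIT_STACK);
    rc |= print_limit("data", RLIMIT_DATA);
    rc |= print_limit("as", RLIMIT_AS);
    return rc;
}
```

The soft value is the one currently enforced; a process may raise it up to the hard value with setrlimit.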

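For point 3, the effect of sbrk on the program break can be observed directly. This is only a sketch, assuming Linux with glibc (sbrk is not part of current POSIX, and the _DEFAULT_SOURCE macro is a glibc convention); grow_break is a made-up name. Real programs should leave the break to malloc rather than move it themselves.

```c
#define _DEFAULT_SOURCE   /* assumption: glibc; asks it to declare sbrk() */
#include <stdint.h>
#include <unistd.h>

/* Extend the data segment by `bytes` and return how far the program
 * break moved, or -1 if the kernel refused the request. */
long grow_break(long bytes)
{
    void *before = sbrk(bytes);   /* on success, returns the previous break */
    if (before == (void *)-1)
        return -1;

    void *after = sbrk(0);        /* sbrk(0) just reports the current break */
    return (long)((uintptr_t)after - (uintptr_t)before);
}
```

A successful call moves the break by exactly the requested amount; the kernel maps the new pages lazily, when they are first touched.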

Regarding the edit, you have probably defined an int in two TUs:

int this_is_a_definition;

You cannot do this. You should declare it in a header:

extern int this_is_a_declaration;

Then include that header where the variable is needed and define the variable in exactly one TU. Of course, if you don't want to use the same variable in different TUs, then you probably want an "internal" name, such as you get with namespace-scope static or an unnamed namespace:

static int local_to_this_TU;

namespace {
  int another_local_to_this_TU;
}
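
To make the declaration/definition split concrete, here is a single-file C sketch in which comments mark what would normally live in a header and what would live in one source file. The names shared_counter, private_counter and sum_counters are hypothetical.

```c
/* Would live in a header: a declaration only, no storage is reserved.
 * Every TU that includes the header sees this line. */
extern int shared_counter;
extern int shared_counter;      /* repeating a declaration is harmless */

/* Would live in exactly one .c file: the single definition. */
int shared_counter = 5;

/* Internal linkage: other TUs may define their own private_counter
 * without any clash at link time. */
static int private_counter = 1;

/* Uses both variables; returns their sum. */
int sum_counters(void)
{
    return shared_counter + private_counter;
}
```

If a second source file also contained int shared_counter = 5; the linker would see two definitions of the same external symbol and reject the program.
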
Fred Nurk
  • So, we have a strong disagreement about the growable stack segment. Some prooflinks needed. – ulidtko Feb 14 '11 at 06:54
  • @ulidtko: I should have been more clear, but I think you disagree with something more than what I've (hopefully) cleared up with this edit. What is it? – Fred Nurk Feb 14 '11 at 07:03
  • Thanks for the pointer on ulimit. Conceptually, I did understand that virtual memory lets a program be as large as the total virtual memory (which is more than physical memory), but this pointer fills a gap in my practical understanding: what to do if a sysadmin doesn't want users to be able to make the system unusable for other users. – xyz Jun 04 '11 at 10:17