2

What happens behind the scenes when I declare the following:

char a[]="June 14";

and

char *a="June 14";

I mean, how is memory allocated for the two declarations above (on the stack, in the data segment, or somewhere else)?

chanzerre
  • Are these declarations made at file scope (outside any function) or block scope (inside a function)? – Eric Postpischil Aug 30 '13 at 17:13
  • The former has two copies of the string in your process; one in a read-only data segment, the other (`a[]`) is a writable *copy* of that in whatever scope `a` is declared. The latter has one copy of the string, and it's in a read-only segment, pointed to by `a`. – WhozCraig Aug 30 '13 at 17:15
  • @WhozCraig: It depends on where they appear. If they are declared at file-scope, then the former requires only one copy of the string. – Eric Postpischil Aug 30 '13 at 17:17
  • @EricPostpischil Interesting. I've never seen constant literals anywhere besides read-only segments in any remotely modern C/C++ compilation (whether they *need* to be or not). I'll have to debug rt-lib startup code to see what happens with that initialization. I'm curious now. – WhozCraig Aug 30 '13 at 17:22
  • @WhozCraig: In `char a[]="June 14";` at file scope the string is not constant; it is just the initial data for the array and would typically appear in a data segment (not a read-only data segment). At block scope, the compiler would typically place a copy of the string in a read-only data segment and would use that copy each time execution reaches the declaration to initialize a newly created array on the stack. – Eric Postpischil Aug 30 '13 at 17:26
  • @EricPostpischil I just debugged the assembly and you're absolutely right! Smack in the writable data section is `_a` declared as `.asciz "June 14"` . It makes sense, I just never knew it would do that. Thank you! – WhozCraig Aug 30 '13 at 17:29
  • This question is a duplicate hundreds of times over, right? – Carl Norum Aug 30 '13 at 17:57
  • @CarlNorum: probably — can you find the duplicate accurately and quickly, though? It is usually quicker to write the answer than to find the duplicate. – Jonathan Leffler Aug 30 '13 at 18:09
  • @CarlNorum I could not find the question that is an exact replica of mine, so i just asked it. – chanzerre Aug 30 '13 at 18:14
  • @JonathanLeffler - yeah, I try to favourite some of these canonical questions so they're easy to find, but this is one I don't have on hand. We need a better dupe-finding system, I guess. – Carl Norum Aug 30 '13 at 18:31
  • read [Difference between `char *str` and `char str[]` and how both stores in memory?](http://stackoverflow.com/questions/15177420/what-does-sizeofarray-return/15177499#15177499) – Grijesh Chauhan Aug 31 '13 at 06:13

4 Answers

4

The following is for typical C implementations.

At file scope:

char a[] = "June 14"; defines an array that initially contains “June 14” (including a terminating null). These characters are placed into a data section of the executable file that is loaded into memory when the program begins execution. (Note: The loading may be virtual, in the sense that it is effectively a part of the program’s virtual address space even though it is not physically read into memory immediately.)

char *a = "June 14"; defines two things. The first is a string containing “June 14”. These characters are placed into a data section of the executable file. It may be a read-only (also called constant) data section. The second is a pointer. The pointer is also likely placed in a data section (not read-only). The pointer is initialized with the address of the string. (Depending upon how programs are linked on the target system, the executable file might contain instructions to the system on how to fill in the address at load time rather than containing the actual address itself.)

At block scope:

char a[] = "June 14"; defines an array that initially contains “June 14”. However, this array must be created and initialized each time execution reaches the declaration. (In the file scope declaration, the array only needs to be initialized when the program starts.) To accomplish this initialization, the compiler puts “June 14” in a data section of the executable (again possibly a read-only data section). Whenever execution reaches the declaration (or is about to), the compiled code allocates some space on the stack and copies the characters from that stored copy into the new space on the stack.

char *a = "June 14"; is similar to the file-scope declaration, in that the string is placed into a data section (possibly read-only) and a pointer is created. However, the pointer is nominally on the stack instead of in a data section. (I say “nominally” because it is quite likely that optimization results in a pointer such as this being mostly in a processor register rather than actually on the stack, or even optimized away altogether.) The pointer is often initialized via an instruction that calculates the address of the string from other information. (E.g., the loader may set a register to the address where it loaded the start of the data section, so the calculation of the initial value of the pointer would take that address and add to it the offset of the string within the data section.)

Eric Postpischil
3

Consider the code:

#include <stdio.h>

char a1[] = "June 14";
const char *a2 = "June 14";

void function(void)
{
    char b1[] = "June 14";
    const char *b2 = "June 14";
    printf("%-7s : %-7s : %-7s : %-7s\n", a1, a2, b1, b2);
    a1[6]++;
    b1[6]++;
    a2++;
    b2++;
    printf("%-7s : %-7s : %-7s : %-7s\n", a1, a2, b1, b2);
}

int main(void)
{
    function();
    function();
    return(0);
}

The array a1 is allocated space and, as the program is loaded, initialized with the string. The pointer a2 is allocated space for the pointer itself, and that pointer is initialized to point to a copy of the string. I used const because you can't modify string literals; the value might be stored in the text segment of the program (which is non-writable).

The array b1 is allocated space on the stack and when the function is called, it is initialized with the string. This means, as WhozCraig noted in a comment, that there must be a copy of the string somewhere that is used to initialize the array each time the function is called. The pointer b2 is allocated space for the pointer and that pointer is initialized to point to a copy of the string. Indeed, the compiler might make a2 and b2 point to the same string.

The function prints the values of the four strings; it then modifies the last digit of the number in the two arrays, increments the two pointers to point to the next character in the string constants, and prints the four strings again. The second call to the function resets b1 to the original string ("June 14"), but a1 remains changed to "June 15" at the start of the second call. Similarly, in the second call a2 still points at the u for the first print and moves on to the n for the second, while b2 is reset and points to the J first and the u second. Thus the output is:

June 14 : June 14 : June 14 : June 14
June 15 : une 14  : June 15 : une 14 
June 15 : une 14  : June 14 : June 14
June 16 : ne 14   : June 15 : une 14 

If there were a static array or a static pointer inside the function, it would behave analogously to the external array a1 or the pointer a2.

Jonathan Leffler
0

There are two things that space is being allocated for here.

char a[]="June 14";
char *a="June 14";

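For the object a (an array in the first declaration, a pointer in the second) and for the string data "June 14".

Space for the object is allocated on the stack if a is defined inside a function, or in the data segment if a is defined in a static context (at file scope or with static).

Space for the string data is typically allocated in the .rodata (read-only data) section. The exception is the array form in a static context: there the characters are simply the array's initial contents in the writable data segment, and no separate read-only copy is needed.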

Ziffusion
0

For,

char a[]="June 14";

gcc on Linux allocates the memory on the stack when the declaration is inside a function. The following steps happen:

  1. Memory for a is allocated on the stack, with a size of strlen("June 14") + 1 bytes (the extra byte holds the terminating null).
  2. "June 14" is then copied into that memory each time the function is called.

For,

char *b="June 14";

gcc on Linux stores the "June 14" string in the .rodata section. The variable b is then allocated, and it holds the address of the string in the .rodata section.

Consider this sample C program:

 #include <stdio.h>

 int main(void)
 {
   char a[] = "Test Message";      // array on the stack, initialized with "Test Message"
   char *b = "Test Const String";  // string stored in .rodata; b holds its address
 }

I compiled the above code with gcc using the -c option and examined the object file with the objdump command. The output follows:

 constPointer.o: file format elf32-i386

 Contents of section .text:
 0000 8d4c2404 83e4f0ff 71fc5589 e55183ec  .L$.....q.U..Q..
 **0010 24c745eb 54657374 c745ef20 4d6573c7  $.E.Test.E. Mes.
 0020 45f37361 6765c645 f700c745 f8000000  E.sage.E...E....**
 0030 0083c424 595d8d61 fcc3               ...$Y].a..
 Contents of section .rodata:
 **0000 54657374 20436f6e 73742053 7472696e  Test Const Strin
 0010 6700                                 g.**
 Contents of section .comment:
 0000 00474343 3a202847 4e552920 342e332e  .GCC: (GNU) 4.3.
 0010 30203230 30383034 32382028 52656420  0 20080428 (Red
 0020 48617420 342e332e 302d3829 00        Hat 4.3.0-8).

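I have highlighted (using **) where each string ends up. The characters of "Test Message" appear inside the .text section, as immediate operands of the instructions that store them into the stack array, whereas "Test Const String" is placed in the .rodata section, which is not modifiable.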

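So a[3]='t'; compiles without any error or warning and works as expected. b[2]='t'; also compiles, but modifying a string literal is undefined behavior; here it typically results in a run-time error, because the string lives in the read-only .rodata section.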

Hope this helps.

Saravanan