In C, a string is a sequence of character values followed by a 0-valued terminator. For example, the character sequence {'H', 'e', 'l', 'l', 'o', 0} is a string, but {'H', 'e', 'l', 'l', 'o'} is not; that 0 terminator makes the difference.
Strings (including string literals) are stored as arrays of char. Given the declaration
char str[] = "Hello";
you get something like
     +---+
str: |'H'| str[0]
     +---+
     |'e'| str[1]
     +---+
     |'l'| str[2]
     +---+
     |'l'| str[3]
     +---+
     |'o'| str[4]
     +---+
     | 0 | str[5]
     +---+
in memory. Note that no storage is set aside for a pointer to the first element of the array.
Under most circumstances, an expression of type "N-element array of T" will be converted ("decay") to an expression of type "pointer to T", and the value of the expression will be the address of the first element of the array. The exceptions to this rule are when the array expression is the operand of the sizeof or unary & operator, or when the expression is a string literal used to initialize an array in a declaration.
So, let's take the following code:
char str[] = "Hello";
char *ptr = "World";
printf( "%s, %s\n", str, ptr );
The string literals "Hello", "World", and "%s, %s\n" are stored as arrays of char with static storage duration: they are allocated at program startup and remain available for the lifetime of the program.
"Hello", "World", "%s, %s\n", and str are all array expressions (they all have type "N-element array of char"). In the declaration of ptr, the "World" array expression is not the operand of the sizeof or unary & operators, nor is it being used to initialize an array of char, so the expression is converted ("decays") to type "pointer to char", and the value of the expression is the address of the first element of the array. As a result, ptr winds up pointing to the first character of "World".
Similarly, in the printf call, the array expressions "%s, %s\n" and str are not the operands of the sizeof or unary & operators, so they too are converted to pointer expressions, and those pointer values are actually what get passed to printf. ptr is already a pointer, so its value is passed as-is.
However, in the declaration of str, the "Hello" string literal is being used to initialize an array of char, so it is not converted to a pointer expression; instead, str is initialized with the contents of the string literal, and its size is determined by the size of the literal as well.
Here's a concrete memory map for the code above that I generated on my system:
Item        Address          00 01 02 03
----        -------          -- -- -- --
"Hello"     0x400b91         48 65 6c 6c  Hell
            0x400b95         6f 00 30 30  o.00
"World"     0x400b60         57 6f 72 6c  Worl
            0x400b64         64 00 25 73  d.%s
"%s, %s\n"  0x400b66         25 73 2c 20  %s,.
            0x400b6a         25 73 0a 00  %s..
str         0x7fff7cec1a50   48 65 6c 6c  Hell
            0x7fff7cec1a54   6f 00 00 00  o...
ptr         0x7fff7cec1a48   60 0b 40 00  `.@.
            0x7fff7cec1a4c   00 00 00 00  ....
The string literal "Hello" is stored starting at address 0x400b91, "World" is stored starting at address 0x400b60, and the format string "%s, %s\n" is stored starting at address 0x400b66 (for whatever reason, the compiler put "World" and "%s, %s\n" right next to each other).
The array str is stored starting at address 0x7fff7cec1a50, and it contains a copy of the contents of the string literal "Hello". The pointer ptr is stored starting at address 0x7fff7cec1a48 and contains the address of the string literal "World" (x86 stores multi-byte values like pointers in little-endian order).
The printf call will receive the pointer values 0x400b66 (the format string), 0x7fff7cec1a50 (the first element of str), and 0x400b60 (the value stored in ptr, not the address of ptr itself). The %s conversion specifier in the format string says "print the sequence of characters starting at the corresponding address and continue until you see the 0 terminator".