In C, a string is a sequence of character values followed by a 0-valued byte[1]. All the library functions that deal with strings use the 0 terminator to identify the end of the string. Strings are stored as arrays of char, but not all arrays of char contain strings.
For example, the string "hello" is represented as the character sequence {'h', 'e', 'l', 'l', 'o', 0}[2]. To store the string, you need a 6-element array of char (5 characters plus the 0 terminator):
char greeting[6] = "hello";
or
char greeting[] = "hello";
In the second case, the size of the array is computed from the size of the string used to initialize it (counting the 0 terminator). In both cases, you're creating a 6-element array of char and copying the contents of the string literal to it. Unless the array is declared at file scope (outside of any function) or with the static keyword, it only exists for the duration of the block in which it was declared.
The string literal "hello" is also stored in a 6-element array of char, but it's stored in such a way that it is allocated when the program is loaded into memory and held until the program terminates[3], and is visible throughout the program. When you write
char *greeting = "hello";
you are assigning the address of the first element of the array that contains the string literal to the pointer variable greeting.
As always, a picture is worth a thousand words. Here's a simple little program:
#include <string.h>
#include <stdio.h>
#include <ctype.h>

int main( void )
{
  char greeting[] = "hello";    // greeting contains a *copy* of the string "hello";
                                // size is taken from the length of the string plus the
                                // 0 terminator
  char *greetingPtr = "hello";  // greetingPtr contains the *address* of the
                                // string literal "hello"

  printf( "size of greeting array: %zu\n", sizeof greeting );
  printf( "length of greeting string: %zu\n", strlen( greeting ) );
  printf( "size of greetingPtr variable: %zu\n", sizeof greetingPtr );
  printf( "address of string literal \"hello\": %p\n", (void * ) "hello" );
  printf( "address of greeting array: %p\n", (void * ) greeting );
  printf( "address of greetingPtr: %p\n", (void * ) &greetingPtr );
  printf( "content of greetingPtr: %p\n", (void * ) greetingPtr );
  printf( "greeting: %s\n", greeting );
  printf( "greetingPtr: %s\n", greetingPtr );
  return 0;
}
And here's the output:
size of greeting array: 6
length of greeting string: 5
size of greetingPtr variable: 8
address of string literal "hello": 0x4007f8
address of greeting array: 0x7fff59079cf0
address of greetingPtr: 0x7fff59079ce8
content of greetingPtr: 0x4007f8
greeting: hello
greetingPtr: hello
Note the difference between sizeof and strlen: sizeof gives the size of the array in bytes (including the 0 terminator), while strlen counts all the characters up to (but not including) the 0 terminator.
So here's what things look like in memory:
Item         Address          0x00  0x01  0x02  0x03
----         -------          ----  ----  ----  ----
"hello"      0x4007f8         'h'   'e'   'l'   'l'
             0x4007fc         'o'   0x00  ???   ???
...
greetingPtr  0x7fff59079ce8   0x00  0x00  0x00  0x00
             0x7fff59079cec   0x00  0x40  0x07  0xf8
greeting     0x7fff59079cf0   'h'   'e'   'l'   'l'
             0x7fff59079cf4   'o'   0x00  ???   ???
The string literal "hello" is stored at a very low address (on my system, this corresponds to the .rodata section of the executable, which is for static, constant data). The variables greeting and greetingPtr are stored at much higher addresses, corresponding to the stack on my system. As you can see, greetingPtr stores the address of the string literal "hello", while greeting stores a copy of the string contents.
Here's where things can get kind of confusing. Let's look at the following print statements:
printf( "greeting: %s\n", greeting );
printf( "greetingPtr: %s\n", greetingPtr );
greeting is a 6-element array of char, and greetingPtr is a pointer to char, yet we're passing them both to printf in exactly the same way, and the string is being printed out correctly; how can that work?
Unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize another array in a declaration, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T", and the value of the expression is the address of the first element of the array.
In the printf call, the expression greeting has type "6-element array of char"; since it isn't the operand of the sizeof or unary & operators, it is converted ("decays") to an expression of type "pointer to char" (char *), and the address of the first element is actually passed to printf. In other words, it behaves exactly like the greetingPtr expression in the next printf call[4].
The %s conversion specifier tells printf that its corresponding argument has type char *, and that it should print out the character values starting from that address until it sees the 0 terminator.
Hope that helps a bit.
1. Often referred to as the NUL terminator; this should not be confused with the NULL pointer constant, which is also 0-valued but used in a different context.
2. You'll also see the terminating 0-valued byte written as '\0'. The leading backslash "escapes" the value, so instead of being treated as the character '0' (ASCII 48), it's treated as the value 0 (ASCII 0).
3. In practice, space is set aside for it in the generated binary file, often in a section marked read-only; attempting to modify the contents of a string literal invokes undefined behavior.
4. This is also why the declaration of greeting copies the string contents to the array, while the declaration of greetingPtr copies the address of the first element of the string. The string literal "hello" is also an array expression. In the first declaration, since it's being used to initialize another array in a declaration, the contents of the array are copied. In the second declaration, the target is a pointer, not an array, so the expression is converted from an array type to a pointer type, and the resulting pointer value is copied to the variable.