Arrays and pointers are completely different animals, even though in most contexts an expression designating an array is converted to a pointer.
First, a little standard language (n1256):
6.3.2.1 Lvalues, arrays, and function designators
...
3 Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
The string literal "this is a test" is a 15-element array of char (14 characters plus the terminating 0). In the declaration
char *string1 = "this is a test";
string1 is being declared as a pointer to char. Per the language above, the expression "this is a test" is converted from type char [15] to type char *, and the resulting pointer value is assigned to string1.
In the declaration
char string2[] = "this is a test";
something different happens. More standard language:
6.7.8 Initialization
...
14 An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.
...
22 If an array of unknown size is initialized, its size is determined by the largest indexed element with an explicit initializer. At the end of its initializer list, the array no longer has incomplete type.
In this case, string2 is being declared as an array of char, its size is computed from the length of the initializer, and the contents of the string literal are copied to the array.
Here's a hypothetical memory map to illustrate what's happening (assuming 4-byte pointers stored big-endian):
Item        Address       0x00  0x01  0x02  0x03
----        -------       ----  ----  ----  ----
no name     0x08001230    't'   'h'   'i'   's'
            0x08001234    ' '   'i'   's'   ' '
            0x08001238    'a'   ' '   't'   'e'
            0x0800123C    's'   't'    0
            ...
string1     0x12340000    0x08  0x00  0x12  0x30
string2     0x12340004    't'   'h'   'i'   's'
            0x12340008    ' '   'i'   's'   ' '
            0x1234000C    'a'   ' '   't'   'e'
            0x12340010    's'   't'    0
String literals have static storage duration; that is, the memory for them is set aside at program startup and held until the program terminates. Attempting to modify the contents of a string literal invokes undefined behavior; the underlying platform may or may not allow it, and the standard places no restrictions on the compiler. It's best to act as though literals are always unwritable.
In my memory map above, the address of the string literal is set off somewhat from the addresses of string1 and string2 to illustrate this.
Anyway, you can see that string1, having a pointer type, contains the address of the string literal. string2, being an array type, contains a copy of the contents of the string literal.
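Since string2 is an array whose size is known at compile time, sizeof string2 yields the size of the whole array in bytes (15). By contrast, sizeof string1 yields the size of the pointer object itself (commonly 4 or 8 bytes), not the length of the string it points to.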
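The %i conversion specifier is not the right one to use for expressions of type size_t: %i expects an int, while size_t is an unsigned type that may be wider, and passing a mismatched argument is undefined behavior. If you're working in C99, use %zu. In C89, you would use %lu and cast the expression to unsigned long: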
C89: printf("%lu, %lu\n", (unsigned long) sizeof string1, (unsigned long) sizeof string2);
C99: printf("%zu, %zu\n", sizeof string1, sizeof string2);
Note that sizeof is an operator, not a function call. When its operand is an expression (such as the name of an object), parentheses aren't necessary, although they don't hurt. They are required only when the operand is a type name, as in sizeof (char *).