TL;DR
In C++, the name of an array converts automatically to a pointer to its first element.
Full Answer
What gets stored in memory can vary from compiler to compiler, but let’s get one compiler to tell us: gcc 6.3.0 for x86_64. The -S flag tells gcc to stop after compilation and emit human-readable, low-level assembly code. The -O flag tells it to optimize. We can use g++ -Wall -Wextra -Wpedantic -Wconversion -std=c++14 -O -S to compile the following file:
char arr[] = "test";
char* ptr = arr;
char* ptr2 = &arr[0];
constexpr unsigned int arr_size = sizeof(arr)/sizeof(arr[0]); // 5
char (*ptr3)[arr_size] = &arr; // A pointer to an array of arr_size chars.
char* const optimized_out = arr;
I’ll edit the output a bit to make it easier to understand. A slightly rearranged version of the file we get from this command (whose name ends with .s) is as follows:
.data
.globl arr
arr:
.ascii "test\0"
.globl ptr
.align 8
ptr:
.quad arr
.globl ptr2
.align 8
ptr2:
.quad arr
.globl ptr3
.align 8
ptr3:
.quad arr
So, what does this say? The .data directive means that we are declaring the contents of the data segment of the compiled code. This is for variables whose contents we can modify.
The .globl directive means that arr is a symbol that can be linked with other source files. The unindented lines arr:, ptr: and so on are labels representing the current address. So, when we refer to arr later, we are referring to the address, within the .data segment, of whatever bytes we tell the assembler to put there. Those are five bytes: the ASCII characters t, e, s, t and a terminating NUL.
Similarly, ptr is a global variable that lives at an address within the .data segment. There is a new directive here, .align 8. This means to put the pointer at an address divisible by 8. (If gcc had actually laid the file out this way, it would need to waste three bytes of padding between the five bytes of the array and the aligned pointer; in fact, it put arr last so it would not need to.) On x86_64, aligned memory reads can be faster than unaligned reads.
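The three bytes of padding mentioned above come from simple arithmetic, which can be sketched as follows. The helper name padding_needed is hypothetical and not part of any library; it only illustrates the calculation for an object that would start at a given byte offset.

```cpp
#include <cstddef>

// Hypothetical helper: number of padding bytes needed so that an object
// starting at `offset` lands on a multiple of `alignment` instead.
constexpr std::size_t padding_needed(std::size_t offset, std::size_t alignment) {
    return (alignment - offset % alignment) % alignment;
}

// A five-byte array followed by an 8-byte-aligned pointer leaves a 3-byte gap.
static_assert(padding_needed(5, 8) == 3, "5 bytes of array, then 3 bytes of padding");
// An offset that is already aligned needs no padding at all.
static_assert(padding_needed(8, 8) == 0, "already aligned");
```

The alignment of 8 is what this particular target uses for pointers; alignof(char*) reports the value on whatever platform you compile for.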
Then, a .quad, in x86_64 assembly, is a 64-bit value, the size of a pointer. (64 bits is four times 16 bits, and the distant ancestor of the modern 64-bit desktop CPU, the 8086, was a machine with 16-bit words. So quad stands for quadword.)
What is stored in this 64-bit memory location? The value of the label arr, which is the address of the five-byte .ascii array.
You will notice that ptr, ptr2 and ptr3 all have identical definitions in the assembly. The standard guarantees that the name of an array decays, or implicitly converts, to a pointer to the first element of the array. And the address of an array is the same as the address of its first element; there cannot be any padding before any array element.
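A minimal sketch of those two guarantees, written as small functions so they can be called from any test program. The function names are made up for illustration; only arr comes from the file above.

```cpp
char arr[] = "test";

// True when the decayed array name and &arr[0] hold the same address.
bool decays_to_first_element() {
    char* p = arr;       // implicit array-to-pointer conversion
    char* q = &arr[0];   // explicit address of the first element
    return p == q;
}

// True when the array object itself starts at the same address as its
// first element. The pointers have different types, so both are converted
// to void* before they are compared.
bool array_and_first_element_share_address() {
    return static_cast<void*>(&arr) == static_cast<void*>(&arr[0]);
}
```

Both functions should return true on any conforming implementation, not just on gcc for x86_64.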
You cannot, in C++, assign the address of a char[] to a char* without a reinterpret_cast: char *this_does_not_work = &arr; does not compile. This is only because they have different types, though. The type of arr is char[5], and the syntax to declare ptr3 as a pointer to an array of five char objects is char (*ptr3)[5]. In this case, for “simplicity,” I defined a symbolic constant for the size of arr, in case the string we use to initialize arr changes. The size of an array divided by the size of an element is equal to the number of elements in the array. (The standard guarantees that this is always true.)
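The types involved can be checked at compile time. The following is a sketch that assumes C++14 and the same arr as above; element_count is a hypothetical helper, shown only as an alternative to the sizeof division.

```cpp
#include <cstddef>
#include <type_traits>

char arr[] = "test";
constexpr std::size_t arr_size = sizeof(arr) / sizeof(arr[0]);

static_assert(arr_size == 5, "four letters plus the terminating NUL");
static_assert(std::is_same<decltype(arr), char[5]>::value,
              "arr is an array of five char");
static_assert(std::is_same<decltype(&arr), char (*)[5]>::value,
              "&arr points to the whole array");
static_assert(std::is_same<decltype(&arr[0]), char*>::value,
              "&arr[0] points to a single char");

char (*ptr3)[arr_size] = &arr;      // OK: the types match exactly
// char* this_does_not_work = &arr; // error: char (*)[5] does not convert to char*

// Hypothetical helper: lets the compiler deduce the element count from
// the array type instead of dividing sizes by hand.
template <typename T, std::size_t N>
constexpr std::size_t element_count(T (&)[N]) { return N; }

static_assert(element_count(arr) == arr_size, "both approaches agree");
```

Unlike the sizeof division, a helper of this shape refuses to compile if it is handed a pointer instead of an array.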
The addresses &arr, arr and &arr[0] are all guaranteed by the standard to be the same; the only difference between them is their type. You will notice that the assembly file does not actually contain any type information; this is why declaring something like extern char* const ptr3; in another file will link and appear to work, even though the standard does not permit a declaration whose type differs from the definition and does not require the compiler to diagnose it. GCC will record the type information, for debugging purposes, if you also give it the -g flag.
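One place where the difference in type becomes visible is pointer arithmetic. As a rough sketch (the function names are invented for this example), adding 1 to each kind of pointer advances it by the size of what it points to:

```cpp
#include <cstddef>

char arr[] = "test";

// Bytes between p and p + 1 when p points to a single char.
std::ptrdiff_t step_of_element_pointer() {
    char* p = arr;
    return (p + 1) - p;
}

// Bytes between p and p + 1 when p points to the whole five-char array.
// The casts to char* let us measure the distance in bytes.
std::ptrdiff_t step_of_array_pointer() {
    char (*p)[5] = &arr;
    return reinterpret_cast<char*>(p + 1) - reinterpret_cast<char*>(p);
}
```

The first function returns 1 and the second returns 5, even though both pointers start out holding the same address.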
You will notice that there are two variables in the source file that have no corresponding assembly-language definitions: the constexpr variable arr_size and the const variable optimized_out. In fact, gcc will include both of these if you tell it not to optimize. With the -O flag, it won’t bother to allocate memory for small constants known at compile-time; it just substitutes 5 for arr_size or arr for optimized_out. It would, however, need to store a copy of these variables somewhere in memory if you ever took their address, such as &optimized_out.
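To see the difference for yourself, one option is to compile a variant of the file like the sketch below with the same flags and compare the .s output. The two functions are hypothetical additions, and exactly what the compiler emits is an implementation detail that may differ between versions.

```cpp
char arr[] = "test";
constexpr unsigned int arr_size = sizeof(arr) / sizeof(arr[0]);
char* const optimized_out = arr;

// Uses only the value of arr_size, so an optimizing compiler is free to
// substitute the constant 5 and never give arr_size a memory location.
unsigned int uses_value_only() {
    return arr_size;
}

// Takes the address of optimized_out, so the variable now needs to
// exist somewhere in memory for the returned pointer to refer to.
char* const* takes_address() {
    return &optimized_out;
}
```

Whatever the compiler decides about storage, the observable behavior stays the same: the first function yields 5, and dereferencing the result of the second yields the address of arr.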
Some of this is slightly different in C than in C++.