
Let's say that I have

char arr[] = "test";

I read that arrays act like a pointer to the string. Therefore, when I do:

cout << arr << endl;

I get test. When I do

char *ptr = arr;

the variable ptr should now store the address of the pointer of arr. However, if I do

cout << ptr << endl;

I get test. If it is basically a pointer to a pointer, why isn't this what I need to get "test":

cout << *ptr << endl; 

Can someone explain it to me in terms of how the memory is allocated?

phuclv
Danny Brown
  • There is no such thing as a "pointer of arr". – Kerrek SB Apr 17 '17 at 23:19
  • When we do char *ptr = arr, what gets stored in ptr? – Danny Brown Apr 17 '17 at 23:21
  • When you do `char *ptr = arr;` you *don't* store the "address of the pointer of arr". `arr` decays to a pointer to `arr[0]`, and that value is copied to `ptr` so it also points to `arr[0]`. Also arrays aren't pointers, it's just that in many situations the array name will give you the address of the first element. – Dmitri Apr 17 '17 at 23:21
  • But when we do cout << ptr << endl, how come it points to the values of the array and not the address of the array? – Danny Brown Apr 17 '17 at 23:30
  • Possible duplicate of [What is array decaying?](http://stackoverflow.com/questions/1461432/what-is-array-decaying) – user4581301 Apr 17 '17 at 23:33
  • You didn't take the address of the array... it just decayed to the address of its first element. That, coincidentally, is also the address of the array... but `arr` decays to "pointer to `char`" and `&arr` would be "pointer to array of 5 `char`s". The array is the group of `char`s in memory somewhere, not a pointer to them. – Dmitri Apr 17 '17 at 23:34
  • @DannyBrown: A pointer to the first element of the array. – Kerrek SB Apr 17 '17 at 23:37
  • When you use `<<` with `cout` like this, it treats a `char *` as a C string and prints the characters it points to rather than the value of the pointer itself. – Dmitri Apr 17 '17 at 23:41
  • Arrays decay into pointers when passed to functions; that's why sizeof does not work on them there! – Felipe Lopez Apr 18 '17 at 00:11
  • "I read that arrays acts like a pointer to the string." - throw away that reading and get a good and modern textbook! It is wrong. – too honest for this site Apr 18 '17 at 01:08
  • If you want to display the pointer value, `cout << static_cast<void*>(ptr)`. – paddy Apr 18 '17 at 02:00

3 Answers


What happens is that the "<<" operator treats a char * as a C string: it prints the characters starting at that address until it reaches the terminating '\0'. That is why both arr and ptr print "test". When you do:

char *ptr = arr;

you simply make a new char * that points to the same target (the first element of arr). It is therefore also treated as a C string, and "cout <<" prints the characters it points to.

This special treatment applies only to character pointers such as char * (along with its const, signed and unsigned variants); for example:

int a[] = {1, 2}; cout << a << endl;

simply prints the address of the array a (that is, the address of its first element).

Kostas
char arr[] = "test";

The array is declared and initialised to "test"; let's say its base address is 1000.

char *ptr = arr;

A char pointer is declared and initialised to the same base address, 1000.

cout << ptr << endl;

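prints the whole string "test", because the address 1000 is passed and cout treats a char * as the start of a C string. *ptr, however, gives the value stored at address 1000, and since ptr points to char, that value is a single char. So with

cout << *ptr << endl;

only the first character, 't', which is stored at address 1000, is printed.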


TL;DR

In C++, the name of an array converts automatically to a pointer to its first element.

Full Answer

What gets stored in memory can vary from compiler to compiler, but let’s ask one compiler, gcc 6.3.0 for x86_64, to show us. The -S flag tells gcc to compile to human-readable, low-level assembly code instead of an object file. The -O flag tells it to optimize. We can use g++ -Wall -Wextra -Wpedantic -Wconversion -std=c++14 -O -S to compile the following file:

char arr[] = "test";
char* ptr = arr;
char* ptr2 = &arr[0];
constexpr unsigned int arr_size = sizeof(arr)/sizeof(arr[0]); // 5
char (*ptr3)[arr_size] = &arr; // A pointer to an array of arr_size chars.
char* const optimized_out = arr;

I’ll edit the output a bit to make it easier to understand. A slightly-rearranged version of the file we get from this command (which ends with .s) is as follows:

        .data


        .globl  arr
arr:
        .ascii "test\0"


        .globl  ptr
        .align 8
ptr:
        .quad   arr


        .globl  ptr2
        .align 8
ptr2:
        .quad   arr


        .globl  ptr3
        .align 8
ptr3:
        .quad   arr

So, what does this say? The .data directive means that we are declaring the contents of the data segment of the compiled code. This is for variables whose contents we can modify.

The .globl directive means that arr is a symbol that can be linked with other source files. The unindented lines arr:, ptr: and so on are labels representing the current address. So, when we link to arr: later, we are linking to the address, within the .data segment, of whatever bytes we tell the assembler to put there. Those are the five ASCII characters t, e, s, t and a terminating NUL.

Similarly, ptr is a global variable that is an address within the .data segment. There is a new directive here, .align 8. This means to put the pointer on an address divisible by 8. (If gcc had actually laid the file out this way, it would need to waste three extra bytes of padding between the five bytes in the array and the aligned pointer; in fact, it put arr last so it would not need to.) On x86_64, aligned memory reads are faster than unaligned reads.

Then, .quad, in x86_64 assembly, emits a 64-bit value, the size of a pointer. (64 bits is four times 16 bits, and the distant ancestor of the modern 64-bit desktop CPU, the 8086, was a machine with 16-bit words. So quad stands for quadword.)

What is stored in this 64-bit memory location? The value arr:, which is the address of the five-byte .ascii array.

You will notice that ptr, ptr2 and ptr3 all have identical definitions in the assembly. The standard guarantees that the name of an array decays (implicitly converts) to a pointer to the first element of the array. And the address of an array is the same as the address of its first element; there cannot be any padding before any array element.

You cannot, in C++, assign the address of a char[] to a char* without a reinterpret_cast: char *this_does_not_work = &arr; does not work. This is only because they have different types, though. The type of arr is char[5], and the syntax to declare ptr3 as a pointer to an array of five char objects is char (*ptr3)[5]. In this case, for “simplicity,” I defined a symbolic constant for the size of arr, in case the string we pass to arr changes. The size of an array divided by the size of an element is equal to the number of elements in the array. (The standard guarantees that this is always true.)

The addresses &arr, arr and &arr[0] are all guaranteed by the standard to be the same; the only difference between them is their type. You will notice that the assembly file does not actually contain any type information; this allows you to declare something like extern char* const ptr3; in another file and have it work. GCC will record that type information in its debugging information if you also give it the -g flag.

You will notice that there are two variables in the source file that have no corresponding assembly-language definitions, the constexpr variable arr_size and the const variable optimized_out. In fact, gcc will include both of these if you tell it not to optimize. With the -O flag, it won’t bother to allocate memory for small constants known at compile-time; it just substitutes 5 for arr_size or arr for optimized_out. It would, however, need to store a copy of these variables somewhere in memory if you ever took their address, such as &optimized_out.

Some of this is slightly different in C than in C++.

Davislor