5

When I run the following code:

#include <stdio.h>

int main(void)
{
    char tmp[] = "hello";
    /* %p expects a void *, so both arguments are cast */
    printf("%p, %p\n", (void *)tmp, (void *)&tmp);
    return 0;
}

I get the same address twice. But with the following code, the two addresses are different:

#include <stdio.h>

int main(void)
{
    char *tmp = "hello";
    /* %p expects a void *, so both arguments are cast */
    printf("%p, %p\n", (void *)tmp, (void *)&tmp);
    return 0;
}

Could you explain how the memory layout differs between these two examples?

Siddhant
Anton Golovenko
  • 1
    `char tmp[] = "hello"` is an array of 6 characters initialized to `"hello\0"` (it has automatic storage duration and resides within the program stack). `char *tmp = "hello";` is a pointer initialized with the address for the String Literal `"hello\0"` that resides in readonly memory (generally within the `.rodata` section of the executable). (readonly on all but a few non-standard implementations) An array is converted to a pointer to its first element on access. – David C. Rankin Jun 07 '21 at 03:54
  • @David C. Rankin Re "*readonly on all but a few non-standard implementations*", I find it doubtful that C requires a machine to have virtual memory to have a standard implementation. One should always consider the memory to be read-only, but I challenge the claim that the memory has to be read-only for the implementation to be standard. – ikegami Jun 07 '21 at 04:00
  • 1
    @ikegami I concede that point. The standard doesn't require a conforming implementation to create string literals in read only memory. The point I was making is most do. – David C. Rankin Jun 07 '21 at 04:06
  • At the very least, the C standard states that modifying string literals is [undefined behaviour](https://stackoverflow.com/questions/10001202/is-modification-of-string-literals-undefined-behaviour-according-to-the-c89-stan). – Aconcagua Jun 07 '21 at 04:51
  • While it is legal in C, you shouldn't assign string literals to non-const `char` pointers; always write `char const* ptr = "some literal";`. Otherwise you almost certainly *will* end up modifying the literal at some point in the future, which is UB, as stated above. Being able to assign immutable literals to `char*` pointers is a legacy from the very first days of C, when `const` did not yet exist. – Aconcagua Jun 08 '21 at 13:46

3 Answers

5

char tmp[] = "hello"; is an array of 6 characters initialized to "hello\0" (declared inside a function, it has automatic storage duration and typically resides on the program stack).

char *tmp = "hello"; is a pointer to char initialized with the address of the string literal "hello\0", which typically resides in read-only memory (generally within the .rodata section of the executable). The C standard does not require read-only storage, but modifying a string literal is undefined behavior either way.

When you have char tmp[] = "hello";, as stated above, the array is converted on access to a pointer to its first element, which has type char *. When you take the address of tmp (e.g. &tmp), the result is the same address but has a completely different type: pointer to array of 6 char, formally char (*)[6]. Since the type controls pointer arithmetic, advancing pointers of the two types produces different offsets. Advancing tmp (that is, the char * it converts to) moves to the next char. Advancing the address of tmp moves past the whole 6-character array, to where the next such array would begin.

When you have char *tmp = "hello"; you have a pointer to char. When you take the address, the result is pointer-to-pointer-to char. The formal type is char ** reflecting the two levels of indirection. Advancing tmp advances to the next char. Advancing with the address of tmp advances to the next pointer.

Aconcagua
David C. Rankin
  • `char tmp[]`: Advance `tmp` is unlucky wording, as arrays cannot be incremented (like `++tmp`). I wouldn't describe *'string literal `"hello\0"`'* as that would imply a literal with *two* trailing null characters. – Aconcagua Jun 07 '21 at 04:57
  • Yes, I was referring to the pointer that results from access. Obviously you cannot iterate with the array itself. The intent being `char *p = tmp;` or `char (*p)[6] = &tmp;` in the array case. Thanks for pointing that out. – David C. Rankin Jun 07 '21 at 05:01
  • So to clear up my mind, (supposing tmp1 and tmp2 in the order on your answer), from an Assembly point of view `tmp1 == &tmp1`, and `tmp2 != &tmp2`. `tmp1` and `&tmp1` are exactly the same except in C they have different types (both the same stack address); `tmp2` is the pointer to the 1st string char (which might be on the stack or on read-only data or something), and `&tmp2` is a pointer to `tmp2`, which in this case will be a stack address (because tmp2 is a local variable - or at least supposing it is). And these things are the same passed to a function as arguments. Is this correct? – Edw590 Feb 23 '22 at 15:55
  • 1
    @DADi590 - you are dead-on. That is the exact case. `tmp1` and `&tmp1` resolve to the same address `tmp2` is a **pointer to** the 1st char in the string, and `&tmp2` is the **address of that pointer**, not the address of the 1st character in the string. In other words, the address for the 1st character in the string is the **address pointed to** (e.g. held-by) `tmp2`, `&tmp2` is where that address is stored in memory. – David C. Rankin Feb 23 '22 at 22:58
3
char a[] = "hello";

and

char *a = "hello";

are stored in different places.

char a[] = "hello"

In this case, a becomes an array (stored on the stack) of 6 characters initialized to "hello\0". It has the same effect as:

char a[6];
a[0] = 'h';
a[1] = 'e';
a[2] = 'l';
a[3] = 'l';
a[4] = 'o';
a[5] = '\0';

char *a = "hello"

Inspect the assembly (this is not the full listing, only the relevant part):

    .file   "so.c"
    .text
    .section    .rodata
.LC0:
    .string "hello"    # look at this part
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    $.LC0, -8(%rbp)
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

Look at this part:

.section    .rodata
.LC0:
    .string "hello"

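This is where the string is stored. With char a[], the characters live in the array itself, on the stack. With char *a, only the pointer is on the stack (at -8(%rbp) in the listing above); the string literal it points to is stored wherever the compiler likes, generally in .rodata.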

arrowd
3

With

char tmp[] = "hello";

you are setting aside an array of char large enough to store the string "hello" and copying the contents of the string to that array, such that you get this in memory:

     +---+
tmp: |'h'| tmp[0]
     +---+
     |'e'| tmp[1]
     +---+
     |'l'| tmp[2]
     +---+
     |'l'| tmp[3]
     +---+
     |'o'| tmp[4]
     +---+
     | 0 | tmp[5]
     +---+

There is no tmp object separate from the array elements themselves, so the address of the array (&tmp) is the same as the address of its first element (&tmp[0]).

With

char *tmp = "hello";

you are creating a pointer to char and initializing it with the address of the first character in the string literal "hello", such that you get this in memory:

     +---+       +---+
tmp: |   | ----> |'h'| tmp[0]
     +---+       +---+
                 |'e'| tmp[1]
                 +---+
                 |'l'| tmp[2]
                 +---+
                 |'l'| tmp[3]
                 +---+
                 |'o'| tmp[4]
                 +---+
                 | 0 | tmp[5]
                 +---+

In this case tmp is a separate object from the array elements, so the address of tmp is different from the address of tmp[0].

John Bode