int a[] = {1, 2, 3};

I understand that array names are converted to pointers. A term often used is that they decay to pointers.

However to me, a pointer is a region of memory that holds the address to another region of memory, so:

int *p = a;

can be drawn like this:

-----              -----
  p    --------->  a[0].  .....
-----              -----
 0x1                0x9

But a itself is not pointing to another region of memory, it IS the region of memory itself. So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?

  • After `int *p = a`, `p` points to the first element of the array `a`. And if you write just `a`, `a` decays to a pointer to the first element of `a` – Jabberwocky Jun 12 '20 at 13:43
  • right, so if `a` decays to first element of `a`, is it stored like a pointer somewhere in memory? like p is? –  Jun 12 '20 at 13:45
  • `a` is not stored. It's just a "trick" of the compiler. `a` is just shorthand for `&a[0]`. – Jabberwocky Jun 12 '20 at 13:48
  • Springboarding off of Jabberwocky, `a` is not stored, although as a symbol representing the location of the first element in `a[]`, it is `set-aside` and _designated_ by the OS as the place in memory that `a` starts. – ryyker Jun 12 '20 at 13:51
  • Does this answer your question? [In C, are arrays pointers or used as pointers?](https://stackoverflow.com/questions/4607128/in-c-are-arrays-pointers-or-used-as-pointers) – Adam Jun 12 '20 at 13:57
  • Basically the "array name used as pointer" is replaced with a hard-coded address during compilation, since the compiler knows the address of all variables. Similarly, no variable names exist in machine code, because they are all replaced with registers, stack allocations or access to addresses. – Lundin Jun 12 '20 at 14:11
  • I like to think of it by analogy. `1` is an `int` value, but it has no address. `&i` is a pointer value, but it has no address. Decay of an array name produces a pointer value, but the pointer value has no address. – Ian Abbott Jun 12 '20 at 14:55
  • @IanAbbott I like this: An array **has** an address, in some contexts that address is used directly, and that address is always valid as long as the array exists. A pointer **contains** an address that may or may not be valid. I really hate the use of the word "pointer" when describing array decay as common usage such as "arrays decay to pointers" can be misleading in some cases, especially for newcomers to C. NB the standard uses the words "pointer type" and not just "pointer". – Andrew Henle Jun 12 '20 at 15:07

4 Answers


C has objects and values.

A value is an abstract concept—it is some meaning, often mathematical. Numbers have values like 4, 19.5, or −3. Addresses have values that are locations in memory. Structures have values that are the values of their members considered as an aggregate.

Values can be used in expressions, such as 3 + 4*5. When values are used in expressions, they do not have any memory locations in the computing model that C uses. This includes values that are addresses, such as &x in &x + 3.

Objects are regions of memory whose contents can represent values. The declaration int *p = &x defines p to be an object. Memory is reserved for it, and it is assigned the value &x.

For an array declared with int a[10], a is an object; it is all the memory reserved for 10 int elements.

When a is used in an expression, other than as the operand of sizeof or unary &, the a used in the expression is automatically converted to the address of its first element, &a[0]. This is a value. No memory is reserved for it; it is not an object. It may be used in expressions as a value without any memory ever being reserved for it. Note that the actual a is not converted in any way; when we say a is converted to a pointer, we mean only that an address is produced for use in the expression.

All of the above describes semantics in the computing model C uses, which is that of some abstract computer. In practice, when a compiler works with expressions, it often uses processor registers to manipulate the values in those expressions. Processor registers are a form of memory (they are things in a device that retain values), but they are not the “main memory” we often mean when we speak of “memory” without qualification. However, a compiler may also not have the values in any memory at all because it calculates the expression in part or in full during compilation, so the expression that is actually computed when the program is executing might not include all the values that are nominally in the expression as it is written in C. And a compiler might also have the values in main memory because computing a complicated expression might overflow what is feasible in the processor registers, so that parts of the expression have to be temporarily stored in main memory (often on a hardware stack).

Eric Postpischil
  • Technically, `a` is not the object itself, it is an identifier that designates the object. – Ian Abbott Jun 12 '20 at 15:28
  • @IanAbbott: Technically, `a` is the object and “`a`” is the identifier, the same way dogs are animals and “dogs” is a word. We write “Dogs have four legs,” not “The things called dogs have four legs.” That is, when we use a name in a sentence, the name generally refers to the thing named; it is not a reference to the name as a sequence of characters. – Eric Postpischil Jun 12 '20 at 15:34
  • I concede the point. In some places the standard talks about identifiers that are "declared to be" objects, although in other places it talks about identifiers that "designate" objects. – Ian Abbott Jun 13 '20 at 17:46
  • @IanAbbott: The primary problem here is using English (or other natural language) for technical specifications. – Eric Postpischil Jun 13 '20 at 17:48
  • indeed. I do prefer the "designates" terminology to the "declared to be" terminology. The C standard may be a little informal in places, but at least it is easily digestible compared to something like "The Revised Report on the Algorithmic Language Algol 68". – Ian Abbott Jun 13 '20 at 18:10
  • @ryyker: I use the semicolon in two ways (outside of mathematical/technical uses): To join two independent clauses that say largely the same thing in different ways or as a comma in a list where items already have commas of their own. This is the first use: An object is a “region of data storage in the execution environment, the contents of which can represent values.” So `int a[10]` defines an object, and that object is memory for ten `int`. The object and the memory for ten `int` are the same thing. – Eric Postpischil Jun 17 '20 at 13:49
  • @ryyker: [Some sources are less strict](https://www.grammarly.com/blog/semicolon/) about the former use, allowing clauses that are merely “closely related.” I try to reserve it for when they overlap considerably. – Eric Postpischil Jun 17 '20 at 13:50

"But a itself is not pointing to another region of memory, it IS the region of memory itself.

"So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?"

It is an implicit conversion. The compiler does not create a separate pointer object in memory (one that you could, for example, reassign to hold a different address) to hold the address of the first element.

The standard states (emphasis mine):

"Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type "array of type" is converted to an expression with type "pointer to type" that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined."

Source: ISO/IEC 9899:2018 (C18), 6.3.2.1/3

The array is converted to an expression of pointer type; the result is not an lvalue.

The compiler just evaluates a to &a[0] (pointer to a[0]).


"I understand that array names are converted to pointers."

An array does not always convert to a pointer to its first element. Look at the first part of the quote above. For example, when used as the operand in &a, a does not decay to a pointer to its first element. Rather, &a yields a pointer to the whole array, of type int (*)[3].

Community

But a itself is not pointing to another region of memory, it IS the region of memory itself. So when the compiler converts it to a pointer, does it save it (like p) somewhere in memory or it's an implicit conversion?

Logically speaking, it's an implicit conversion - there's no requirement that the implementation materialize permanent storage for the pointer.

In terms of implementation, it's up to the compiler. For example, here's a simplistic bit of code that creates an array and prints its address:

#include <stdio.h>

int main( void )
{
  int arr[] = { 1, 2, 3 };
  printf( "%p", (void *) arr );
  return 0;
}

When I use gcc to compile it for x86-64 on a Red Hat system, I get the following machine code:

GAS LISTING /tmp/ccKF3mdz.s             page 1


   1                    .file   "arr.c"
   2                    .text
   3                    .section    .rodata
   4                .LC0:
   5 0000 257000        .string "%p"
   6                    .text
   7                    .globl  main
   9                main:
  10                .LFB0:
  11                    .cfi_startproc
  12 0000 55            pushq   %rbp
  13                    .cfi_def_cfa_offset 16
  14                    .cfi_offset 6, -16
  15 0001 4889E5        movq    %rsp, %rbp
  16                    .cfi_def_cfa_register 6
  17 0004 4883EC10      subq    $16, %rsp
  18 0008 C745F401      movl    $1, -12(%rbp)
  18      000000
  19 000f C745F802      movl    $2, -8(%rbp)
  19      000000
  20 0016 C745FC03      movl    $3, -4(%rbp)
  20      000000
  21 001d 488D45F4      leaq    -12(%rbp), %rax
  22 0021 4889C6        movq    %rax, %rsi
  23 0024 BF000000      movl    $.LC0, %edi
  23      00
  24 0029 B8000000      movl    $0, %eax
  24      00
  25 002e E8000000      call    printf
  25      00
  26 0033 B8000000      movl    $0, %eax
  26      00
  27 0038 C9            leave
  28                    .cfi_def_cfa 7, 8
  29 0039 C3            ret
  30                    .cfi_endproc
  31                .LFE0:
  33                    .ident  "GCC: (GNU) 7.3.1 20180712 (Red Hat 7.3.1-6)"
  34                    .section    .note.GNU-stack,"",@progbits

Line 17 allocates space for the array by subtracting 16 from the stack pointer (yes, there are only 3 elements in the array, which should only require 12 bytes - I'll let someone with more familiarity with the x86_64 architecture explain why, 'cause I'll get it wrong).

Lines 18, 19, and 20 initialize the contents of the array. Note that there's no arr variable in the machine code - it's all done in terms of an offset from the current frame pointer.

Line 21 is where the conversion occurs - we load the effective address of the first element of the array (which is the address stored in the %rbp register minus 12) into the %rax register. That value (along with the address of the format string) then gets passed to printf. Note that the results of this conversion aren't stored anywhere other than the register, so it will be lost the next time something writes to %rax - IOW, no permanent storage has been set aside for it the same way storage has been set aside for the array contents.

Again, that's how gcc on Red Hat running on x86-64 does it. A different compiler on a different architecture may do it differently.

John Bode

Here's what the 2011 ISO C Standard says (6.3.2.1p3):

Except when it is the operand of the sizeof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.

The standard uses the word "converted" here, but it's not the usual kind of conversion.

Normally, a conversion (either an implicit conversion, or an explicit conversion specified by a cast operator) takes an expression of some type as its operand, and yields a result of the target type. The result is determined by the value of the operand. In most or all cases, you could write a function that does the same thing. (Note that both implicit and explicit conversions perform the same operation; the fact that array-to-pointer conversion is implicit isn't particularly relevant.)

In the case of the array-to-pointer conversion described above, that's not the case. The value of an array object consists of the values of its elements -- and that value contains no information about the address at which the array is stored.

It probably would have been clearer to refer to this as an adjustment rather than a conversion. The standard uses the word "adjusted" to refer to the compile-time transformation of a parameter of array type to a parameter of pointer type. For example, this:

void func(int notReallyAnArray[42]);

really means this:

void func(int *notReallyAnArray);

The "conversion" of an array expression to a pointer expression is a similar kind of thing.

On the other hand, the word "conversion" doesn't only mean type conversions. For example, the standard uses the word "conversion" when discussing printf format strings ("%d" and "%s" are conversion specifications).

Once you understand that the "conversion" being described is really a compile-time adjustment, converting one kind of expression to another kind of expression (not value), it's much less confusing.

DIGRESSION:

One interesting thing about the standard's description of array-to-pointer conversion is that it talks about an expression of array type, but the behavior depends on the existence of "the array object". An expression of a non-array type doesn't necessarily have an object associated with it (i.e., it's not necessarily an lvalue). But every array expression is an lvalue. And in one case (the name of an array member of a non-lvalue union or structure expression, particularly when a function returns a structure value), the language had to be updated to guarantee that that's always the case, and the concept of temporary lifetime had to be introduced in the 2011 standard. The semantics of referring to the name of an array member of a structure returned by a function call were not at all clear in the 1990 and 1999 standards.

Keith Thompson