4

It is explained elsewhere on stackoverflow (e.g. here, where unfortunately currently the accepted answer is incorrect---but at least the highest upvoted answer is correct) that the C standard provides that in almost all circumstances an array char my_array[50] will be implicitly converted to a char * when it is used, e.g. by passing to a function as do_something(my_array), given a declaration of void do_something(char *stuff) {}. That is, the code

void do_something(char *my_array) {
    // Do something
}

void do_something_2(char my_array[50]) {
    // Do something
}

int main() {
    char my_array[50];
    
    do_something(my_array);
    do_something_2(my_array);
    
    return 0;
}

is compiled by gcc without any warnings on any strictness level.

However, paragraph 6.3.2.1.3 of C11 provides that this conversion does not occur specifically if one writes &my_array, or sizeof(my_array) (and moreover that these are the only times when this conversion does not occur). The purpose of the latter rule is obvious to me---the sizeof an array being equal to the size of a pointer to the first element is very confusing, so should be prevented.

But the purpose of the first part of this rule (to do with writing &my_array) entirely escapes me. See, the rule makes the type of &my_array (in the notation of the C standard) char (*)[50], instead of char *. When does this behaviour have any use at all? Indeed, except for sizeof-purposes, why does the type char (*)[50] exist at all?

For example, it is also explained on stackexchange (e.g. here) that any declared array argument to a function, such as char my_array[50] in the definition of do_something_2 above, behaves in all ways exactly the same as if char *my_array was written in the declaration instead, or even char my_array[0] or char my_array[5]! Even worse, it means that writing do_something(my_array) compiles without any type errors in any of these circumstances, while do_something(&my_array) (i.e. passing an array type of the correct size to a function declared to accept precisely that array type) is an error!

In summary, does the "&-part" of C11 6.3.2.1.3 have any purpose at all? If so, what is it?

(The only reason I could think of is in order to make sizeof(&my_array) evaluate to the same thing as sizeof(my_array), but this does not even happen due to other C standard rules!---the former sizeof(&my_array) construction "as expected" indeed reports the size of a pointer, and not the array itself. See here.)

Keeley Hoek
  • 2
    The types are different. – wildplasser Jul 28 '21 at 18:46
  • @wildplasser sorry, I don't understand, my question is about the rationale for why an implicit conversion between *different types* is defined to not occur under specific circumstances. – Keeley Hoek Jul 28 '21 at 18:53
  • IMO it is far too early for you to change the C standard :) – 0___________ Jul 28 '21 at 19:01
  • 1
    Without this exception `&array` would be an error, since the pointer resulting from the implicit conversion is an rvalue. Your proposal of making `&array` equivalent to `array` would be yet another exception, and unlike `&array` it doesn't give you any new possibilities. – HolyBlackCat Jul 28 '21 at 19:29
  • The result of the `&` operator has to be the address of an object. But there is no pointer object associated with an array. What did you have in mind: that `&x` should implicitly malloc a pointer somewhere that points to `x` and then give the address of that pointer? – M.M Jul 28 '21 at 22:20
  • According to [§6.5.3.4 The `sizeof` and `_Alignof` operators](http://port70.net/~nsz/c/c11/n1570.html#6.5.3.4p3): _The `_Alignof` operator yields the alignment requirement of its operand type. The operand is not evaluated and the result is an integer constant. When applied to an array type, the result is the alignment requirement of the element type._. Thus, the array does not degenerate to a pointer with `_Alignof`. See also [§6.3.2.1 Lvalues, arrays and function designators](http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1p3). – Jonathan Leffler Jul 28 '21 at 23:12

4 Answers

5

Indeed, except for sizeof-purposes, why does the type char (*)[50] exist at all?

Given char x[100][50], the automatic conversion of x to a pointer produces a pointer to its first element. Its first element is a char [50], so a pointer to that is char (*)[50]. So this is the type that the conversion must produce.

When we pass a two-dimensional array, say char x[100][50], to a function with a parameter declared char x[100][50], that parameter is automatically adjusted to char (*x)[50]. The function then accesses elements with notation such as x[i][j]. If x had been adjusted to some other type, this would not work: x must be a pointer to char [50] so that x[i] steps in units of 50-element subarrays and yields such a subarray as its result, which can then be indexed with [j].

Sometimes we might want the function to operate only on some portion in the middle of x. To do that, we would pass it the starting address of that portion. For example, we might pass it &x[n], to start at the nth row of the array. As before, the adjusted function parameter is char (*)[50], so we need &x[n] to give us the address of the subarray that is x[n] with type char (*)[50]. Passing a char * would not be the correct type for the parameter.
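A minimal sketch of the case described above, assuming a hypothetical helper fill_rows and driver demo_fill_middle (neither name comes from the answer). It passes &x[2] to a parameter of type char (*)[50] so that only three rows in the middle of the array are touched:

```c
#include <stddef.h>
#include <string.h>

enum { COLS = 50 };

/* Hypothetical helper: fills `count` rows, starting at `rows`, with `fill`.
   char (*rows)[COLS] is the type that a parameter written as
   char rows[][COLS] is adjusted to. */
void fill_rows(char (*rows)[COLS], size_t count, char fill)
{
    for (size_t i = 0; i < count; i++)
        memset(rows[i], fill, sizeof rows[i]);   /* sizeof rows[i] is COLS */
}

/* Hypothetical driver: fills rows 2..4 of a 100x50 grid and counts
   how many bytes were changed. */
size_t demo_fill_middle(void)
{
    static char x[100][COLS];        /* zero-initialised */
    fill_rows(&x[2], 3, '#');        /* &x[2] has type char (*)[50] */

    size_t touched = 0;
    for (size_t i = 0; i < 100; i++)
        for (size_t j = 0; j < COLS; j++)
            touched += (x[i][j] == '#');
    return touched;
}
```

Passing x instead of &x[2] would start at row 0; both expressions have the same type, which is why the same function serves for the whole array and for a slice of rows.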

Eric Postpischil
3

The & operator isn't the exception - it's the "decay" rule of array expressions that is the exception. No other aggregate type (struct or union) "decays" to a pointer [1]. It's the array type that's weird, not the operator.

For every lvalue expression x of type T, &x yields a value of type T * (pointer to T). Period, no exceptions. If x has type int, then &x has type int *. If x has type double, then &x has type double *. If x has type int [10], then &x has type int (*)[10]. The semantics are exactly the same in all cases.
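One way to check this rule is with C11's _Generic, which selects a branch by the type of an expression. The macro TYPE_TAG and the three functions below are illustrative names, not part of the answer, and the sketch assumes a C11 compiler:

```c
#include <assert.h>

/* Maps the type of an expression to a small integer tag. */
#define TYPE_TAG(e) _Generic((e), \
    int *: 1,                     \
    double *: 2,                  \
    int (*)[10]: 3,               \
    default: 0)

/* &x where x is an int: the result type is int *. */
int tag_addr_int(void)    { int x = 0;      (void)x; return TYPE_TAG(&x); }

/* &d where d is a double: the result type is double *. */
int tag_addr_double(void) { double d = 0.0; (void)d; return TYPE_TAG(&d); }

/* &a where a is an int [10]: the result type is int (*)[10]. */
int tag_addr_array(void)  { int a[10] = {0}; (void)a; return TYPE_TAG(&a); }

/* The element address &a[0] is a different type from &a. */
int tag_addr_first(void)  { int a[10] = {0}; (void)a; return TYPE_TAG(&a[0]); }

/* Compile-time check that int (*)[10] and int * are distinct types. */
static_assert(sizeof(int (*)[10]) > 0 && sizeof(int *) > 0, "pointer types exist");
```

In each case the operand of & keeps its own type T and the result is T *; the array case is simply T = int [10].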

The decay rule exists because dmr wanted to keep the array semantics from B (a precursor to C), but he didn't want to store the explicit pointer those semantics required [2]. So instead of storing the pointer, he came up with the "decay" rule - when the compiler sees an array expression that isn't the operand of the sizeof or unary & operators, it converts that expression from type "N-element array of T" to "pointer to T" and the value of the expression is the address of the first element.

This allowed C to keep B's array indexing semantics where a[i] is defined as *(a + i) - given a starting address a, offset i elements (not bytes! - this will be important later) from that address and dereference the result. The tradeoff is that array expressions in C lose their array-ness most of the time.

why does the type char (*)[50] exist at all?

First of all, let's see how that decay rule applies to a 2D array. Imagine an array declaration

A a[N][M];

Remember the rule "the expression a is converted from N-element array of T to pointer to T" - in this case, T is "M-element array of A", so the expression a decays from "N-element array of M-element array of A" to "pointer to M-element array of A", or A (*)[M]. So pointer to array types fall naturally out of the decay rule anyway.

Secondly, remember how pointer arithmetic works - if p stores the address of an object of type T, then p + 1 yields the address of the next object, not necessarily the next byte. Again, the array indexing operation a[i] is defined as *(a + i) - a is the address of the first element of the array, a + 1 is the address of the second element, a + 2 is the address of the third element, etc.

So if a yields the address of an M-element array of A, then a + 1 yields the address of the next M-element array of A.

This is exactly how multi-dimensional array indexing works. If we have

char arr[2][2];

then we have

          char          char *        char (*)[2]
    +---+       
arr:|   | arr[0][0]     arr[0]        arr
    + - +
    |   | arr[0][1]     arr[0] + 1
    +---+
    |   | arr[1][0]     arr[1]        arr + 1
    + - + 
    |   | arr[1][1]     arr[1] + 1   
    +---+

The expression arr[i][j] is equal to *(arr[i] + j), which is equal to *(*(arr + i) + j). arr + i yields the address of the i'th 2-element array of char, and *(arr + i) + j yields the address of the j'th element of that 2-element array.
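The strides in the diagram can be measured directly. The following is a small sketch using the same char arr[2][2]; the function names are made up for illustration, and the byte distances are obtained by casting to char * before subtracting:

```c
#include <stddef.h>

/* Distance in bytes from arr to arr + 1: one row of type char [2]. */
ptrdiff_t row_stride(void)
{
    char arr[2][2];
    return (char *)(arr + 1) - (char *)arr;
}

/* Distance in bytes from arr[0] to arr[0] + 1: one char. */
ptrdiff_t elem_stride(void)
{
    char arr[2][2];
    return (char *)(arr[0] + 1) - (char *)arr[0];
}

/* Distance in bytes from &arr to &arr + 1: the whole char [2][2]. */
ptrdiff_t whole_stride(void)
{
    char arr[2][2];
    return (char *)(&arr + 1) - (char *)&arr;
}

/* Checks that arr[i][j] and *(*(arr + i) + j) name the same element. */
int indexing_matches(void)
{
    char arr[2][2] = { { 'a', 'b' }, { 'c', 'd' } };
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            if (arr[i][j] != *(*(arr + i) + j))
                return 0;
    return 1;
}
```

The three pointers arr[0], arr and &arr all hold the same address here; only their types, and therefore their step sizes (1, 2 and 4 bytes), differ.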

We can also use pointer to array types for dynamic allocation. Remember that the common idiom is

T *p = malloc( N * sizeof *p );

This allocates space for N elements of T and assigns the address of that space to p. If I change T to an array type A [M], I get

A (*p)[M] = malloc( N * sizeof *p );

The semantics are exactly the same, all that's changed is the type - I'm allocating space for N elements of type A [M] - IOW, an array of type A [N][M].
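As a concrete sketch of that idiom, assuming A is int and M is 4 (both chosen only for illustration, as is the function name sum_dynamic_grid):

```c
#include <stdlib.h>

enum { M = 4 };

/* Allocates n rows of int [M], stores i * M + j in p[i][j], and returns
   the sum of all elements, or -1 if the allocation fails. */
long sum_dynamic_grid(size_t n)
{
    int (*p)[M] = malloc(n * sizeof *p);   /* sizeof *p is M * sizeof(int) */
    if (p == NULL)
        return -1;

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < M; j++) {
            p[i][j] = (int)(i * M + j);    /* ordinary 2D indexing works */
            sum += p[i][j];
        }
    }

    free(p);                               /* one allocation, one free */
    return sum;
}
```

Because p points to int [M], a single malloc gives a contiguous block that can be indexed as p[i][j] with no separate array of row pointers.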


  1. Function expressions have a decay rule as well, but we'll ignore that for now.
  2. See the section titled "Embryonic C" of the paper The Development of the C Language.
John Bode
2

If arrays decayed when used as the operand of the & operator, it would break the whole type system.

For example, pointer arithmetic is based on the size of the pointed-to object. If arrays decayed under the & operator, we could not apply proper pointer arithmetic to array addresses. That would make arrays of arrays impossible to use (unless the type system were broken even further by adding special exceptions for this case).

Generally speaking, the decaying business is kinda unfortunate and is actually a vestige of C's predecessor, B. It is good that at least in this case decaying does not happen.

SergeyA
1

You almost never want a pointer to an entire array. But if you do want such a pointer, an explicit application of the & operator will get you one -- although, yes, the pointer you get has a significantly different type than the one you would have implicitly gotten, without the &.

Given that you might want to construct a pointer to an entire array using &, it's important that & have that array to apply itself to. Given the expression:

&a

if a were to immediately decay into a pointer to its first element, then & would try to generate a pointer to that pointer -- which would not be what you wanted. And since the implicitly-generated pointer would be an rvalue, it wouldn't be possible to apply & to it at all.

Or, in other words, if & weren't an exception to the pointer-decay rule, then &a would be equivalent to the nonsensical &(&a[0]).
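To make the difference between &a and &a[0] concrete, here is a small sketch. The helper clear_buffer and the driver demo_clear are hypothetical names, and this is offered only as an illustration of what the pointer-to-array type carries, not as a claim that the pattern is common:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: takes a pointer to a whole 50-char array, so the
   array length is part of the parameter's type. */
size_t clear_buffer(char (*buf)[50])
{
    memset(*buf, 0, sizeof *buf);   /* sizeof *buf is 50, not a pointer size */
    return sizeof *buf;
}

/* Hypothetical driver: returns the number of bytes cleared, or 0 if the
   two addresses unexpectedly differ. */
size_t demo_clear(void)
{
    char a[50];

    /* &a and &a[0] hold the same address but have different types:
       char (*)[50] versus char *. */
    if ((void *)&a != (void *)&a[0])
        return 0;

    /* clear_buffer(a) would pass a char * instead, which does not match
       the parameter type. */
    return clear_buffer(&a);
}
```

Inside clear_buffer the size is recovered from the type alone, which is only possible because &a did not decay.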


If your question is, "Why would you want a pointer to an entire array?", sorry, I don't have any realistic examples just now.

...And it's possible there are no realistic examples. I had forgotten, but according to the C FAQ list,

In pre-ANSI C, the & in &arr generally elicited a warning, and was generally ignored.

This suggests that, in Ritchie's original formulation, &arr was meaningless and not to be used, and therefore it wasn't really an exception to the pointer-decay rule. The explicit exception arose only when X3J11 defined a meaning for the formerly-meaningless "&arr".

Steve Summit