135

I have found a function that calculates square of a number:

int p(int n) {
    int a[n]; //works on C99 and above
    return (&a)[n] - a;
}

It returns value of n2. Question is, how does it do that? After a little testing, I found that between (&a)[k] and (&a)[k+1] is sizeof(a)/sizeof(int). Why is that?

haccks
  • 104,019
  • 25
  • 176
  • 264
Emanuel
  • 1,333
  • 1
  • 10
  • 11

5 Answers5

117

Obviously a hack... but a way of squaring a number without using the * operator (this was a coding contest requirement).

(&a)[n] 

is equivalent to a pointer to int at location

(a + sizeof(a[n])*n)

and thus the entire expression is

  (&a)[n] -a 

= (a + sizeof(a[n])*n -a) /sizeof(int)

= sizeof(a[n])*n / sizeof(int)
= sizeof(int) * n * n / sizeof(int)
= n * n
Mark Lakata
  • 19,989
  • 5
  • 106
  • 123
  • 11
    And as you clearly imply but I feel the need to make explicit, it's a syntax hack at best. The multiply operation will still be under there; it's just the operator that's being avoided. – Tommy Jan 07 '15 at 21:32
  • I got what it happens behind, but my real question is why (&a)[k] is at the same address as a + k * sizeof(a) / sizeof(int) – Emanuel Jan 07 '15 at 21:35
  • 33
    As an old codger I am taken aback by the fact that the compiler can treat `(&a)` as a pointer to an object of `n*sizeof(int)` when `n` is not known at compile time. C used to be a _simple_ language... – Floris Jan 08 '15 at 04:56
  • That's a pretty clever hack, but something that you wouldn't be seeing in production code (hopefully). – John Odom Jan 08 '15 at 16:43
  • 14
    As an aside, it's also UB, because it increments a pointer to point neither to an element of the underlying array, nor just past. – Deduplicator Jan 08 '15 at 17:13
  • @Floris: Indeed, and this (along a couple more) makes a very solid reason why many people still stick to C90 even today. – alecov Jan 08 '15 at 18:13
  • @haccks - This was taken from a codegolf challenge to write a function that squares a number without using the `*` symbol in the code. You can decide what that means. It's a silly challenge. – Mark Lakata Jan 09 '15 at 18:13
  • @MarkLakata; `*` operator has more than one meaning in C. You should specifically state about *multiplication* operator. At first glance I thought you are talking about indirection operator but understood when visited the link. – haccks Jan 09 '15 at 18:40
  • And more importantly, it doesn't eliminate the multiply, it just induces the compiler to hide it from you. A more honest solution to the challenge would be to write a function that simulates the simplest binary number multiply using shift and add. – ddyer Jan 13 '15 at 22:41
86

To understand this hack, first you need to understand the pointer difference, i.e, what happens when two pointers pointing to elements of same array are subtracted?

When one pointer is subtracted from another, the result is the distance (measured in array elements) between the pointers. So, if p points to a[i] and q points to a[j], then p - q is equal to i - j.

C11: 6.5.6 Additive operators (p9):

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. [...].
In other words, if the expressions P and Q point to, respectively, the i-th and j-th elements of an array object, the expression (P)-(Q) has the value i−j provided the value fits in an object of type ptrdiff_t.

Now I am expecting that you are aware of array name conversion to pointer, a converts to pointer to first element of array a. &a is address of the entire memory block, i.e it is an address of array a. The figure below will help you to understand (read this answer for detailed explanation):

enter image description here

This will help you to understand that why a and &a has the same address and how (&a)[i] is the address of ith array (of same size as that of a).

So, the statement

return (&a)[n] - a; 

is equivalent to

return (&a)[n] - (&a)[0];  

and this difference will give the number of elements between the pointers (&a)[n] and (&a)[0], which are n arrays each of n int elements. Therefore, total array elements are n*n = n2.


NOTE:

C11: 6.5.6 Additive operators (p9):

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements. The size of the result is implementation-defined, and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. If the result is not representable in an object of that type, the behavior is undefined.

Since (&a)[n] neither points to elements of the same array object nor one past the last element of the array object, (&a)[n] - a will invoke undefined behavior.

Also note that, better to change the return type of function p to ptrdiff_t.

Community
  • 1
  • 1
haccks
  • 104,019
  • 25
  • 176
  • 264
  • "both shall point to elements of the same array object" - which raises the question for me whether this "hack" isn't UB after all. The pointer arithmetic expression is referring to the hypothetical end of a non-existant object: Is this even allowed? – Martin Ba Jan 08 '15 at 01:06
  • To summarise, a is the address of an array of n elements, so &a[0] is the address of the first element in this array, which is same to a; in addition, &a[k] will always be considered an address of an array of n elements, regardless of k, and since &a[1..n] is a vector as well, the "location" of its elements is consecutive, meaning first element is at position x, second is at position x + (number of elements of vector a which is n) and so on. Am I right? Also, this is a heap space, so does it mean that if I allocate a new vector of same n elements, its address is same as (&a)[1]? – Emanuel Jan 08 '15 at 01:44
  • 1
    @Emanuel; `&a[k]` is an address of `k`th element of array `a`. It is `(&a)[k]` that will always be considered an address of an array of `k` elements. So, first element is at position `a` (or `&a`), second is at position `a` + (number of elements of array `a` which is `n`)*(size of an array element) and so on. And note that, memory for variable length arrays are allocated on stack, not on heap. – haccks Jan 08 '15 at 02:08
  • @MartinBa; *Is this even allowed?* No. It is not allowed. Its UB. See the edit. – haccks Jan 08 '15 at 12:18
  • So would `(&a)[n]-(&a)[0]` be the correct way to do this? – Floris Jan 08 '15 at 18:20
  • @Floris; No. `(&a)[n]` is violating the rule that standard says: **both shall point to elements of the same array object, or one past the last element of the array object**. You can do this in a correct way by changing the function body to `return sizeof (char [n][n]);` as suggested in [this comment](http://stackoverflow.com/questions/27828822/cant-understand-this-way-to-calculate-the-square-of-a-number/27830842#comment44065474_27828822). – haccks Jan 08 '15 at 19:26
  • 1
    @haccks nice coincidence between the nature ot the question and your nickname – Dimitar Tsonev Jan 13 '15 at 21:52
  • @DimitarTsonev; Oh! Yes :) – haccks Jan 13 '15 at 21:55
  • I think I don't get why you say we are off specs. Aren't we in this case **one past the last element of the array object** (last element is (&a)[n-1], and we are asking (&a)[n], so one past) – Falanwe Jan 14 '15 at 07:37
  • @Falanwe; Last element is `a[n-1]`. `a[n]` is one past the last element of array. `(&a)[1]` is pointer to past the last element (an array of `n` `int`) of array. – haccks Jan 14 '15 at 11:14
35

a is a (variable) array of n int.

&a is a pointer to a (variable) array of n int.

(&a)[1] is a pointer of int one int past the last array element. This pointer is n int elements after &a[0].

(&a)[2] is a pointer of int one int past the last array element of two arrays. This pointer is 2 * n int elements after &a[0].

(&a)[n] is a pointer of int one int past the last array element of n arrays. This pointer is n * n int elements after &a[0]. Just subtract &a[0] or a and you have n.

Of course this is technically undefined behavior even if it works on your machine as (&a)[n] does not point inside the array or one past the last array element (as required by the C rules of pointer arithmetic).

ouah
  • 142,963
  • 15
  • 272
  • 331
  • Well, I got that, but why does this happen in C? What's the logic behind this? – Emanuel Jan 07 '15 at 21:31
  • @Emanuel there's no stricter answer to that really than that pointer arithmetic is useful for measuring distance (usually in an array), the `[n]` syntax declares an array and arrays decompose to pointers. Three separately useful things with this consequence. – Tommy Jan 07 '15 at 21:34
  • 1
    @Emanuel if you're asking *why* someone would do this, there is little reason and every reason *not* to due to the UB nature of the action. And it is worth noting that `(&a)[n]` is type `int[n]`, and *that* expresses as `int*` due to arrays expressing as the address of their first element, in case that wasn't clear in the description. – WhozCraig Jan 07 '15 at 21:35
  • No, I didn't mean why someone would do this, I meant why does the C standard behaves like this in this situation. – Emanuel Jan 07 '15 at 21:37
  • 1
    @Emanuel *Pointer Arithmetic* (and in this case a subchapter of that topic: *pointer differencing*). It is worth googling as well as reading questions and answers on this site. it has many useful benefits and is concretely defined in the standards when used properly. In order to fully grasp it you *have* to understand how the *types* in the code you listed are contrived. – WhozCraig Jan 07 '15 at 21:39
12

If you have two pointers that point to two elements of the same array then its difference will yield the number of elements between these pointers. For example this code snippet will output 2.

int a[10];

int *p1 = &a[1];
int *p2 = &a[3];

printf( "%d\n", p2 - p1 ); 

Now let consider expression

(&a)[n] - a;

In this expression a has type int * and points to its first element.

Expression &a has type int ( * )[n] and points to the first row of imaged two dimensional array. Its value matches the value of a though types are different.

( &a )[n]

is n-th element of this imaged two dimensional array and has type int[n] That is it is n-th row of the imaged array. In expression (&a)[n] - a it is converted to the address of its first element and has type `int *.

So between (&a)[n] and a there are n rows of n elements. So the difference will be equal to n * n.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
  • So behind every array there is a matrix of the size n*n? – Emanuel Jan 07 '15 at 21:43
  • @Emanuel Between these two pointers there is a matrix of n x n elements. And the difference of the pointers gives value equal to n * n that is how many elements are between the pointers. – Vlad from Moscow Jan 07 '15 at 21:49
  • But why is this matrix of the size n*n behind? Does it have any use in C? I mean, it's like C "allocated" more arrays of the size n, without me knowing it? If so, can I use them? Otherwise, why would this matrix be formed (I mean, it must have a purpose for it being there). – Emanuel Jan 07 '15 at 22:18
  • 2
    @Emanuel - This matrix is only an explanation of how pointer arithmetic works in this case. This matrix is not allocated and you cannot use it. Like it was already stated a few times, 1) this code snippet is a hack which has no practical use; 2) you need to learn how pointer arithmetic works in order to understand this hack. – void_ptr Jan 07 '15 at 23:03
  • @Emanuel This explains the pointer arithmetic. Expreesion ( &a )[n] is pointer to n- element of the imaged two dimensional array due to the pointer arithmetic. – Vlad from Moscow Jan 08 '15 at 06:03
4
Expression     | Value                | Explanation
a              | a                    | point to array of int elements
a[n]           | a + n*sizeof(int)    | refer to n-th element in array of int elements
-------------------------------------------------------------------------------------------------
&a             | a                    | point to array of (n int elements array)
(&a)[n]        | a + n*sizeof(int[n]) | refer to n-th element in array of (n int elements array)
-------------------------------------------------------------------------------------------------
sizeof(int[n]) | n * sizeof(int)      | int[n] is a type of n-int-element array

Thus,

  1. type of (&a)[n] is int[n] pointer
  2. type of a is int pointer

Now the expression (&a)[n]-a performs a pointer substraction:

  (&a)[n]-a
= ((a + n*sizeof(int[n])) - a) / sizeof(int)
= (n * sizeof(int[n])) / sizeof(int)
= (n * n * sizeof(int)) / sizeof(int)
= n * n
onlyice
  • 106
  • 1
  • 4