3

I am new to C++ and stumbled upon this way of computing the length of an array with pointers, which I don't exactly understand. I looked everywhere but couldn't find an explanation of how it works. I just heard that it's supposed to be a bad way of computing array length, but why is that, and how does it even work?

The code would look something like this:

int array[4] = {0, 1, 2, 3};
//...
int length = *(&array + 1) - array;

As far as I've tried, it really seems to work, but I don't exactly understand why. I know a bit of pointer logic, but this statement seems really odd to me, because you're essentially taking the address of the array (the first element, I suppose) and adding one to it (I can imagine that that will give you the address after the last element, but then I don't understand why you would dereference it). And what confuses me most is that the array itself then gets subtracted from all of this?! Without an index or anything.

It would really help if someone could explain that to me, and why exactly it's supposed to be bad.

Thanks.

Aerotrix

2 Answers

8
&array

This is a pointer to the object array. It is a singular object of an array type.

&array + 1

Adding a number to a pointer produces a pointer to a successive sibling of the object in an array of objects. Adding 1 produces the next sibling. For the purposes of this pointer arithmetic, a singular object is treated as an array of one object. Hence, adding 1 is allowed, and it produces a pointer past the end of that figurative array.

*(&array + 1)

Strictly speaking, this indirects through a pointer past the end, and it may be argued that the behaviour of the program is undefined.

But let's assume that's not a problem. The indirection operation produces an lvalue to the (non-existent) object at the address after the array.

*(&array + 1) - array

Here, the operands of the subtraction are lvalues of array type. One is the actual array and the other is a hypothetical sibling element in a hypothetical array of arrays. In this context, both arrays are implicitly converted to pointers to their respective first elements.

Technically, the subtraction between the converted pointers is undefined because they are pointers to elements of separate arrays, so arguably the behaviour of the program is undefined for yet another reason.

But let's assume that's not a problem. Subtracting pointers to two elements of the same array yields the distance between those elements. The distance between the first elements of adjacent arrays is exactly the number of elements in the first array.
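The address relationship described above can be checked without indirecting through the past-the-end pointer. The following is only a sketch: the helper names (past_the_end_addresses_match, distance_to_end, check_question_array) are made up for illustration and do not come from the question. It compares the address held by &array + 1 with the address held by array + 4, and then computes the distance using only pointers into the one real array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reports whether the one-past-the-end pointer of the
// whole array object (&arr + 1) and the one-past-the-end pointer of its
// elements (arr + N) hold the same address. Neither pointer is dereferenced.
template <typename T, std::size_t N>
bool past_the_end_addresses_match(T (&arr)[N]) {
    const void* past_whole_array = &arr + 1;   // converted from T (*)[N]
    const void* past_last_element = arr + N;   // converted from T*
    return past_whole_array == past_last_element;
}

// Hypothetical helper: element count computed from two pointers that both
// belong to the same array, so the subtraction itself is well defined.
template <typename T, std::size_t N>
std::ptrdiff_t distance_to_end(T (&arr)[N]) {
    return (arr + N) - arr;
}

// Usage with the array from the question.
inline void check_question_array() {
    int array[4] = {0, 1, 2, 3};
    assert(past_the_end_addresses_match(array));
    assert(distance_to_end(array) == 4);
}
```

This only shows why the trick produces 4 in practice. It does not make the original expression well defined.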

"why it's supposed to be bad exactly."

Notice the parts in previous sections that say behaviour of the program is undefined. That's bad.

Also, you had a problem understanding what it does. That's bad.

The recommended way to get the size of an array is to use std::size(array) (available since C++17).
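A minimal sketch of that recommendation, assuming a C++17 compiler. The wrapper function question_array_length is a made-up name used only to keep the example self-contained.

```cpp
#include <cstddef>
#include <iterator>  // std::size (C++17)

// Hypothetical wrapper around the array from the question.
constexpr std::size_t question_array_length() {
    int array[4] = {0, 1, 2, 3};
    // std::size deduces the element count from the array type, so no
    // pointer arithmetic is involved.
    return std::size(array);
}
```

Unlike the pointer trick, std::size(p) does not compile when p is a plain pointer, so the mistake is caught at compile time.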

eerorika
  • 1
    @Aconcagua `Problem rather is that we are subtracting pointers of different type... ` There's no such problem. Both `array` and `*(&array + 1)` are lvalues of same type: `int[4]`. They both decay to `int*`. After the decay, the subtracted pointers are of same type. But they point to first element of different arrays. – eerorika May 11 '22 at 10:36
  • @Aconcagua The standard defines pointer subtraction for pointers to elements of an array. It does not define what pointer subtraction means when operands are not to elements of the same array. It's undefined. `arr2d[0][0]` and `arr2d[1][0]` are not elements of the same array. – eerorika May 11 '22 at 10:45
  • @eerorika In C, what you say is correct. In C++, the calculation of a pointer one past the end of an array (of any type) is explicitly allowed and valid, but dereferencing such a pointer gives undefined behaviour. – Peter May 11 '22 at 10:50
  • @Peter One-past-the-end pointer is valid in C as well – and not dereferencable alike. Interesting question now is if a pointer to the first element of a succeeding sub-array is identical to the one-past-the-end pointer of the preceding one. Because if so – not only de-facto, but backed by the standard – this length calculation would get legal for all but the very last sub-array in a 2D array... – Aconcagua May 11 '22 at 11:30
2

The logic of the code is to first pretend there is a second array of four int (called array2 for the sake of discussion) located in memory immediately after the end of array. I say "pretend" because array2 does not actually exist.

Based on that pretense, the logic of the code is then:

  1. &array is a pointer to array. It has type int (*)[4] (more verbosely described for humans as "a pointer to an array of four int");
  2. &array + 1 is a pointer to array2;
  3. Dereferencing that pointer i.e. calculating *(&array + 1) gives a (reference to) array2;
  4. In the expression *(&array + 1) - array, the terms *(&array + 1) and array are each implicitly converted to an int *. The values of these pointers are &array2[0] and &array[0] respectively. So the expression *(&array + 1) - array is equivalent to &array2[0] - &array[0];
  5. Since array2 is located in memory immediately after the last element of array, &array2[0] is equal to &array[4] (i.e. to the address of a non-existent array[4]). Subtracting two pointers of type int * gives the number of ints between them i.e. &array[4] - &array[0] gives a value 4 (of type std::ptrdiff_t);
  6. Since length has type int, that std::ptrdiff_t with value 4 is converted to an int, i.e. to the value 4.

That's the logic that (presumably) the compiler (or compilers) you are testing with is using.
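The types named in the steps above can be confirmed at compile time. This is just a sketch: the namespace demo is made up for illustration. Because decltype does not evaluate its operand, writing *(&array + 1) inside it does not dereference anything at run time.

```cpp
#include <cstddef>
#include <type_traits>

namespace demo {  // hypothetical namespace, only to keep the name `array` local

int array[4] = {0, 1, 2, 3};

// Step 1: &array is a pointer to an array of four int.
static_assert(std::is_same<decltype(&array), int (*)[4]>::value, "step 1");

// Step 2: adding 1 does not change the pointer type.
static_assert(std::is_same<decltype(&array + 1), int (*)[4]>::value, "step 2");

// Step 3: indirection yields an lvalue of type int[4].
static_assert(std::is_same<decltype(*(&array + 1)), int (&)[4]>::value, "step 3");

// Step 4: an int[4] operand decays to int* before the subtraction.
static_assert(std::is_same<std::decay<decltype(array)>::type, int*>::value, "step 4");

// Step 5: the difference of two int* has type std::ptrdiff_t.
static_assert(std::is_same<decltype(*(&array + 1) - array), std::ptrdiff_t>::value,
              "step 5");

}  // namespace demo
```

These checks cover only the types. Whether the expression may actually be evaluated is a separate question.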

The problem - i.e. why people consider it bad - is that array2 and array[4] do not actually exist. So - according to the standards - step 3 above gives undefined behaviour. All of the subsequent points (which mention the non-existent array2 or the non-existent array[4]) also involve undefined behaviour. The meaning of undefined behaviour is that the standards do not define what happens - so compilers are NOT required to implement the logic of the code.

A way to get the size of array without giving undefined behaviour is simply length = sizeof(array)/sizeof(array[0]) since the sizeof operator examines only the type of its arguments (and doesn't evaluate them) so avoids undefined behaviour. sizeof(array) gives the size (in bytes) of an array of four int, and sizeof(array[0]) gives the size (in bytes) of a single int, so dividing them gives 4.
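As a small sketch of the sizeof approach (the function name length_via_sizeof is invented for this example):

```cpp
#include <cstddef>

// Hypothetical helper: element count of the question's array via sizeof.
constexpr std::size_t length_via_sizeof() {
    int array[4] = {0, 1, 2, 3};
    // sizeof(array)    : bytes occupied by the whole int[4]
    // sizeof(array[0]) : bytes occupied by a single int
    // Neither operand is evaluated; only the types are inspected.
    return sizeof(array) / sizeof(array[0]);
}
```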

Advanced note: The limitation of the approach using sizeof is that it doesn't work in the presence of array-to-pointer conversion. For example, if array is actually a pointer (which happens if a raw array is passed as an argument to a function), the calculation will not (necessarily) give a value of 4.
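To illustrate that limitation, here is a sketch with two made-up functions (bytes_seen_by_callee and length_by_reference are not from the question). The first receives a pointer, so sizeof can only report the pointer's size. The second takes the array by reference, which is one possible way to keep the length in the type.

```cpp
#include <cstddef>

// Hypothetical callee: even if the caller passes an int[4], the argument
// arrives as a plain pointer, so sizeof sees only the pointer.
constexpr std::size_t bytes_seen_by_callee(const int* param) {
    return sizeof(param);  // size of a pointer, unrelated to the array length
}

// Hypothetical alternative: a reference-to-array parameter keeps N in the type.
template <std::size_t N>
constexpr std::size_t length_by_reference(const int (&)[N]) {
    return N;
}
```

With the question's array, bytes_seen_by_callee(array) yields sizeof(const int*) on any platform, while length_by_reference(array) yields 4.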

Peter
  • The same limitation applies for the pointer arithmetic approach: `int* p = ...; size_t s = *(&p + 1) - p;`, doesn't it? – Aconcagua May 11 '22 at 11:08
  • I can't remember the standard requiring two separate variables being declared one after another not having gaps in between – which is the case for array members, though. Maybe better consider `array` as part of a 2d array instead? – Aconcagua May 11 '22 at 11:11
  • @Aconcagua For your first comment: If you mean that calculating `*(&p + 1)` gives undefined behaviour, then yes. For your second: the point is that the expression is dereferencing a pointer to something that doesn't exist - it's the same logic whether we assume a 2D array or pretend there is a second array immediately in memory after the one we have defined (since a 2D array is simply an array of arrays, and arrays are contiguous). Either way, the standard does not require anything there. – Peter May 11 '22 at 11:12
  • I meant that it won't result in correct size either – but the UB actually is even worse... – Aconcagua May 11 '22 at 11:14