How does *(&arr + 1) - arr give the length in elements of array arr?

Question

#include <iostream>
using namespace std;

int main() { 
   int  arr[5] = {5, 8, 1, 3, 6};
   int len = *(&arr + 1) - arr;
   cout << "The length of the array is: " << len;
   return 0;
}

For the code above, I don't quite understand what these two pieces of code are doing:

*(&arr + 1)

and

*(&arr)
&arr

Could someone explain? When I run the following two pieces of code, I get the same output:

&arr (I think this points to the address of the first element of arr)

*(&arr): here I don't quite understand what this does. What does the symbol * do to &arr (i.e. to the address here)? The two outputs are the same when I run them.

And finally, what exactly happens when an integer, say 1, is added to the address, as in this code: &arr + 1?

`*(&arr + 1)` invokes *undefined behavior*. This code does not calculate the length of an array; it's just broken. — UnholySheep, Apr 15 '20 at 20:14
*Undefined behavior* means that anything can happen. Including accidentally the result you expected. Though that might just be on your computer and compiler — UnholySheep, Apr 15 '20 at 20:21
There is certainly more to that than just undefined code. There should be a technical reason for this, which I am currently trying to understand. — SergeyA, Apr 15 '20 at 20:28
Please forget this trick as quickly as possible. Use the canonical `sizeof(arr)/sizeof(*arr)` instead. — cmaster - reinstate monica, Jun 23 '20 at 20:48
@cmaster-reinstatemonica Or even better: `std::size(arr)`. OP wants to understand how it works so I don't think this is something he actually uses. — Ted Lyngmo, Jun 23 '20 at 20:52
@TedLyngmo Yes, that's even better in C++. The `sizeof` approach works both in C and C++, though. — cmaster - reinstate monica, Jun 24 '20 at 07:49

Ted Lyngmo · Answer 1 · 2020-06-23T20:29:58.247


This is a mine field, but I'll give it a try:

&arr returns a pointer to an int[5]
+ 1 steps the pointer one int[5]
*(&arr + 1) dereferences the result back to an int(&)[5]
I don't know if this causes undefined behavior, but if it doesn't, the next step will be:
*(&arr + 1) - arr does pointer arithmetic after the two int[5]s have decayed to int pointers, returning the difference between the two int pointers, which is 5.

Rewritten to make it a bit clearer:

int  arr[5] = {5, 8, 1, 3, 6};

int (*begin_ptr)[5] = &arr + 0;     // begin_ptr is an int(*)[5]
int (*end_ptr)[5]   = &arr + 1;     // end_ptr is an   int(*)[5]

// Note:
//       begin_ptr + 1        ==  end_ptr
//       end_ptr - begin_ptr  ==  1

int (&begin_ref)[5] = *begin_ptr;   // begin_ref is an int(&)[5]
int (&end_ref)[5]   = *end_ptr;     // end_ref is an   int(&)[5]   UB here?

auto len = end_ref - begin_ref; // the array references decay into int*
std::cout << "The length of the array is: " << len << '\n'; // 5

I'll leave open the question of whether it's UB or not, but referencing an object whose storage has never been allocated does look a bit suspicious.

edited Jun 23 '20 at 20:29

answered Apr 15 '20 at 20:29

Ted Lyngmo



constructing a pointer to one past the last element is fine so `&arr + 1` is ok. Dereferencing it is UB so `*(&arr + 1)` is not ok. – bolov Apr 15 '20 at 20:39
@bolov Yeah, I was staring at what I wrote and tried to reason with myself about it. It's still not accessing memory, it's only the type that changes. Does that matter? – Ted Lyngmo Apr 15 '20 at 20:46
Well, it is undefined behavior, you are dereferencing a pointer to array by using indirection operator. – SergeyA Apr 15 '20 at 20:57
@SergeyA Do you mean that `*(&arr + 0)` would be UB too or is it that it's dereferenced to an `int[5]` _one past the end_ that makes it UB? – Ted Lyngmo Apr 15 '20 at 21:06

Yes, you are dereferencing one past the end, which is not allowed. I am trying really hard now to come up with the code exploiting the same technique, but can't find any. I am on the verge of asking a language lawyer question. – SergeyA Apr 15 '20 at 21:07

This code is not actually *dereferencing* past the end of `arr` itself, it is *creating* a pointer to the end of `arr`, but it is not dereferencing that pointer to access the memory of `arr` itself. Now granted, the code is using a temporary pointer, treating the whole `arr` as a single-element array for purposes of the `+1` arithmetic, and then dereferencing that pointer, but because the backing memory is another array, and this code is just manipulating pointers, not actually accessing any memory, I'm not sure that the behavior is actually undefined... – Remy Lebeau Apr 15 '20 at 21:18

Asked The Question: https://stackoverflow.com/questions/61238781/forcing-a-decay-of-the-array-for-the-lack-of-better-title – SergeyA Apr 15 '20 at 21:19
@RemyLebeau Yes, that's where my thoughts went with this. Let's hope SergeyA's `language-lawyer` question sorts it out. – Ted Lyngmo Apr 15 '20 at 21:22
...Just as it is perfectly legal to increment a pointer well beyond the bounds of an array, as long as it is decremented back into bounds before being dereferenced. – Remy Lebeau Apr 15 '20 at 21:23
I will explain why I think it is undefined in my question. – SergeyA Apr 15 '20 at 21:25
@RemyLebeau hm... It is way more complicated than I thought. Apparently there was language considered to explicitly allow the indirection through such pointers (including null pointer) but it was never adopted. See http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232. Let's see what my answer will fetch. – SergeyA Apr 15 '20 at 21:40
Hello Ted, for your first highlight (i.e. &arr), when you said it "returns a pointer to an int[5]", could you explain what this "int[5]" is? Is it an empty array of integers of length 5? – john_w Apr 15 '20 at 22:15
And for your second highlight (i.e. +1), could you explain what you mean when you say "steps the pointer one int[5]"? What does "steps the pointer one" mean? Sorry, could you elaborate a little bit more? Thank you. – john_w Apr 15 '20 at 22:17
@john_w I'll try to illustrate it better where I've given the variables clear types. [example @ godbolt](https://godbolt.org/z/zSv54h) (updated) – Ted Lyngmo Apr 15 '20 at 22:22
@john_w see the answer I just posted – Remy Lebeau Apr 15 '20 at 22:52
@john_w I put the relevant parts from my earlier godbolt example into the answer to try to clarify it a bit too. – Ted Lyngmo Apr 16 '20 at 09:33

Remy Lebeau · Answer 2 · 2020-04-15T23:20:58.217

Given the following facts:

When you increment/decrement a pointer by an integral value X, the value of the pointer is increased/decreased by X times the number of bytes of the type the pointer is pointing at.
When you subtract 2 pointers of the same type, the result is the difference between their held addresses, divided by the number of bytes of the type being pointed at.
When you refer to an array by its name alone, it decays into a pointer to the array's 1st element.

The type of your arr variable is int[5], i.e. an array of 5 ints. &arr returns an int[5]* pointer to arr (technically, it is actually written as int(*)[5], but let's not worry about that here, for simplicity). Let's call this pointer temp below.

Then, the + 1 increments the value of temp by 1 int[5] element. In other words, the address stored in temp is increased by 1 * sizeof(int[5]) bytes, i.e. 1 * (sizeof(int) * 5) bytes. This effectively gives you an int[5]* pointer to the end of arr (i.e. to the address of &arr[5]). No int[5] element physically exists at that memory address, but it is legal to create a pointer to it for purposes of pointer arithmetic.

Dereferencing temp gives you a reference to an int[5] at the end of arr. That reference decays into an int* pointer when passed to operator-.

In - arr, the reference to arr decays into an int* pointer to arr[0] when passed to operator-.

Thus, given this code:

int len = *(&arr + 1) - arr;

Which is effectively the same as this:

int len = &arr[5] - &arr[0];

Which is effectively the same as this:

int len = (<address of arr[5]> - <address of arr[0]>) / sizeof(int);

Thus, the result is 5.
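As a worked instance of that last line, assume (purely for illustration) a 4-byte `int` and that `arr` happens to start at address `0x1000`, so `arr[5]` would sit at `0x1000 + 5 * 4 = 0x1014`:

$$
\text{len} = \frac{\operatorname{addr}(\texttt{arr[5]}) - \operatorname{addr}(\texttt{arr[0]})}{\texttt{sizeof(int)}}
           = \frac{\texttt{0x1014} - \texttt{0x1000}}{4}
           = \frac{20}{4} = 5
$$

The actual addresses and `sizeof(int)` vary by platform, but they cancel out: the numerator is always `5 * sizeof(int)`.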

score 1 · Answer 3 · answered Apr 15 '20 at 21:38


Example:

int  arr[] = {1, 2, 3, 4, 5, 6}; 
int size = *(&arr + 1) - arr;

Here the pointer arithmetic does its part. We don’t need to explicitly convert each of the locations to character pointers.

&arr ==> Pointer to an array of 6 elements. [See this for the difference between &arr and arr]

(&arr + 1) ==> Address 6 integers ahead, because the pointer type is pointer to array of 6 integers.

*(&arr + 1) ==> Same address as (&arr + 1), but the type is int[6], which decays to "int *".

*(&arr + 1) - arr ==> Since *(&arr + 1) points to the address 6 integers ahead of arr, the difference between the two is 6.

answered Apr 15 '20 at 21:38

Shivam Jha


Hello, for your first highlight, could you provide the link for "this"? And for your second highlight, could you explain what you mean by "Address of 6 integers ahead"? Six integers ahead of what? Sorry, where is the starting place to count from? – john_w Apr 15 '20 at 22:13
@john_w Ahead of the beginning address of `arr` – Remy Lebeau Apr 15 '20 at 23:05
@john_w Adding 1 to a pointer actually advances it by the total number of bytes of the type it points to, so `(&arr + 1)` will add 6 bytes (assuming `int` has size 1). – Shivam Jha Apr 16 '20 at 10:41

William Leung · Answer 4 · 2021-10-18T09:19:34.680

Maybe I am too late to join the discussion, but I think this is a good question and it deserves a more thorough answer.

I originally saw the OP's snippet here.

There are 4 operations in total here.

Conceptually, &arr treats arr as a 2D array whose first dimension is 1 and gives you a pointer to the head of that 2D array; nothing is actually created at run time. If you are not familiar with 2D arrays, Shahbaz introduced them very well in Why can't we use double pointer to represent two dimensional arrays?
In particular, it has the structure of array2 in that post, and the pointer to this 2D array has type int (*)[5].

The +1 in &arr+1 does the pointer arithmetic on its first dimension. Recall that the first dimension is just 1. This is exactly why (&arr + 1) points to the memory address right after the end of the original array.

The * in *(&arr + 1) converts the 2D array pointer (type int (*)[5]) back to a one-dimensional array (type int[5]), which then decays to a pointer of type int* when used in the subtraction.

Finally, the - arr in *(&arr + 1) - arr is a pointer subtraction. According to the C standard (N1570):

6.5.6 Additive operators
....
⁹ When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.

One final question arises. In the discussion of How do I determine the size of my array in C?, we know that the sizeof method only works when you have the array itself (for example a local array on the stack), not a pointer to it. What about this method? Unfortunately, it has the same limitation. If you receive only a pointer to the array inside a function, the array size information is lost and there is no int (*)[5] type left to treat it as a 2D array, so the pointer arithmetic that follows simply falls apart.

score 0 · Answer 5 · answered Dec 23 '21 at 13:05

&arr ==> Pointer to an array of n elements.

(&arr + 1) ==> Address n integers ahead, because the pointer type is pointer to array of n integers.

*(&arr + 1) ==> Same address as (&arr + 1), but the type is int[n], which decays to "int *".

*(&arr + 1) - arr ==> Since *(&arr + 1) points to the address n integers ahead of arr, the difference between two is n.

(&arr + 1) points to the memory address right after the end of the array.
*(&arr + 1) turns the above address into an int * (the int[n] it refers to decays to a pointer; no cast is involved).
Subtracting the address of the start of the array, from the address of the end of the array, gives the length of the array.
