37
int arr[] = { 3, 5, 9, 2, 8, 10, 11 };      
int arrSize = *(&arr + 1) - arr;
std::cout << arrSize;

I don't understand how this works. Can anyone help me with this?

463035818_is_not_an_ai
Rahul Goswami

6 Answers

32

If we "draw" the array together with the pointers, it will look something like this:

+--------+--------+-----+--------+-----+
| arr[0] | arr[1] | ... | arr[6] | ... |
+--------+--------+-----+--------+-----+
^        ^                       ^
|        |                       |
&arr[0]  &arr[1]                 |
|                                |
&arr                             &arr + 1

The type of the expressions &arr and &arr + 1 is int (*)[7]. If we dereference either of those pointers, we get a value of type int[7], and as with all arrays, it will decay to a pointer to its first element.
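
The types named above can be checked at compile time. This is a minimal sketch, assuming a C++17 compiler (for the `_v` and `_t` helpers); nothing in it dereferences `&arr + 1`, because `decltype` does not evaluate its operand:

```cpp
#include <type_traits>

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

// &arr and &arr + 1 are both pointers to an array of 7 ints.
static_assert(std::is_same_v<decltype(&arr), int (*)[7]>);
static_assert(std::is_same_v<decltype(&arr + 1), int (*)[7]>);

// The pointed-to type is the whole array, not a single int.
static_assert(std::is_same_v<std::remove_pointer_t<decltype(&arr)>, int[7]>);

// Array-to-pointer decay turns int[7] into int *.
static_assert(std::is_same_v<std::decay_t<decltype(arr)>, int *>);
```

If any of these assumptions about the types were wrong, the file would fail to compile.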

So what's happening is that we take the difference between a pointer to the first element of *(&arr + 1) (the dereference really makes this UB, but will still work with any sane compiler) and a pointer to the first element of arr.

All pointer arithmetic is done in the base-unit of the pointed-to type, which in this case is int, so the result is the number of int elements between the two addresses being pointed at.
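
To see the "unit is the pointed-to type" rule without touching the questionable dereference, here is a small sketch that subtracts two `int *` values inside the same array. The function name `element_distance` is made up for this illustration, and the element count 7 is written by hand:

```cpp
#include <cstddef>

constexpr int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

// Number of int elements between the start of arr and one past its end.
constexpr std::ptrdiff_t element_distance() {
    const int* first = arr;      // arr decays to &arr[0]
    const int* last  = arr + 7;  // one past the end: fine to form, not to read
    return last - first;         // counted in ints, not in bytes
}

static_assert(element_distance() == 7, "distance is measured in elements");
```

The byte distance between the two addresses is `7 * sizeof(int)`, but the subtraction reports 7 because both operands are `const int *`.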


It might be useful to know that an array will naturally decay to a pointer to its first element, i.e. the expression arr will decay to &arr[0], which will have the type int *.

Also, for any pointer (or array) p and index i, the expression *(p + i) is exactly equal to p[i]. So *(&arr + 1) is really the same as (&arr)[1] (which makes the UB much more visible).
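
The identity between `*(p + i)` and `p[i]` can be checked with in-bounds indices only. The sketch below deliberately uses index 0 on `&arr`, so it stays clear of the one-past-the-end dereference discussed above:

```cpp
constexpr int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

// Subscripting is defined in terms of pointer arithmetic.
static_assert(*(arr + 3) == arr[3], "*(p + i) is p[i]");

// The same identity one level up: *(&arr + 0) is (&arr)[0], i.e. arr itself.
static_assert((*(&arr + 0))[2] == (&arr)[0][2], "index 0 is in bounds");
static_assert((&arr)[0][2] == arr[2], "(&arr)[0] designates arr");
```

Replacing the 0 with 1 gives exactly the `(&arr)[1]` form mentioned above, which indexes past the only `int[7]` that exists.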

Some programmer dude
  • Pardon my naivety, but doesn't dereferencing have a higher precedence than subtraction? My understanding of your answer doesn't account for the dereferencing before the subtraction. – Thomas Matthews May 12 '21 at 19:17
  • For example, the expression `&arr + 1` is evaluated first into an address. Next, that address is dereferenced to an integer (i.e. memory is read). Then the address of the array is subtracted from the value read from memory. Am I wrong here? – Thomas Matthews May 12 '21 at 19:18
  • @ThomasMatthews The dereference in `*(&arr + 1)` will make the expression into the type `int[7]` which will then decay to `int *`, which is the same type as `arr` used in the subtraction (remember that `arr` is the same as `&arr[0]`). – Some programmer dude May 12 '21 at 19:19
  • 2
    @ThomasMatthews Yes, dereference will happen first. Without it, the subtraction wouldn't be possible because the pointers would have unrelated types (`int (*)[7]` and `int *`). Dereferencing `(&arr + 1)` produces a value of type `int [7]`, which then decays to `int *`, making the pointer types the same. The dereference here merely changes the pointer type, and not its numerical value. – HolyBlackCat May 12 '21 at 19:19
  • @Thomas `&arr + 1` is of type `int (*)[7]`. A dereference there will give you an `int[7]`. – scohe001 May 12 '21 at 19:19
  • Thanks for the clarification. – Thomas Matthews May 12 '21 at 19:21
  • @scohe001 Which at that moment gives you a non-existent object and therefore UB. Strictly speaking it doesn't matter that the array is then immediately converted to a pointer. – dbush May 12 '21 at 19:21
  • @Someprogrammerdude will dereferencing make it `int(&)[7]` or `int[7]`? https://gcc.godbolt.org/z/rss83MvnG – Aykhan Hagverdili May 12 '21 at 19:22
  • @dbush Whether or not this is actually UB seems to be undecided, there is an active language defect related to this. I've specifically asked that question : [Dereferencing one past the end pointer to array type](https://stackoverflow.com/questions/52727045/dereferencing-one-past-the-end-pointer-to-array-type). – François Andrieux May 12 '21 at 19:23
  • @AyxanHaqverdili The type of `&arr` (and `&arr + 1`) is `int (*)[7]`. The type of `*(&arr)` (and `*(&arr + 1)`) is `int[7]`. And `int[7]` decays to `int *`. – Some programmer dude May 12 '21 at 19:23
  • @Someprogrammerdude why does the static assert fail in the link I shared then? – Aykhan Hagverdili May 12 '21 at 19:24
  • 2
    @AyxanHaqverdili Remember that expressions can't have reference types. http://eel.is/c++draft/expr#type-1 And `decltype` adds fake `&` and `&&` to types of expressions to indicate value categories of those expressions (for lvalues and xvalues respectively; types of prvalues are unchanged). – HolyBlackCat May 12 '21 at 19:25
  • @HolyBlackCat I certainly didn't know that. If `a` in `int &&a = 1;` is an l-value (r-value references themselves are l-values), why is `decltype(a)` == `int&&` then? – Aykhan Hagverdili May 12 '21 at 19:30
  • 1
    @AyxanHaqverdili `decltype` has special rules for variables. For them it returns the type that's written next to them. If you do `decltype((a))`, it will return `int &` (because unlike `a`, `decltype` doesn't consider `(a)` a variable, so the regular rules apply). – HolyBlackCat May 12 '21 at 19:34
  • Very interesting. It's very complex. All I want is something like `decl_decay_t` which acts like `std::decay_t` so I can do `decl_decay_t(a)::value_type` and alike. Maybe I'll add a macro... – Aykhan Hagverdili May 12 '21 at 19:36
  • @ThomasMatthews: Precedence only controls the grouping of operators with operands - it does not control the order in which expressions are evaluated. It only means the expression is *parsed* as `(*(&arr + 1)) - (arr)` as opposed to `*((&arr + 1) - arr)` or something else. Yes, the result of `*(&arr + 1)` must be known before you can subtract the result of `arr` from it, otherwise the expression doesn't make sense. That's not due to the precedence of the `*` vs. `-` operators, though. – John Bode May 13 '21 at 18:24
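
The `decltype` behaviour discussed in the comments above (declared type for a plain name, an added `&` for other lvalue expressions) can be verified with a few compile-time checks. This is only a sketch, assuming C++17; the variable `a` mirrors the `int &&a = 1;` example from the comments:

```cpp
#include <type_traits>

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };
int&& a = 1;

// Unparenthesized name: decltype reports the declared type.
static_assert(std::is_same_v<decltype(arr), int[7]>);
static_assert(std::is_same_v<decltype(a), int&&>);

// Any other lvalue expression: decltype appends & to the expression's type.
static_assert(std::is_same_v<decltype((a)), int&>);
static_assert(std::is_same_v<decltype((arr)), int (&)[7]>);
static_assert(std::is_same_v<decltype(*&arr), int (&)[7]>);
```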
19

That program has undefined behaviour. (&arr + 1) is a valid pointer that points "one beyond" arr, and has type int(*)[7], however it doesn't point to an int [7], so dereferencing it is invalid.

It so happens that your implementation assumes there is a second int [7] after the one you declare, and subtracts the location of the first element of that array that exists from the location of the first element of the fictitious array that the pointer arithmetic invented.
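
If the goal is simply the element count, there are ways to get it that never form or dereference a pointer past the array. A short sketch, assuming C++17 for `std::size` and `std::extent_v` (the variable names are only for illustration):

```cpp
#include <cstddef>
#include <iterator>     // std::size (C++17)
#include <type_traits>  // std::extent_v

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

constexpr std::size_t by_std_size = std::size(arr);
constexpr std::size_t by_sizeof   = sizeof(arr) / sizeof(arr[0]);
constexpr std::size_t by_extent   = std::extent_v<decltype(arr)>;

static_assert(by_std_size == 7 && by_sizeof == 7 && by_extent == 7);
```

All three are computed from the array's type at compile time.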

Caleth
  • How does the dereferencing help in calculating the size, since dereferencing has higher priority than subtraction and the dereferencing can be Undefined Behavior? – Thomas Matthews May 12 '21 at 19:12
  • 13
    I'm not sure this is definitely UB. I asked a related question ([Dereferencing one past the end pointer to array type](https://stackoverflow.com/questions/52727045/dereferencing-one-past-the-end-pointer-to-array-type)) and the answer to "is it legal" seems to be "it's an active core language issue". Edit : it still shouldn't be used, as it is also not definitely allowed yet. – François Andrieux May 12 '21 at 19:15
  • @ThomasMatthews that's why I talk about a fictitious `int[7]`. It's UB, the implementation can do anything, here it chooses to assume an extra array. Or it isn't UB, and the standard requires the assumption of another `int[7]` for the purpose of the subtraction – Caleth May 12 '21 at 19:32
  • 2
    @FrançoisAndrieux clang thinks it's UB, GCC and MSVC think it's not https://godbolt.org/z/r4e14e9MG – Aykhan Hagverdili May 12 '21 at 19:56
  • 3
    @AyxanHaqverdili: clang doesn't accept it as `constexpr`, but compiles it ok with `return ...`. https://godbolt.org/z/r96YhWEe1 - With `-fsanitize=undefined`, clang makes asm that checks if the stack pointer is within 28 bytes of the top of virtual address space, so it's checking for pointer overflow. (It of course runs fine). That may be a sign that it doesn't "understand" / "accept" what's going on; it just wrote that memory so the pointer's definitely valid if it didn't crash. So probably it doesn't realize that it's just doing 1-past-end of a valid object. – Peter Cordes May 13 '21 at 03:33
  • 3
    @PeterCordes Interesting analysis. The standard requires compilers to check for undefined behavior in constexpr. I used constexpr there to "get compilers' opinions" so to speak :) – Aykhan Hagverdili May 13 '21 at 08:21
  • @PeterCordes I *think* the expression `(&arr + 1)` is a valid 1-past end pointer (address of the hypothetical next array); the apparent dereferencing of that isn't actually reading any memory location - it is just (effectively) changing the same value to a different pointer type (address of the hypothetical first *element* of that hypothetical array). Aren't the address of an array and the address of its first element the same *by definition*? – Adrian Mole May 13 '21 at 09:02
  • 1
    No +1 as you lack an explanation *why* it's UB. I, too, think that it's UB however: https://stackoverflow.com/q/39401136/1116364 – Daniel Jour May 13 '21 at 11:40
10

You need to explore what the type of the &arr expression is, and how that affects the + 1 operation on it.

Pointer arithmetic works in units of the pointed-to type; &arr is the address of your array, so it points to an object of type "array of 7 int". Adding 1 to that pointer therefore adds the size of that type to the address, so 7 * sizeof(int) is added to the address.

However, in the outer expression (subtraction of arr), the operands are pointers to int objects [1] (not arrays), so the 'units' are just sizeof(int), which is 7 times smaller than in the inner expression. Thus, the subtraction results in the number of elements in the array.


[1] This is because, in such expressions, an array variable (such as the second operand, arr) decays to a pointer to its first element; further, your first operand is also an array, as the * operator dereferences the modified value of the array pointer.
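
The two different "step sizes" can be compared directly. In the sketch below, `step_of` is a hypothetical helper introduced only for this illustration; it reports the size of whatever a given pointer type points to (C++17 assumed):

```cpp
#include <cstddef>
#include <type_traits>

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

// Size of one "step" for a pointer type P: the size of what it points to.
template <class P>
constexpr std::size_t step_of = sizeof(std::remove_pointer_t<P>);

// &arr + 1 advances by a whole int[7] ...
constexpr std::size_t array_step = step_of<decltype(&arr)>;
// ... while the subtraction works on int *, so it counts in single ints.
constexpr std::size_t element_step = step_of<std::decay_t<decltype(arr)>>;

static_assert(array_step == 7 * sizeof(int));
static_assert(element_step == sizeof(int));
static_assert(array_step / element_step == 7);
```

The ratio of the two steps is the element count that the original expression produces.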


Note on Possible UB: Other answers (and comments thereto) have suggested that the dereferencing operation, *(&arr + 1), invokes undefined behaviour. However, looking through this Draft C++17 Standard, there is the vaguest of suggestions that it may not:

6.7.2 Compound Types
...
3    … For purposes of pointer arithmetic (8.5.6) and comparison (8.5.9, 8.5.10), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical element x[n].

But I won't claim "Language-Lawyer" status here, as there is no explicit mention in that section about dereferencing such a pointer.
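
What the quoted passage does cover is arithmetic and comparison. As a sketch of those uncontroversial uses, the loop below treats the one-past-the-end pointer purely as a stop marker and never dereferences it (`sum_all` is a name invented for this example):

```cpp
constexpr int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

// Sums the array, using the one-past-the-end pointer only as a stop marker.
constexpr int sum_all() {
    int total = 0;
    for (const int* p = arr; p != arr + 7; ++p) {
        total += *p;  // p is always within [arr, arr + 7) here
    }
    return total;
}

static_assert(sum_all() == 48, "3 + 5 + 9 + 2 + 8 + 10 + 11");
```

Because the function is evaluated in a constant expression, a compiler would have to reject it if the loop read past the end.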

Adrian Mole
5

If you have a declaration like this

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

then the expression &arr + 1 will point to the memory after the last element of the array. The value of the expression is equal to the value of the expression arr + 7 where 7 is the number of elements in the array declared above. The only difference is that the expression &arr + 1 has the type int ( * )[7] while the expression arr + 7 has the type int *.

So due to the pointer arithmetic the difference ( arr + 7 ) - arr will yield 7: the number of elements in the array.

On the other hand, dereferencing the expression &arr + 1, which has the type int ( * )[7], yields an lvalue of the type int[7]. When used in the expression *(&arr + 1) - arr, that lvalue is converted to a pointer of the type int * with the same value as arr + 7, as pointed out above. So the expression will yield the number of elements in the array.

The only difference between these two expressions

( arr + 7 ) - arr

and

*( &arr + 1 ) - arr

is that in the first case we need to specify the number of elements in the array explicitly to get the address of the memory after the last element, while in the second case the compiler calculates that address itself from the array declaration.
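
The "same value, different type" relationship between arr + 7 and &arr + 1 can be sketched as follows. The function name `same_address` is hypothetical; it compares the two pointers as untyped addresses and dereferences neither of them (C++17 assumed for `std::is_same_v`):

```cpp
#include <type_traits>

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

// Same position in memory, different static types.
static_assert(std::is_same_v<decltype(arr + 7), int *>);
static_assert(std::is_same_v<decltype(&arr + 1), int (*)[7]>);

// Compares the two one-past-the-end pointers as untyped addresses.
// Neither pointer is dereferenced.
bool same_address() {
    const void* by_count = arr + 7;   // element count written by hand
    const void* by_type  = &arr + 1;  // element count taken from the type
    return by_count == by_type;
}
```

Since both expressions designate the address just past the last element, as described above, `same_address()` is expected to return true.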

463035818_is_not_an_ai
Vlad from Moscow
4

As others have mentioned, *(&arr + 1) triggers undefined behavior because &arr + 1 is a pointer to one-past-the end of an array of type int [7] and that pointer is subsequently dereferenced.

An alternate way of doing this would be to convert the relevant pointers to uintptr_t, subtract them, and divide by the element size.

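// uintptr_t comes from <cstdint>. The final conversion is integer-to-integer,
// so it needs static_cast; reinterpret_cast<int> would not compile here.
int arrSize = static_cast<int>((reinterpret_cast<uintptr_t>(&arr + 1) -
                                reinterpret_cast<uintptr_t>(arr)) / sizeof *arr);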

Or using C-style casts:

int arrSize = (int)(((uintptr_t)(&arr + 1) - (uintptr_t)arr) / sizeof *arr);
dbush
  • Wouldn't `ptrdiff_t` be better here than `uintptr_t`? – Adrian Mole May 12 '21 at 19:34
  • @AdrianMole Only for the result type instead of `int`. I think it could overflow/underflow if you try to subtract it. – HolyBlackCat May 12 '21 at 19:43
  • 1
    https://timsong-cpp.github.io/cppwp/expr.reinterpret.cast#5 : *"mappings between pointers and integers are otherwise implementation-defined."* so this solution can work, and it likely works on most implementations, but it is not portable. You very well could have an implementation where, for example, `(uintptr_t)&arr[0]` is bigger than `(uintptr_t)&arr[1]`. – François Andrieux May 12 '21 at 19:49
-1

This one is simple:

  1. arr is just a pointer to the 0'th element of the array (&arr[0]);
  2. &arr gives a pointer to the previous pointer;
  3. &arr+1 gives a pointer to a pointer to arr[0]+sizeof(arr)*1;
  4. *(&arr + 1) turns the previous value into just &arr[0]+sizeof(arr)*1;
  5. *(&arr + 1) - arr also subtracts the pointer to arr[0] leaving just sizeof(arr)*1.

So the only tricks here are these. First, static arrays in C internally preserve all their static type information, including their total sizes. Second, when you increment a pointer by some integer value, C compilers don't just add the value to it: the standards require the pointer to be increased by that value times sizeof() of whatever type the pointer points to, so *(p+idx) gives the same result as p[idx].

The C language is designed to allow for very simplistic compilers, so inside it is full of little tricks like this. I would not recommend using them in production code though. Remember the other developers who may need to read and maintain your code later, and use the simplest and most obvious approach available instead (in this example, that is obviously just using sizeof() directly).
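
For C++ code, one way to make the "obvious" approach hard to misuse is to let the compiler deduce the length from the array's type. The helper below, `array_length`, is a hypothetical name for this sketch; `std::size` in C++17 provides the same thing. It relies on the length being part of the type `int[7]`, which is also what `sizeof` uses:

```cpp
#include <cstddef>

// Hypothetical helper: deduces N from the array's type. It only accepts real
// arrays; passing a pointer fails to compile instead of giving a wrong answer.
template <class T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) noexcept {
    return N;
}

int arr[] = { 3, 5, 9, 2, 8, 10, 11 };

static_assert(array_length(arr) == 7);
static_assert(sizeof(arr) / sizeof(arr[0]) == 7);
```

Unlike the `sizeof` division, `array_length` cannot silently be applied to an `int *` that an array has already decayed to.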

mrKirushko
  • 1
    `arr` is an array, not a pointer. Arrays can *decay* to pointers to their first elements in many cases, but this is **not** one of those cases. Consequently, `&arr`, is not a pointer-to-pointer (not `int **`), but, as 3 other answers have stated, a pointer-to-array (`int (*)[7]`). – HolyBlackCat May 14 '21 at 18:58
  • Also step 4 is slightly confusing, since `&arr[0]+sizeof(arr)*1` would additionally multiply the rhs by `sizeof(int)`. – HolyBlackCat May 14 '21 at 19:01
  • In C for static arrays the only difference with pointers is the fact that it has some metadata associated with it which we can ignore for all the cases except for the hidden sizeof() which always gives the total size of the array in bytes, not just the number of its elements. So I don't see any confusion here. For dynamic arrays there is absolutely no difference with pointers at all except for the syntax. – mrKirushko May 15 '21 at 18:08
  • Let's ignore "dynamic arrays" for now, since it's just a name for regular pointers pointing to dynamically allocated memory. *"only difference with pointers is the fact that it has some metadata associated"* It's a way to think about arrays I guess, but this interpretation breaks under scrutiny. A (static) array is not a pointer (with or without metadata). When an array is converted to a pointer, the resulting pointer is computed on the fly, in general it's not stored in the memory otherwise. – HolyBlackCat May 15 '21 at 20:44
  • If you use your debugger to inspect the memory associated with an array, you'll find no pointer there. Next, if you make a pointer-to-pointer (`int **`) and try to point it to an array, the compiler won't let you - again, because arrays don't *store* pointers to their elements. *"has some metadata associated"* There's no hidden metadata involved. They just have different types compared to pointers. They are implicitly converted to pointers in many cases, but not always. That doesn't happen when applying `sizeof`, yes, but also not when applying `&` to it, or binding it to a reference. – HolyBlackCat May 15 '21 at 20:50
  • The fact that the static array related pointers are not stored in memory and just get evaluated at compile time does not mean that they are not designed as pointers. And although `int** p = arr` does give you a warning (with gcc at least), `int* p = arr` compiles just fine without any warnings and works as expected. – mrKirushko May 16 '21 at 02:18
  • It is mostly just a question of how you want to look at it but at least for me the fact that you can do `*arr` and you get the same value as `arr[0]` and `*arr+3` is the same as `arr[3]` and `(int)&arr[0]` always gives exactly the same result as `(int)arr` is a firm enough proof for the fact that in C the whole [] syntax is nothing more than some early syntax sugar and the whole array support idea was probably added as an afterthought and expecting a compiler to provide the functionality using not more than just a couple of tiny little dirty hacks to the main part of the implementation. – mrKirushko May 16 '21 at 02:18
  • Just to be precise I mean `*(arr+3)` of course, not `*arr+3`. It is sad that Stack Overflow is so strict about 5 minutes to edit a comment even before any answer has arrived so I have to add an extra comment to correct the previous mistake. – mrKirushko May 16 '21 at 02:38