#include <cstdio>

void inc(long& in){
 in++;
}

int main(){
 int a = 5;
 inc(*reinterpret_cast<long*>(&a));

 printf("%d\n", a);
 return 0;
}

The code above compiles successfully and prints 6. Is it undefined behaviour, given that I'm not really making a reference to anything on the stack? I'm doing an inline cast.

Note that this will not work with a normal cast such as static_cast<long>(a). You will get a compiler error saying:

candidate function not viable: expects an l-value for 1st argument

Why does *reinterpret_cast<long*>(&a) compile? I'm dereferencing it to a long so the compiler should be able to tell it's not referencing anything.

Edit

Ignore the int/long difference here please, that's not the point of my question. Assume this example if it makes more sense:

void inc(int& in){...}
...
inc(*reinterpret_cast<int*>(&a));

My question is, how is it ok to pass *reinterpret_cast<int*>(&a) as a reference but static_cast<int>(a) isn't?

Dan
  • `reinterpret_cast` is just for intrinsically dangerous operations, so it assumes you know what you're doing and compiles most of the time. – iBug May 07 '21 at 02:50
  • "*Is it undefined behaviour?*" - yes, because an `int` is not a `long`, but you are asking the compiler to treat an `int` variable *as if* it were a `long` variable, which may or may not work, depending on the compiler implementation (i.e., if `sizeof(long) > sizeof(int)`, then `in++` can corrupt the stack memory in `main()`). – Remy Lebeau May 07 '21 at 02:56
  • @RemyLebeau no that's not the main problem here, the main issue is passing an rvalue by reference, an rvalue doesn't have a reference. – Dan May 07 '21 at 02:58
  • Refer https://stackoverflow.com/questions/332030/when-should-static-cast-dynamic-cast-const-cast-and-reinterpret-cast-be-used Has a good explanation on when to use different casts – Anish May 07 '21 at 02:58
  • @Dan there is no rvalue in this code, the result of dereferencing the pointer produced by `reinterpret_cast<long*>(&a)` is an lvalue that the `long&` reference can bind to. `inc(reinterpret_cast<long&>(a));` would work, too – Remy Lebeau May 07 '21 at 03:00
  • How is the dereferenced value an `lvalue`, where is it stored? the compiler must take the reference from somewhere. – Dan May 07 '21 at 03:06
  • Handy reading: [Is a dereferenced pointer a valid lvalue?](https://stackoverflow.com/questions/4773839/is-a-dereferenced-pointer-a-valid-lvalue) – user4581301 May 07 '21 at 03:13
  • @Dan, Perhaps it would make sense looking at what one compiler [actually does](https://gcc.godbolt.org/z/4h8r1xPqn) with a simplified version of the code, without optimizations because they're easily smart enough to get rid of everything otherwise. You can see this compiler make space on the stack for `a`, store 5 there (using `dword ptr`, i.e., 4 bytes), load the address of `a` into rdi, and then call `inc`. `inc` then uses the address in rdi to access the value as `qword ptr`, i.e., 8 bytes. The compiler is free to do whatever it wants with UB, but this is a pretty literal output. – chris May 07 '21 at 03:26
  • _"My question is, how is it ok to pass `*reinterpret_cast<int*>(&a)` as a reference but `static_cast<int>(a)` isn't?"_ Because the [first expression is an _lvalue_](http://eel.is/c++draft/expr.unary.op#1.sentence-1) and the [second a _(p)rvalue_](http://eel.is/c++draft/expr.static.cast#1.sentence-2). _Rvalues_ cannot be bound to _non-const lvalue references_. It was already said here. What is still unclear about it? – Daniel Langr May 07 '21 at 04:18

3 Answers


Is it undefined behaviour?

Yes.

Why does *reinterpret_cast<long*>(&a) compile?

Because it is well-formed. By using reinterpret_cast, you are telling the compiler that you know what you're doing, and that it's going to be correct. The compiler has no choice but to believe you. The issue is that it wasn't correct and you didn't know.

It would be fine if the address actually contained an object of type long. For example, the following would have well-defined behaviour (because an object of a standard-layout class and its first member are pointer-interconvertible):

struct T {
    long l;
} a {42};

// no problem
inc(*reinterpret_cast<long*>(&a));

Another case where this sort of reinterpretation is allowed is narrow character types and std::byte. It's probably never used with a reference in practice, but this is technically OK:

void inc(std::byte&);
int a = 42;
// you will get the first byte
inc(*reinterpret_cast<std::byte*>(&a));
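
As a rough sketch of the well-defined options, the block below contrasts the first-member case with simply widening the `int` into a real `long` object before the call. The names `Wrapper`, `inc_first_member` and `inc_via_temporary` are made up for this illustration and do not come from the question.

```cpp
// Same signature as the inc() in the question.
void inc(long& in) { ++in; }

// Hypothetical standard-layout class whose first member is a long.
struct Wrapper {
    long l;
};

// Well-defined: a standard-layout object and its first member are
// pointer-interconvertible, so the lvalue really refers to a long.
long inc_first_member() {
    Wrapper w{42};
    inc(*reinterpret_cast<long*>(&w));
    return w.l;
}

// Well-defined alternative for an int: copy it into an actual long object,
// bind the reference to that object, then narrow the result back.
int inc_via_temporary(int value) {
    long widened = value;  // a real long object that long& can bind to
    inc(widened);
    return static_cast<int>(widened);
}
```

Calling `inc_via_temporary(5)` never creates a `long` lvalue over `int` storage, which is the part that was undefined in the original code.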
eerorika
  • `It would be fine if the address actually contained an object of type long`, not unless it's an lvalue, ie. just casting to a `long` won't work. – Dan May 07 '21 at 03:04
  • @Dan Well, using the addressof operator on an rvalue is going to make the program ill-formed of course (unless there is a wacky user-defined overload involved). Regardless, `a` in your example is an lvalue. – eerorika May 07 '21 at 03:08
  • Then why is it undefined behaviour? I thought using `reinterpret_cast` to pass a reference like this is UB. – Dan May 07 '21 at 03:13
  • @Dan Because the `in++` has an lvalue to rvalue conversion, and the lvalue in question doesn't refer to an object of the type of the lvalue. This violates what is commonly known as "strict aliasing rule". Also, on systems where `long` is bigger than `int`, it would access outside the bounds of the storage of the `int` object. – eerorika May 07 '21 at 03:17
  • Thanks, that's what I have been asking, so then using `reinterpret_cast` here is UB? i.e the value being passed to `inc` isn't actually referencing anything? i.e `reinterpret_cast` here doesn't automatically pass a valid lvalue? – Dan May 07 '21 at 03:22
  • @Dan Using `reinterpret_cast` from pointer/reference to another is never UB. Accessing the object through the reinterpreted pointer is UB in this case. `inc isn't actually referencing anything?` It's not referencing an object of type `long`. It sort of refers to `a` as if it was a `long` (which it isn't). – eerorika May 07 '21 at 03:24
  • but is the value passed to `inc` actually a reference here? (I don't really care about the int/long difference here), I'm trying to understand if it's actually a reference? does `*reinterpret_cast<long*>(&a)` actually generate a reference to `a` here? it doesn't seem like it as my final dereferenced type is a `rvalue long` not a reference. it's just like writing `inc(5);` which shouldn't compile. – Dan May 07 '21 at 03:29
  • @Dan `but is the value passed to inc actually a reference here? ` It is an lvalue of type `long`. It doesn't matter whether it is a reference. When analysing the type of an expression, the first step is to "adjust" it to a non-reference type. The parameter of the function is an lvalue reference, and it is bound to the passed lvalue argument. It is allowed because lvalue references can bind to lvalues. – eerorika May 07 '21 at 03:33
  • `It is an lvalue of type long`, so is the compiler sort of creating a `hidden/temporary` object on the stack to store the result of `*reinterpret_cast<long*>(&a)` and pass its reference to `inc`? – Dan May 07 '21 at 03:37
  • @Dan `so is the compiler sort of creating a hidden/temporary object on the stack` No. The lvalue refers to the object named by `a` in this case. Sure, if the function call isn't expanded inline, then the reference parameter would technically be passed as an address in practice; just like a pointer would be passed. If the address wasn't passed in a register, then it would probably be stored on the execution stack. This is the same for all reference parameters regardless of the argument. – eerorika May 07 '21 at 03:41
  • I see, so the compiler is essentially smart enough to know `*reinterpret_cast<long*>(&a)` when dereferenced, is equal to just passing `a` itself as an lvalue (ignore the int/long issue here), but when doing a regular cast like `static_cast<long>(a)` this is not a reference anymore? – Dan May 07 '21 at 03:48
  • @Dan The expression type is never a reference (after the adjustment), like I stated in an earlier comment. `*reinterpret_cast<long*>(&a)` is an lvalue expression of type `long`. The lvalue reference parameter of type `long` can bind to such a value. `static_cast<long>(a)` is a prvalue of type `long`. The lvalue reference of type `long` cannot bind to such a value because it is an rvalue. – eerorika May 07 '21 at 03:52
  • Ok then if it can bind to such value then why `in++;` would be UB? I mean the compiler was smart enough to convert it to a reference, what's the issue with incrementing it? – Dan May 07 '21 at 03:55
  • @Dan `Ok then if it can bind to such value then why in++; would be UB?` I've already replied to that question https://stackoverflow.com/questions/67428388/using-reinterpret-cast-to-pass-a-value-by-reference/67428456#comment119182925_67428456 – eerorika May 07 '21 at 03:56
  • @eerorika Isn't even that dereferencing UB by itself? That is, `reinterpret_cast<long*>(&a)` isn't UB, but `*reinterpret_cast<long*>(&a)` is. – Daniel Langr May 07 '21 at 04:04
  • @DanielLangr I don't think so. – eerorika May 07 '21 at 04:06
  • ok essentially if I did `*reinterpret_cast<int*>(&a)` (int instead of long) there is no UB and everything should work fine as `*reinterpret_cast<int*>(&a)` is converted to a reference by the compiler automatically? the only issue here was the type difference? Originally I thought passing this cast as a ref was an issue which you are saying it is not. I made an edit to the question. – Dan May 07 '21 at 04:09
  • @Dan It would not be enough. You would need to change `long` to `int` also in the function parameter. Reference-to-`long` cannot be bound to an object-of-`int`. But then, yes. You simply cannot use an object through a pointer of a different type (`long` vs `int`), with only very few exceptions. – Daniel Langr May 07 '21 at 04:13

Let's break it down. Create some memory for a variable:

 int a = 5;

If int is 32 bits on a little endian machine, it looks like this in memory:

0x05 0x00 0x00 0x00

Then you take the address of it, which creates an int pointer, and cast it to a long pointer. reinterpret_cast assumes you know that the memory actually contains a long.

Let's assume long is 64 bits on your machine. The pointer now points to the 4 bytes of a plus 4 bytes of some other stuff on the stack:

05 00 00 00  aa bb cc dd

Then you pass the dereferenced long pointer by reference to the inc() function. Pass by reference actually passes the pointer. So inc() gets the address of the 05 00 00 00 aa bb cc dd memory and treats it as a long, increments it, and stores 06 00 00 00 aa bb cc dd in memory. When your code looks at the value of 'a', the 4 bytes 06 00 00 00 give it the value 6.

As you should be thinking, it is a lucky coincidence that it creates the correct answer.
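
To look at this byte layout without the undefined access, one option is to copy the object representation with `std::memcpy`. The following is only a sketch: `IntBytes`, `bytes_of`, `from_bytes` and `long_overruns_int` are hypothetical names, and the byte order you observe depends on the machine.

```cpp
#include <array>
#include <cstring>

// Hypothetical alias: storage for exactly the bytes of one int.
using IntBytes = std::array<unsigned char, sizeof(int)>;

// Copy the object representation of an int into a byte array.
// std::memcpy is a well-defined way to inspect the bytes of an object.
IntBytes bytes_of(int value) {
    IntBytes out{};
    std::memcpy(out.data(), &value, sizeof value);
    return out;
}

// Rebuild an int from bytes that were produced by bytes_of().
int from_bytes(const IntBytes& bytes) {
    int value = 0;
    std::memcpy(&value, bytes.data(), sizeof value);
    return value;
}

// True on platforms where a long spans more bytes than an int, i.e. where
// incrementing through a long& would also touch the neighbouring bytes.
constexpr bool long_overruns_int() {
    return sizeof(long) > sizeof(int);
}
```

On a little-endian machine with a 32-bit int, `bytes_of(5)` would hold 05 00 00 00 as described above. Where `long_overruns_int()` is true, the `inc` call in the question reads and writes past the end of `a`.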

As has been said, this code compiles because reinterpret_cast is a directive to override type safety. That makes it well-formed, but its behaviour is still undefined.

I consider reinterpret_cast to be a bad code smell. Don't use it unless you are writing bytes to binary data or a device driver. Certainly never use it because you can't get static_cast to compile.

And as for static_cast<long>(a), this expression is just a value. In simple terms, because it is not a variable at an address, it has no address, and hence cannot be passed by non-const reference.
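
A small sketch of what that means in practice: the result of the cast can still bind to an rvalue reference (or a const lvalue reference), but any change then lands on a temporary copy rather than on `a`. The helper names `inc_rvalue`, `a_after_inc_on_copy` and `a_after_inc_on_lvalue` are invented for this example.

```cpp
// Takes a non-const lvalue reference: needs an lvalue argument.
void inc(long& in) { ++in; }

// Hypothetical overload-free variant that accepts rvalues instead.
void inc_rvalue(long&& in) { ++in; }  // increments the temporary only

// The cast yields a prvalue, so a temporary long is materialized and
// incremented. The original int is never touched.
int a_after_inc_on_copy() {
    int a = 5;
    inc_rvalue(static_cast<long>(a));
    // inc(static_cast<long>(a));  // would not compile: long& cannot bind to a prvalue
    return a;
}

// For comparison: passing a named long variable binds the reference to it.
long a_after_inc_on_lvalue() {
    long a = 5;
    inc(a);
    return a;
}
```

Here `a_after_inc_on_copy()` returns 5 and `a_after_inc_on_lvalue()` returns 6, which is why the language refuses to bind a non-const lvalue reference to the cast result: the increment would silently be lost.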

  • `Then you pass the dereferenced long pointer by reference to the inc()`, so like I have been asking in other comments, `the dereferenced` value seems to be just like a regular rvalue, it's like passing the integer `5` by itself, where is the reference of it coming from? – Dan May 07 '21 at 03:40
  • I understand the int/long issue here but that's not my question. my question is, how is `*reinterpret_cast<long*>(&a)` a reference. – Dan May 07 '21 at 03:46
  • @Dan It is not a reference. It's an lvalue expression. Why do you think it is a reference? – Daniel Langr May 07 '21 at 03:54
  • @Dan, From the C++ POV, it's an lvalue (which works with the reference parameter) because the language rules say it is, and those rules are generally sensible and useful. From the computer's perspective, it can very well _pretend_ that there's actually 8 bytes there that represent a `long` by passing the address of `a` unchanged to the function and the function assumes that there's a valid `long` there. That's certainly one valid way for the compiler to respond that is doing exactly what the code says to. – chris May 07 '21 at 03:55
  • @DanielLangr sorry I made an edit to the question to make it more clear. – Dan May 07 '21 at 04:17
  • @Dan `*reinterpret_cast<long*>(&a)` is not itself a reference. However it may be passed by reference because it has an address, if that makes sense. – David Dolson May 10 '21 at 12:27

As for the edited question:

My question is, how is it ok to pass *reinterpret_cast<int*>(&a) as a reference but static_cast<int>(a) isn't?

As was already said, this is because the *reinterpret_cast<int*>(&a) expression has the value category lvalue, as described in [expr.unary.op]:

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.

In contrast, the static_cast<int>(a) expression has the category prvalue (which is also an rvalue) according to [expr.static.cast]:

The result of the expression static_cast<T>(v) is the result of converting the expression v to type T. If T is an lvalue reference type or an rvalue reference to function type, the result is an lvalue; if T is an rvalue reference to object type, the result is an xvalue; otherwise, the result is a prvalue.

And rvalues cannot be bound to non-const lvalue references (the type of the function parameter).

Note that both expressions refer to the very same object, but "through" different categories. (Another case of how to change the category of an object to rvalue would be std::move(a)).
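
The categories can be checked at compile time, since `decltype((expr))` yields `T&` for an lvalue, `T&&` for an xvalue and plain `T` for a prvalue. The sketch below assumes a global `int g` and two hypothetical overloads named `picks_lvalue_overload`. None of these names appear in the question.

```cpp
#include <type_traits>
#include <utility>

int g = 5;  // hypothetical global, used only inside unevaluated decltype operands

// lvalue: indirection through a pointer always yields an lvalue.
static_assert(std::is_same<decltype((*reinterpret_cast<int*>(&g))), int&>::value,
              "dereferenced pointer is an lvalue");
// prvalue: static_cast to a non-reference type.
static_assert(std::is_same<decltype((static_cast<int>(g))), int>::value,
              "static_cast<int> is a prvalue");
// xvalue: std::move returns an rvalue reference to object type.
static_assert(std::is_same<decltype((std::move(g))), int&&>::value,
              "std::move is an xvalue");

// Overload resolution makes the same distinction at the call site.
bool picks_lvalue_overload(int&)  { return true; }
bool picks_lvalue_overload(int&&) { return false; }

bool deref_cast_is_lvalue() {
    int a = 5;
    return picks_lvalue_overload(*reinterpret_cast<int*>(&a));
}

bool static_cast_is_lvalue() {
    int a = 5;
    return picks_lvalue_overload(static_cast<int>(a));
}
```

With only the `int&` overload available, as with `inc` in the question, the second call would have no viable candidate, which is the compiler error quoted there.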

Daniel Langr