
Often in C++, one has a parameter void* user_data that one can use to pass an arbitrary type.

I used this to pass an array of booleans. However, I had a bug where I cast from bool* --> void* --> int* and I got weird results. Here is an example.

#include <iostream>

int main() {
    bool test[2] = { };
    void *ptr = static_cast<void*>(test);
    std::cout << static_cast<bool*>(ptr)[0] << '\n';
    std::cout << static_cast<int*>(ptr)[0] << '\n';
    std::cout << static_cast<int>(test[0]) << '\n';
}

Output:

$ g++ int_bool.cpp 
$ ./a.out 
0
-620756992
0

Can someone explain to me what the problem is? Normally when I cast from bool to int, there is no problem: false maps to 0 and true maps to 1. Clearly, that's not the case here.

phuclv
thedoctar
  • 6
    Well you’re not allowed to cast `bool*` ➝ `void*` ➝ `int*` so it’s not surprising you got weird results. – Konrad Rudolph Oct 11 '20 at 13:18
  • 6
    There is no language "C/C++". Choose one or the other — not both — unless you're asking about a difference between the languages. Since the code is manifestly C++, I've removed the C tag. – Jonathan Leffler Oct 11 '20 at 13:20
  • 5
    Undefined behavior – EOF Oct 11 '20 at 13:20
  • Does this answer your question? [What is the strict aliasing rule?](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) – Aykhan Hagverdili Oct 11 '20 at 14:19
  • @JonathanLeffler To my amateur eyes, the differences between the two are irrelevant for this question. – thedoctar Oct 11 '20 at 15:21
  • @AyxanHaqverdili I think the question is too far removed to be obviously an answer. I'm not an expert programmer. – thedoctar Oct 11 '20 at 15:23
  • 1
    Your illustrative code is pure C++. Your use of casts is pure C++. Essentially nothing in the code is C (the line containing `main()` and the one containing `}` are valid in C). If you want to pretend that the question is about C too, use the C subset of C++ in your code. C and C++ are vastly different languages. Don't try treating them as ”almost the same”. – Jonathan Leffler Oct 11 '20 at 15:25
  • @JonathanLeffler I understand the point you are trying to make. I don't think it's relevant in this example, because my question is only about type-casting. The difference in syntax is IMHO superfluous. But thanks for your feedback. – thedoctar Oct 11 '20 at 21:10

1 Answer

4

static_cast<int*>(ptr)[0] casts ptr to int* and reads the first element. This is undefined behavior (UB) for several reasons. First, the original array is only 2 bytes long (assuming a 1-byte bool), so reading a 4-byte int reads past its end. Second, and regardless of the sizes involved, it violates the strict aliasing rule, because you access bool objects through an lvalue of type int. Third, the bool array is not guaranteed to be suitably aligned for int. On x86 a misaligned read usually goes unnoticed because the hardware allows unaligned access by default, but on architectures that require alignment it can fault.

static_cast<int>(test[0]), on the other hand, converts the value of test[0] (which is a bool) to int and is a completely valid value conversion.
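For the void* user_data pattern from the question, a minimal sketch of the safe approach is to cast back to the original pointer type first and only then convert each value. The function name count_true and its signature are hypothetical, chosen purely for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callback in the style of an API that passes void* user_data.
// The pointer is cast back to the type it originally had (bool*), and each
// element is then converted by value.
int count_true(void* user_data, std::size_t n) {
    assert(user_data != nullptr);
    const bool* flags = static_cast<const bool*>(user_data);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += static_cast<int>(flags[i]);  // value conversion: false -> 0, true -> 1
    }
    return total;
}
```

Calling count_true(test, 2) with bool test[2] = { true, false }; returns 1, because every read goes through a bool* and only the values are converted to int.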


Update:

> The type int* refers to a pointer whose object is 4-bytes long, whereas bool* refers to a pointer whose object is 2-bytes long

No. When you dereference a pointer p, sizeof(*p) bytes are read from memory starting at that address and treated as a value of the pointed-to type. So *bool_ptr reads 1 byte and *int_ptr reads 4 bytes from memory (if bool and int are 1-byte and 4-byte types respectively).
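As a small sketch of that point, the size of an access is determined by the pointee type, not by the pointer itself. The helper name pointee_size is hypothetical:

```cpp
#include <cstddef>

// Hypothetical helper: the number of bytes an access through a T* touches.
template <typename T>
constexpr std::size_t pointee_size(const T*) {
    return sizeof(T);
}

// sizeof(bool) is implementation-defined (1 on mainstream platforms), and
// array elements are contiguous, so bool[2] is two bools wide in total.
static_assert(sizeof(bool[2]) == 2 * sizeof(bool), "array elements are contiguous");
```

On a platform with a 1-byte bool and a 4-byte int, pointee_size applied to a bool* gives 1 and applied to an int* gives 4, which is why the int* read in the question covers more memory than the 2-byte array holds.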

In your case the bool array contains 2 bytes, so when 4 bytes are read through static_cast<int*>(ptr), 2 bytes inside the array and 2 bytes outside the array are read. If you declared bool test[4] = {}; (or more elements), the int* dereference would at least stay within the 4 bools that belong to you and would probably print 0, but it would still be undefined behavior because of the strict aliasing violation, and the alignment issue would remain.

Now try changing the bool values to nonzero and see what happens:

bool test[4] = { true, false, true, false };

You'll quickly realize that casting a pointer to a different pointer type doesn't read the value as the old type and then convert it to the new type, the way a value conversion (i.e. a cast between values) does. Instead it treats the same memory as if it held an object of the new type. Going through void* this way is essentially a reinterpret_cast, which is worth reading about to understand the problem better.

> I don't understand what you are saying about char*. You're saying casting from any type to char* is valid?

Casting any other object pointer type to char* and reading through it is valid. Read the question about the strict aliasing rule linked above:

> You can use char* for aliasing instead of your system's word. The rules allow an exception for char* (including signed char and unsigned char). It's always assumed that char* aliases other types.

This exception is what functions like memcpy rely on: they copy the bytes representing an object to a different destination.

#include <cstring>

bool test[4] = { true, true, true, true };
int v;
// memcpy(dest, src, count): copy the bytes of the four bools into v
memcpy((char*)&v, (char*)test, sizeof v);

Technically memcpy takes void* parameters; the casts to char* are only there for demonstration.
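To make the two well-defined techniques concrete, here is a sketch that inspects the bytes of an object through unsigned char* and copies four bools into a 32-bit integer with memcpy. The helper names hex_bytes and bools_as_u32 are hypothetical, and the sketch assumes sizeof(bool) == 1, which holds on mainstream platforms but is not guaranteed by the standard:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: list an object's bytes in hex by reading them through
// unsigned char*, which the aliasing rules permit for any object type.
template <typename T>
std::string hex_bytes(const T& obj) {
    static const char digits[] = "0123456789abcdef";
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (i != 0) out += ' ';
        out += digits[p[i] >> 4];
        out += digits[p[i] & 0x0F];
    }
    return out;
}

// Hypothetical helper: copy the bytes of n bools into a 32-bit integer.
// Assumes (and checks) that the n bools occupy exactly 4 bytes.
std::uint32_t bools_as_u32(const bool* flags, std::size_t n) {
    assert(flags != nullptr && n * sizeof(bool) == sizeof(std::uint32_t));
    std::uint32_t v = 0;
    std::memcpy(&v, flags, sizeof v);  // memcpy(dest, src, count)
    return v;
}
```

With bool test[4] = { true, false, true, false }; on an implementation that stores true as the byte 0x01, hex_bytes(test) gives "01 00 01 00" and bools_as_u32(test, 4) gives 0x00010001 on a little-endian machine. Neither result is the 0 or 1 that a value conversion of a single bool would produce, which is the "different memory treatment" described above.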


phuclv
  • 1
    In addition to the size of the original array and reading outside of it... it's also *undefined behavior* to dereference the `bool*` as an `int*`, regardless. – Eljay Oct 11 '20 at 13:31
  • 1
    “and invokes undefined behavior, unless int is a 2-byte type on your system” — This is **incorrect**. As Eljay said, this is *always* UB, regardless of type sizes. – Konrad Rudolph Oct 11 '20 at 13:36
  • @Eljay it's in the "violating the aliasing rule" part above – phuclv Oct 11 '20 at 13:49
  • [What is the strict aliasing rule?](https://stackoverflow.com/q/98650/10147399) – Aykhan Hagverdili Oct 11 '20 at 14:19
  • Thanks, this is what I was looking for. I find it confusing though, that one can convert bool to int but not bool* to int*. (Of course since it was cast to void* along the way, it doesn't know sizeof I guess). – thedoctar Oct 11 '20 at 15:22
  • @thedoctar `bool` to `int` is a *value* conversion so obviously there's no trouble in doing that. But `bool*` and `int*` refers to the *underlying memory representation* of the objects, and you can only access any type's representation by `char*`. Doing that with any other pointer types violate strict aliasing rule – phuclv Oct 11 '20 at 15:37
  • @phuclv Thanks you, I think I understand now what you are saying. The type int* refers to a pointer whose object is 4-bytes long, whereas bool* refers to a pointer whose object is 2-bytes long. I don't understand what you are saying about char*. You're saying casting from any type to char* is valid? That is very confusing! – thedoctar Oct 11 '20 at 21:09
  • @phuclv Thanks! Lots of info, hope to learn and digest. – thedoctar Oct 12 '20 at 12:05