Why can't I static_cast between char * and unsigned char *?

Question

Apparently the compiler considers them to be unrelated types and hence reinterpret_cast is required. Why is this the rule?

I am taking the SHA-1 hash of a string. `c_str()` returns a `const char *` and the SHA-1 function takes a `const unsigned char *` as an argument. — Nick, Apr 14 '12 at 07:36
And what do you expect to happen if that string contains negative character values? — Pubby, Apr 14 '12 at 07:41
I expect any negative value `c` to become `c + 256`, as is standard in converting a signed byte to an unsigned one. Honestly, I'm just doing the conversion to compute a hash value. I don't care how they're converted, as long as they're converted the same way every time. — Nick, Apr 14 '12 at 07:46
@Nick: Converting a `char` to an `unsigned char` is a conversion. Converting `char *` to `unsigned char *` and then reading the elements _assuming_ that they have been converted when they haven't is very different. It will work on a system where the conversion doesn't actually require a change in the representation (e.g. on a two's complement system), but as that's an implementation-specific assumption it's appropriate that an explicit `reinterpret_cast` is required. — CB Bailey, Apr 14 '12 at 10:38
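
The distinction drawn in the last comment can be sketched roughly as follows. This is only an illustration, not code from the thread; the helper names `to_unsigned_value`, `first_byte_via_pointer` and `demo_value_vs_pointer` are hypothetical.

```cpp
#include <cassert>
#include <climits>

// Value conversion: the result is the input value reduced modulo 2^CHAR_BIT,
// regardless of how char is represented.
unsigned char to_unsigned_value(char c) {
    return static_cast<unsigned char>(c);
}

// Pointer reinterpretation: nothing is converted; the stored byte is read
// back as an unsigned char exactly as it sits in memory.
unsigned char first_byte_via_pointer(const char* p) {
    assert(p != nullptr);
    return *reinterpret_cast<const unsigned char*>(p);
}

void demo_value_vs_pointer() {
    const char c = static_cast<char>(-1);
    // Guaranteed whether plain char is signed or unsigned.
    assert(to_unsigned_value(c) == UCHAR_MAX);
    // Agrees with the line above only because the representation needs no
    // adjustment on this platform (true for two's complement).
    assert(first_byte_via_pointer(&c) == UCHAR_MAX);
}
```

Calling `demo_value_vs_pointer()` from `main` runs both checks; the second one is the implementation-specific assumption the comment refers to.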

score 47 · Accepted Answer · edited Apr 30 '15 at 10:00

They are completely different types; see the standard:

3.9.1 Fundamental types [basic.fundamental]

1 Objects declared as characters (char) shall be large enough to store any member of the implementation's basic character set. If a character from this set is stored in a character object, the integral value of that character object is equal to the value of the single character literal form of that character. It is implementation-defined whether a char object can hold negative values. Characters can be explicitly declared unsigned or signed. Plain char, signed char, and unsigned char are three distinct types. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (basic.types); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types. In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined.
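
The properties in this paragraph can be checked at compile time. The following is a minimal sketch, not part of the original answer; the constant `plain_char_is_signed` is a name introduced here for illustration.

```cpp
#include <climits>
#include <type_traits>

// The three character types are distinct, even though plain char shares
// its range of values with one of the other two.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");
static_assert(!std::is_same<signed char, unsigned char>::value, "distinct types");

// They nevertheless occupy the same storage and share alignment.
static_assert(sizeof(char) == sizeof(signed char), "same size");
static_assert(sizeof(char) == sizeof(unsigned char), "same size");
static_assert(alignof(char) == alignof(unsigned char), "same alignment");

// Whether plain char can hold negative values is implementation-defined.
constexpr bool plain_char_is_signed = (CHAR_MIN < 0);
static_assert(plain_char_is_signed == std::is_signed<char>::value, "consistent");

// There is no implicit conversion between the corresponding pointer types.
static_assert(!std::is_convertible<char*, unsigned char*>::value,
              "no implicit pointer conversion");
```

If the translation unit compiles, every property holds on that implementation; only the value of `plain_char_is_signed` is expected to vary between platforms.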

Analogously, this is also why the following fails:

unsigned int* a = new unsigned int(10);
int* b = static_cast<int*>(a); // error: unsigned int* and int* are different, unrelated pointer types

a and b have completely different types. What you are really asking is why static_cast is so restrictive when it can perform the following without a problem:

unsigned int a = 10;
int b = static_cast<int>(a); // OK, but the value may not be preserved if it does not fit in an int

and why it cannot deduce that the target types have the same width and can represent each other's values. It can do this for arithmetic values, but not for pointers: unless the pointed-to types are related by inheritance (for example, you wish to downcast from a base class to a derived class), casting between pointers with static_cast is not going to work.

The link http://www.stroustrup.com/bs_faq2.html#static-cast has Bjarne Stroustrup's explanation of why static_cast is useful. In abbreviated form, it lets the user state clearly what their intentions are and gives the compiler the opportunity to check that what is intended can be achieved. Since static_cast does not support casting between unrelated pointer types, the compiler can catch this error and alert the user; if they really want to do this conversion, they should use reinterpret_cast.
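
For the situation in the question, the explicit cast might look like the sketch below. This is not the asker's code: `toy_hash` is a hypothetical stand-in that only mimics the shape of a function taking `const unsigned char *` (it is not SHA-1), and `hash_string` and `demo_hash_string` are likewise names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a hashing routine that takes its input as
// const unsigned char*. This is a toy hash, NOT SHA-1.
unsigned long toy_hash(const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    unsigned long h = 5381;
    for (std::size_t i = 0; i < len; ++i) {
        h = h * 33 + data[i];
    }
    return h;
}

unsigned long hash_string(const std::string& s) {
    // static_cast<const unsigned char*>(s.c_str()) would not compile;
    // the reinterpretation has to be spelled out explicitly.
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(s.c_str());
    return toy_hash(bytes, s.size());
}

void demo_hash_string() {
    assert(hash_string("") == 5381UL);
    assert(hash_string("a") == 5381UL * 33 + static_cast<unsigned char>('a'));
    assert(hash_string("abc") != hash_string("abd"));
}
```

Reading the bytes of a `char` array through an `unsigned char` pointer is permitted, so the only thing the explicit cast adds is a visible statement of intent.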

answered Apr 14 '12 at 08:26 by EdChum · edited Apr 30 '15 at 10:00 by Maadiah

thx for stating the standard here. I don't have it available. – Tobias Langner Apr 14 '12 at 08:27
So if they're distinct types, why does the compiler allow the cast `unsigned char a = 255; char b = static_cast<char>(a);` ? – Nick Apr 14 '12 at 08:45
the same reason you can static_cast from doubles to ints and the other way, what you can't do is static_cast double* to int*, the pointer types are different but you can convert from one value to another with the caveat that there may be a loss in precision – EdChum Apr 14 '12 at 09:02
hmm, seems like a silly rule. As I mentioned in my response to Tobias, it seems like the _only_ time a static_cast should be allowed on array/pointer types should be for same-width primitives, otherwise you could accidentally shoot yourself in the foot (as in the example I gave). At least with same-width primitives, nothing can go wrong. – Nick Apr 14 '12 at 09:12
@Nick It can surprise people when they learn that you cannot `static_cast` between `unsigned char*` and `char*`, but it is fundamentally because they are different types. We are not surprised that you can static_cast between related types like `float` and `int`, or indeed `unsigned char` and `char`. `static_cast` does what it does; the clear advantage over C-style casts is that you get compile-time errors if you try to convert between different types, as in your case. There is a related SO post: http://stackoverflow.com/questions/2473628/c-cant-static-cast-from-double-to-int – EdChum Apr 14 '12 at 09:19
in that case it's obviously wrong though, since `int` and `double` have different widths (and aren't even represented the same!), so if you cast to `double *` you could accidentally trample some memory if you do `*d = 3.14`. My use of "silly" applies only to same-width primitives. – Nick Apr 14 '12 at 09:28
@Nick well, it may be silly for you, but as another example, `static_cast` is not possible between `unsigned int*` and `int*`, which is analogous to your question: the storage and width are the same for both, but `static_cast` detects at compile time that they are different types – EdChum Apr 14 '12 at 10:06
@Nick: Even with `int` and `float`. They may have the same size, but if you have an `int`, then try to read it like a `float`, then what you get depends on exactly how `float` is stored in memory. And since the specification does not *say* how `float` is stored in memory, the specification *cannot* define what the `int` looks like. Remember: The C++ standard exists to provide guarantees about what you get. In order to define this behavior, the standard would have to detail how `float` is laid out in memory, as well as how `int` is laid out in memory. – Nicol Bolas Apr 14 '12 at 20:22
@Nick: In short: you're confusing the concept of "what happens on real machines" with "what the specification requires." – Nicol Bolas Apr 14 '12 at 20:23
The specification does not say that the most significant bit has to be the sign bit, so an implementation could have it be the least significant bit. If that were the case, when the compiler was casting an unsigned char with value 1 to a signed char, it would know it needed to shift the bits left by 1 to account for the sign bit. But with an unsigned char* and a char*, it's not the cast's job to adjust the values at the pointer's location. It wouldn't know how many chars to adjust anyway. – Cemafor Mar 12 '15 at 22:59
The part of the standard quoted in this answer reminds me of git man page generators. – allyourcode Dec 02 '15 at 05:47
`static_cast` is a bitch though, allowing for unconditional upcast in the derivation tree. – Euri Pinhollow Jul 09 '18 at 10:45
is it better to do a `reinterpret_cast` or do a `static_cast` to void* and then to required type? – Osman-pasha Apr 26 '22 at 05:56

Tobias Langner · Answer 2 · 2012-04-14T11:02:38.157

You're trying to convert unrelated pointers with a static_cast. That's not what static_cast is for. Here you can see: Type Casting.

With static_cast you can convert numerical data (e.g. char to unsigned char should work) or pointers to related classes (related by some inheritance). Neither is the case here. You want to convert one unrelated pointer to another, so you have to use reinterpret_cast.

Basically what you are trying to do is for the compiler the same as trying to convert a char * to a void *.

OK, here are some additional thoughts on why allowing this is fundamentally wrong. static_cast can be used to convert numerical types into each other. So it is perfectly legal to write the following:

char x = 5;
unsigned char y = static_cast<unsigned char>(x);

This is also possible:

double d = 1.2;
int i = static_cast<int>(d);

If you look at this code in assembler you'll see that the second cast is not a mere re-interpretation of the bit pattern of d but instead some assembler instructions for conversions are inserted here.
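
The difference between converting a value and re-reading its bit pattern can also be seen without looking at assembler. The sketch below is an illustration added here, not the answerer's code; it assumes a 64-bit `double` (checked with a `static_assert`), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Value conversion: the compiler emits a real floating-point-to-integer
// conversion, truncating toward zero.
int convert_value(double d) {
    return static_cast<int>(d);
}

// Reinterpretation: copy the object representation unchanged into an
// integer of the same size. memcpy avoids any aliasing problems.
std::uint64_t raw_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "this sketch assumes a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

void demo_convert_vs_reinterpret() {
    assert(convert_value(1.2) == 1);   // the value was converted
    assert(raw_bits(1.2) != 1u);       // the bit pattern is something else entirely
}
```

Calling `demo_convert_vs_reinterpret()` from `main` runs both checks.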

Now if we extend this behavior to arrays, then in the case where simply a different way of interpreting the bit pattern is sufficient, it might work. But what about casting arrays of doubles to arrays of ints? That's where you either have to declare that you simply want a re-interpretation of the bit patterns (there's a mechanism for that, called reinterpret_cast), or you must do some extra work. As you can see, simply extending static_cast to pointers / arrays is not sufficient, since it would need to behave like static_casting single values of the types. This sometimes needs extra code, and it is not clearly definable how this should be done for arrays. In your case, should it stop at \0 because that's the convention? This is not sufficient for non-string cases (numbers). What will happen if the size of the data type changes (e.g. int vs. double on 32-bit x86)?

The behavior you want can't be properly defined for all use cases; that's why it's not in the C++ standard. Otherwise you would have to remember things like: "I can cast this type to the other as long as they are of type integer, have the same width and ...". This way it's totally clear: either they are related CLASSES, in which case you can cast the pointers, or they are numerical types, in which case you can cast the values.

I acknowledged in my opening post that the compiler says they're unrelated pointers. What I want to know is _why_. It seems to me that if `T1` is "related" to `T2`, then `T1 *` should be "related" to `T2 *`. Why isn't that typing rule sound (for primitive types)? — Nick, Apr 14 '12 at 07:49
@Nick it is not "somehow related" but "related classes". As you said, char and unsigned char are primitives - not classes. That's the reason - and that's what I said if you read carefully. You are right - if class T1 is related to class T2, then you can use static_cast to convert T1* to T2*. This is not what you are doing. char is not related to unsigned char in the sense of relation required by the C++ standard. — Tobias Langner, Apr 14 '12 at 08:24
However, if you drop the pointer, the compiler has no issue casting between any primitive types, e.g. `unsigned char a = 255; char b = static_cast<char>(a);` It seems a bit strange, since if `T1` and `T2` are classes, the cast between pointers is not sound, since you could do something like: `class A;` `class B : public A;` `B *b = new B[4];` `b[0] = B();` `A *a = static_cast<A *>(b);` `a[1] = A();` `B b1 = b[1]; // oops` It seems like the _only_ time the cast should be safe is between primitive types. — Nick, Apr 14 '12 at 08:37
Yes - and the behavior for that is clear and well defined. You have some rules on how to convert one numerical type into the other. This may be as easy as copying the bit pattern into the new memory or as complicated as converting a double to an int. Still - it's only for one value and thus easily definable. See my explanation above. The static_cast has 2 totally different uses depending on whether the type is a pointer type or not. They should have named it differently for the 2 use cases (e.g. static_cast & numerical_cast). — Tobias Langner, Apr 14 '12 at 10:59
You can also static_cast void pointers to any pointers and back — glades, Aug 29 '22 at 19:07
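
The route through `void *` mentioned in the last comment can be sketched like this. It is an illustration only; `via_void` and `demo_via_void` are hypothetical names, and the sketch makes no claim about which spelling is preferable.

```cpp
#include <cassert>

// Hypothetical helper: reach unsigned char* from char* using only
// static_cast, with void* as the intermediate step.
unsigned char* via_void(char* p) {
    void* v = static_cast<void*>(p);        // object pointer -> void*
    return static_cast<unsigned char*>(v);  // void* -> object pointer
}

void demo_via_void() {
    char buf[] = "abc";
    unsigned char* u = via_void(buf);
    // Same address as the direct reinterpret_cast.
    assert(u == reinterpret_cast<unsigned char*>(buf));
    // Round trip back to the original pointer.
    assert(static_cast<char*>(static_cast<void*>(u)) == buf);
}
```

Both spellings yield the same pointer value here; the two-step version just makes the detour through `void *` explicit.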

score 4 · Answer 3 · answered Apr 14 '12 at 10:31

Aside from being pointers, unsigned char * and char * have nothing in common (EdChum already mentioned the fact that char, signed char and unsigned char are three different types). You could say the same thing for Foo * and Bar * pointer types to any dissimilar structures.

static_cast means that a pointer of the source type can be used as a pointer of the destination type, which requires a subtype relationship. Hence it cannot be used in the context of your question; what you need is either reinterpret_cast which does exactly what you want or a C-style cast.
