
I'm compiling low-level code that uses lots of byte sequences. In some cases it is handy for me to define them using old-style, double-quoted C strings.

But when compiling with gcc or g++ (I don't know how other compilers behave), it keeps complaining about the signedness of the pointed-to string.

Basically when I write this

const unsigned char & refok = *"ABCDEFGHI";

EDIT: OK, the code above does not really work: in theory it just keeps a reference to a converted copy of the first char of the string. With some compilers it happens to allow access to the whole string because of optimization, but it may break at any time.

or this

const unsigned char oktoo[10] =
    {'A','B','C','D','E','F','G','H','I',0};

the compiler doesn't say anything.

But it definitely rejects this one:

const unsigned char * bad = "ABCDEFGHI";

with the message

error: invalid conversion from ‘const char*’ to ‘const unsigned char*’ [-fpermissive]

It's not even a warning, it's an error.

I'm wondering why this one should be more of an issue than using a reference or converting individual chars from signed to unsigned chars. Or am I missing something?

R.. GitHub STOP HELPING ICE
kriss
  • Your question is tagged both C and C++ for a topic on which C and C++ differ A LOT. Please remove the tag for the language that you aren't using. – Pascal Cuoq Nov 14 '11 at 14:25
  • @Complicatedseebio - Since it doesn't look like he's in a hurry to do that, you could just add in a C answer, with an explanation that it is for C and not C++ (like the accepted answer is, I believe). – T.E.D. Nov 14 '11 at 22:59
  • @Complicatedseebio: yes, a C answer would be nice. As anybody can see from the bogus reference above, my code is written in C++. Part of my problem is that the exact same code is accepted when compiled as C but is a compile error in C++. – kriss Nov 14 '11 at 23:11

2 Answers


I think you're missing a lot of things!

The first line probably does something completely different from what you think. (It involves a conversion and the lifetime extension of a temporary.)

The second line initializes each unsigned char from the corresponding char in the brace initializer.

In the third line, the compiler is correct: the string literal has type const char[10], which decays to const char *, and you cannot implicitly convert a T* to a U* in general.

Note that the standard demands explicitly that char, unsigned char and signed char be distinct types. The reasoning here is that char should be the platform's native byte type, while the other two are explicitly unsigned and signed integral types. The unsigned/signed types are for algebraic operations, while plain char is for interfacing with the system (e.g. command-line arguments and file I/O).

Kerrek SB
  • Interesting. How safe is using `reinterpret_cast` to get around it? – T.E.D. Nov 14 '11 at 14:02
  • I think that's OK. Type punning rules explicitly exempt all flavours of char pointers. (Just make sure to maintain constness, but you've done that.) In fact, I do this all the time, because I want my data to be `unsigned char`, but I do read it from and write it to files. – Kerrek SB Nov 14 '11 at 14:03
  • @kriss: read it again: the native type is **not** signed, it is *unspecified* and up to the platform. All the standard demands is that the three char types be *distinct*, and hence their pointers are not compatible. There's nothing more one can say about the standard's requirement other than "that's how the language requires it". – Kerrek SB Nov 14 '11 at 14:11
  • Native char isn't explicitly signed or unsigned. You will get the same error with const signed char * bad = "ABCDEFGHI"; It's a kind of "may be signed, and that's none of your business" char. – blaze Nov 14 '11 at 14:14
  • @kriss: ... that said, the *types* (not their pointers!) are convertible, to the extent to which any integral types are convertible. You'll have to pay attention to signedness issues there, of course. E.g. converting an unsigned char to a signed char may be undefined (not just unspecified) behaviour if the value is too large; while converting signed to unsigned is always well-defined. That should really explain everything that you observe, but feel free to ask more specific questions. – Kerrek SB Nov 14 '11 at 14:16
  • OK, signed char * is also incompatible, as you said. What I first tried was actually converting from char * to uint8_t * (from ). My initial problem is that char offers no guarantee of being 8 bits and is less expressive, but the built-in string notation is still nice. But I guess on any platform where char isn't 8 bits, built-in strings would be useless anyway. – kriss Nov 14 '11 at 14:48
  • @kriss: It all depends on what you want to do. `char` is guaranteed to be *at least* 8 bits, and the same is true for `uint8_t`. If you have a concrete problem to solve, post that and we'll see. – Kerrek SB Nov 14 '11 at 15:07
  • I'm just building network packets by hand, and my code just won't work if uint8_t is not **exactly** 8 bits. I hope you are kidding when stating uint8_t is *at least* 8 bits. If it actually is, my code just won't work on any such target. – kriss Nov 14 '11 at 22:58
  • char and unsigned char can hold 0…255, signed char can only hold -128…127. The binary digits are the same, but assigning to other types or casting is done differently. Even arithmetic has to be considered carefully, as there is no overflow or underflow warning or correction, and this can happen easily with only 8 bits. u ± s where u is unsigned and s is signed is clearly defined, just follow the binary, but with surprises for the unwary. (Copied from http://stackoverflow.com/questions/35803605/is-accessing-an-array-element-using-a-char-undefined-behaviour) – Arif Burhan Mar 04 '16 at 19:22
  • The C++ compiler gives stronger warnings, but no runtime warnings, unless you explicitly code for them. – Arif Burhan Mar 04 '16 at 19:24

Implicit conversions between numerical types are allowed; that is what the first two are doing. Implicit conversions between different pointer types are not allowed, apart from a few special cases such as converting a derived-class pointer to a base-class pointer, or any object pointer to void *.

Note that the first does not give a reference to the first character of the array. It creates a temporary copy of that character, converted to type unsigned char, and binds the reference to that, extending the lifetime of the temporary to that of the reference.

The second converts each char in the initialiser list to an unsigned char array element.

The third attempts to convert const char * to const unsigned char *; since char and unsigned char are distinct types, implicit pointer conversion is not allowed.

Mike Seymour
  • OK, so I guess the first version is just bogus, and it only seems to work when accessing subsequent chars because the compiler optimizes away the copy into the temporary. It should definitely be avoided. Hence, if I want to use unsigned chars for a bunch of data, I'm just stuck with syntactic noise.