
I am working on an embedded device that comes with an SDK. It has a method like:

MessageBox(u8*, u8*); // I checked: u8 is a typedef for unsigned char

But in their examples I have seen calling code like:

MessageBox("hi","hello");

passing a char pointer without a cast. Is this well defined? I am asking because I ran a tool over the code, and it complained about the above mismatch:

messageBox("Status", "Error calculating \rhash");
diy.c  89  Error 64:  Type mismatch (arg. no. 1) (ptrs to signed/unsigned)
diy.c  89  Error 64:  Type mismatch (arg. no. 2) (ptrs to signed/unsigned)


Sometimes I get different opinions on this, and that confuses me even more. So, to sum up: is using their API the way described above a problem? Will it crash the program?

Also, it would be nice to hear the correct way to pass a string to SDK methods expecting unsigned char* without causing a constraint violation.

  • Could you please run [this code](http://ideone.com/AeQTQ7) on your system and see whether you get 0 or -128? – Sergey Kalinichenko May 29 '15 at 19:12
  • @dasblinkenlight: I would love to but unfortunately I don't have access to the device now :( –  May 29 '15 at 19:14
  • Some compilers let you force the default to "signed" or "unsigned".. see this answer for more details: http://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default – Buddy May 29 '15 at 19:14
  • OK then, on systems where the example prints 0 this will compile and run; on systems where it prints -128 it will issue a warning or an error. – Sergey Kalinichenko May 29 '15 at 19:15
  • @dasblinkenlight: I got neither an error nor a warning (the error comes from an online lint). I was more curious whether this is undefined behaviour or not –  May 29 '15 at 19:16
  • Does the embedded device use CodeWarrior as its compiler? I vaguely recall an option for strings to be unsigned chars in that compiler. – StilesCrisis May 29 '15 at 19:17
  • Most modern C compilers have a setting to force `char` to be `unsigned` by default. It is advisable to force this behaviour and specify `signed char` explicitly where you really need signed 8 bit semantics. – chqrlie May 29 '15 at 19:20
  • @dasblinkenlight: You can check my other question, for string functions I think it is ok –  May 29 '15 at 19:22
  • @chqrlie: they have special typedefs like s8, u8 but I am not sure about char signedness yet –  May 29 '15 at 19:22
  • @dasblinkenlight: you can see from my other question it seems for *string functions* this kind of pointer passing trickery is ok –  May 29 '15 at 19:31
  • Not just for *string functions*, for any function that expects `char *` or `const char *`, you can pass pointers to the `unsigned` kind and vice versa. It is bad practice and error prone, but it is well defined, and if the API you are stuck with uses `u8*`, you may not have much of a choice. – chqrlie May 29 '15 at 19:35
  • @chqrlie: I have to check, but I believe the API uses u8* everywhere... By well defined I meant that there is no undefined behaviour etc. (When you say error prone, it still confuses me: why?) –  May 29 '15 at 19:37
  • No undefined behaviour indeed. It is error prone because it forces confusion between different C types that should be used for different types of data. I recommend using `char *` or `const char *` for C strings: pointers to arrays of characters terminated by a `'\0'` byte. I reserve `unsigned char *` for pointers to raw 8 bit data that may or may not be `'\0'` terminated and therefore should not be passed carelessly to most *string functions*. – chqrlie May 29 '15 at 19:41
  • @chqrlie: I agree with you on when to use char* and when to use unsigned char*. But as can be seen from that other question, it seems it is ok to use an unsigned char array with string functions... I ended up using unsigned char arrays for strings too, because I thought I might need Unicode in the future (e.g., UTF-8, since some bytes in a UTF-8 string can be greater than 127). You see? Hence my situation now (though I believe this current question is a slightly different situation) –  May 29 '15 at 19:49
  • @dasblinkenlight: "OK then, on systems where the example prints 0 this will compile and run; on systems where it prints -128 it will issue a warning or an error.". why? –  May 29 '15 at 19:58
  • @User30015 whether or not a `char` is signed is system-dependent. The program is a way to check if `char` is signed or not. – Sergey Kalinichenko May 29 '15 at 20:17
  • @dasblinkenlight: I meant: why will it run on one system and issue an error on another? (Ok, anyway, I am getting confused now by so much discussion on this question and also on the other one I asked) –  May 29 '15 at 20:20
  • @User30015 Systems are allowed to make `char` signed or unsigned. On systems where regular `char` is the same as `unsigned char` your code will compile. On systems where regular `char` is the same as `signed char` your code will trigger a warning or an error. – Sergey Kalinichenko May 29 '15 at 20:22
  • @dasblinkenlight: In my case I got neither warning nor an error ; and I doubt char is unsigned on this system –  May 29 '15 at 20:29
  • @User30015 You'll never know until you try it. – Sergey Kalinichenko May 29 '15 at 20:31
  • @dasblinkenlight: Yes I will, but if char is signed, are you saying it is undefined behaviour? –  May 29 '15 at 20:32
  • @User30015 If char is signed, a standard-compliant compiler should issue an error. – Sergey Kalinichenko May 29 '15 at 20:34
  • @dasblinkenlight: so if your original code prints -128, char is signed; I think it is highly likely, though, that char is signed on that device –  May 29 '15 at 20:35
  • If this is being compiled in CodeWarrior, as suggested above, then the function could be expecting Pascal strings, where the first byte is the length, instead of C-style null-terminated strings. If that's the case, then you would need to call something like c2pstrcpy() to convert the strings. – Dan Korn May 29 '15 at 21:52
  • Isn't this essentially the same question as http://stackoverflow.com/q/30535814/827263? – Keith Thompson May 30 '15 at 02:46

2 Answers


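Passing a char * where an unsigned char * is expected is a constraint violation (the pointer types are incompatible, so a conforming compiler must issue a diagnostic). Technically it is therefore not well defined, but in practice it is not a problem. Still, you should cast these arguments to silence the warnings. An alternative to littering your code with ugly casts is to define an inline function: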

static inline unsigned char *ucstr(const char *str) { return (unsigned char *)str; }

And use that function wherever you need to pass strings to the APIs that (mistakenly) take unsigned char * arguments:

messageBox(ucstr("hi"), ucstr("hello"));

This way you will not get warnings while keeping some type safety.

Also note that messageBox should take const char * arguments. This SDK uses questionable conventions.
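To see the wrapper in context, here is a minimal compilable sketch. It assumes `u8` is `unsigned char`, as the question states; `messageBox` below is a hypothetical stub standing in for the real SDK function and only records its arguments, and `show_status` is likewise an illustrative name.

```c
#include <assert.h>
#include <string.h>

/* u8 as described in the question: a typedef for unsigned char. */
typedef unsigned char u8;

/* Hypothetical stand-in for the SDK's messageBox(u8 *, u8 *): the real
   function would display the text; this stub only remembers the pointers. */
static const u8 *last_title;
static const u8 *last_text;

static void messageBox(u8 *title, u8 *text)
{
    assert(title != NULL && text != NULL);
    last_title = title;
    last_text = text;
}

/* The wrapper from the answer. Its parameter is const char *, so passing
   e.g. an int * still draws a diagnostic, unlike a bare (unsigned char *)
   cast, which would accept any pointer silently. */
static inline unsigned char *ucstr(const char *str)
{
    return (unsigned char *)str;
}

static void show_status(void)
{
    /* Both arguments are now u8 *, so there is no signed/unsigned
       pointer mismatch for lint to report. */
    messageBox(ucstr("Status"), ucstr("Error calculating \rhash"));
}
```

Note that the wrapper still drops `const` from the string literals; that is only safe as long as the SDK function does not write through the pointers, which matches the point that it should really take `const` arguments.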

chqrlie
  • How is it well-defined when it fails to compile? [link](http://ideone.com/r2OO4L) – Sergey Kalinichenko May 29 '15 at 19:14
  • those warnings come from external tool like lint –  May 29 '15 at 19:14
  • @dasblinkenlight: you have c instead of s –  May 29 '15 at 19:15
  • I don't think hiding the cast in a 'cute' small function is the best way to do it; it's a bad habit to get into. Users should always be aware of what type they are currently using – pascx64 May 29 '15 at 19:19
  • @dasblinkenlight: Hmm, this got me puzzled. What should I do now? –  May 29 '15 at 19:20
  • @User30015 Look up compiler setting to force character literals to be of unsigned type. Talk to the guys who supplied the API: there must be a reason behind this (I cannot even guess what that reason might be). – Sergey Kalinichenko May 29 '15 at 19:24
  • @pascx64: I beg to differ! It is a bad habit to use explicit casts where a safe conversion wrapper does the job correctly. Casts are inherently unsafe. Sometimes you really need them to perform explicit conversions, but you lose type safety. The wrapper will cause a warning if you pass a pointer to something that is not a `char`; the explicit cast will not. – chqrlie May 29 '15 at 19:25
  • @dasblinkenlight: So there is a chance that this way of passing char* which I brought inside my question is WELL defined? because I have used it in my code –  May 29 '15 at 19:25
  • @chqrlie: So you say it is WELL defined and the cast is NOT necessary? (the warnings are from lint) –  May 29 '15 at 19:26
  • @User30015: more precisely, I say it is WELL defined, but you should enable most compiler warnings to catch other bugs and you should silence the ones you mention by using a type conversion wrapper as shown. You probably compile with only default warnings enabled and do not get complaints. That's OK but you might be missing other bugs. – chqrlie May 29 '15 at 19:30
  • @chqrlie: I see, about other bugs: I was not sure how to increase the warning level, which is why I am now using online flex lint to check the code. So this specific case, you say, is OK? –  May 29 '15 at 19:33
  • It is OK, and you can silence `lint` warnings by using the trick above. – chqrlie May 29 '15 at 19:36
  • That will silence the compiler warnings and errors, but it may not behave as expected at runtime. If this is being compiled in CodeWarrior, as suggested above, then the function could be expecting Pascal strings, where the first byte is the length, instead of C-style null-terminated strings. If that's the case, then you would need to call something like c2pstrcpy() to convert the strings. – Dan Korn May 29 '15 at 21:53
  • @Dan Korn: If the function expects Pascal strings, of course it will not work, but the OP says in a related question that the production code seems to function properly. I suspect the function just supports non ASCII charsets and the API designer thought it was smart to specify that these *extended* chars are meant to be unsigned. That's just an assumption, and a poor decision if I'm right. – chqrlie May 29 '15 at 22:13
  • @chqrlie: Indeed, it could be that messageBox receives non-ASCII encoded text, say UTF-8 –  May 29 '15 at 22:18
  • When you say use a cast to silence the tool (lint), will it just silence the tool, or will it solve the problem so that it is no longer a constraint violation? On one hand, the SDK requires u8* for strings, so what should I pass to it to be completely safe? On the other hand, I might need to store UTF-8 in strings in the future, so will using plain char* be OK for that? –  May 29 '15 at 22:19
  • @User30015: Yes plain `char *` is the recommended type for C nul terminated character strings, plain ASCII or utf-8 encoded. utf-8 was carefully designed for use in plain C character strings. Your SDK functions expect `u8*`, you can pass pointers to `char` or `unsigned char` arrays safely, as long as they otherwise conform to what the API expects regarding `'\0'` termination, maximum length, encoding, constness, etc. – chqrlie May 29 '15 at 22:29
  • @chqrlie: I think we are getting a clearer picture of the situation. 1. I have SDK functions which expect u8* as text. The question is what to pass to them, and how, without having to wonder whether I trigger undefined behaviour. How do I correctly pass text to the MessageBox function, for example, and be 100% sure there is no undefined behaviour? –  May 29 '15 at 22:39
  • @User30015: what is your target device? What architecture does it run on? What compiler are you using? There is no such thing as 100% sure. Whether you pass a `char *` or an `unsigned char *` really has no impact. But what if the array contains a 1 MB string? I can't tell if that will trigger undefined behaviour somewhere deep inside the implementation of messageBox(). – chqrlie May 29 '15 at 22:54
  • chqrlie is correct, there are no guarantees. With a string of 8-bit chars, there will often be ambiguities about the encoding. Is it supposed to be UTF-8? Latin-1? Windows-1252? Mac Roman? Shift-JIS? Big5? The type alone tells you nothing in this regard. If the type were, say, unsigned short or wchar_t, it's probably expecting UCS-2 or UTF-16 (Unicode), but even that's not guaranteed (and you can get into endianness issues). And that's just encoding; there can be ambiguities about line endings, entities, spaces, etc. Welcome to the many-layered Tower of Babel. – Dan Korn May 29 '15 at 23:42
  • @User30015: As I have written many times already, passing a `char *`, or even a `const char *` as you do, to the APIs that expect `unsigned char *` is sloppy but will not cause any problem per se. The compiler *should* issue a warning but will still generate harmless code. Such a warning seems disabled by default in your environment: it encourages sloppiness, but no harmful consequences. You *could* add casts or inline type wrappers, but these are not necessary and you might introduce new bugs by doing that, so **don't worry about it and leave the code as is**. This is my final answer. – chqrlie May 30 '15 at 09:16
  • @giorgi: If the function does not modify the string pointed to by a `char *` argument, it should really be declared and defined as taking a `const char *` argument. This tells the compiler that it is safe to pass `const char *` pointers as arguments to said function. Otherwise, the compiler should issue a warning about the loss of constness, that is the real problem if the function indeed modifies the string. String literals should always be considered `const` because they usually reside in read only segments of memory and in any case should not be modified. – chqrlie Jun 15 '15 at 19:49
  • @giorgi: for compatibility with old coding practices, compilers usually do not complain about this, and let the programmer be sloppy about keeping track of constness. It is a common source of bugs, sometimes hard to find. Passing a `const char *` to a function expecting a `char *` does not pose a problem unless the function actually tries to modify the string. – chqrlie Jun 15 '15 at 19:51
  • @giorgi: `const char*` and `char*` are passed the same way. If the function does not modify the string, it will be OK, but sloppy. – chqrlie Jun 15 '15 at 20:21
  • @giorgi: a `const char *` is a pointer to an array of bytes terminated by a NUL byte, that must not be modified. If the function that receives this pointer as an argument does not modify the contents of the array and handles the byte values outside the range `0..127` correctly (or if no bytes are outside this range), then it is OK. I don't know how to explain this more clearly without a white board or some memory dumps. – chqrlie Jun 15 '15 at 20:58
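The point made in these comments, that UTF-8 fits in plain `char` strings, can be illustrated with a small sketch (the helper `utf8_codepoints` is hypothetical, not part of any SDK). No byte of a multi-byte UTF-8 sequence is 0x00, so the usual terminator still works, but `strlen` counts bytes rather than characters, and bit tests on individual bytes are safest after converting each byte to `unsigned char`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "café" encoded as UTF-8: 'é' (U+00E9) is the two bytes 0xC3 0xA9. */
static const char cafe_utf8[] = "caf\xC3\xA9";

/* Hypothetical helper: counts code points in a valid UTF-8 string by
   skipping continuation bytes, which have the bit pattern 10xxxxxx.
   Each byte is converted to unsigned char first, so the bit test behaves
   the same whether plain char is signed or unsigned. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}
```

Here `strlen(cafe_utf8)` is 5 while `utf8_codepoints(cafe_utf8)` is 4, which is worth keeping in mind if an SDK function documents a maximum length without saying whether it means bytes or characters.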

The problem comes down to it being implementation-defined whether char is unsigned or signed.
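A minimal sketch of how to find out which choice a given implementation made, in the spirit of the test program suggested in the comments on the question (this is not that program): `CHAR_MIN` from `<limits.h>` is 0 when `char` is unsigned and negative when it is signed, and converting -1 to `char` gives the same answer at run time.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 if unsigned. */
static int char_is_signed(void)
{
    /* From <limits.h>: CHAR_MIN is 0 for an unsigned char type and equals
       SCHAR_MIN (a negative value) for a signed one. */
    int by_limits = CHAR_MIN < 0;

    /* Run-time cross-check: converting -1 to an unsigned type yields that
       type's maximum value, while a signed char keeps the value -1. */
    int by_value = (char)-1 < 0;

    assert(by_limits == by_value); /* the two checks must always agree */
    return by_limits;
}
```

Printing the result of `char_is_signed()` once on the target device would answer the question raised in the comments without needing to know the compiler's default settings.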

Compilers for which there is no error will be those for which char is actually unsigned. Some of those (notably the ones that are actually C++ compilers, where char and unsigned char are distinct types) will issue a warning. With these compilers, converting the pointer to unsigned char * will be safe.

Compilers which report an error will be those for which char is actually signed. If the compiler (or host) uses ASCII or a similar character set, and the characters in the string are printable, then converting the pointer to unsigned char * (or, better, to const unsigned char *, which avoids dropping constness from string literals) is technically safe. However, those conversions are potentially unsafe for implementations that use different character sets, or for strings that contain non-printable characters (e.g. negative signed char values, or unsigned char values greater than 127). I say potentially unsafe because what happens depends on what the called function does: for example, does it check the values of individual characters? Does it check individual bits of individual characters in the string? The latter, if the called function is well designed, is one reason it would take an unsigned char * parameter.
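To illustrate why the called function's behaviour matters, here is a hypothetical callee (not part of any SDK) that inspects the value of each byte. Written against `unsigned char *`, the test for bytes of 0x80 and above works; the identical loop written against plain `char` silently finds nothing when `char` is signed, because such bytes then read back as negative values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: counts bytes outside 7-bit ASCII. Through an
   unsigned char pointer every byte is in 0..UCHAR_MAX, so ">= 0x80"
   detects them. */
static size_t count_non_ascii_u(const unsigned char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (*s >= 0x80)
            ++n;
    return n;
}

/* The same loop against plain char: if char is signed, a byte such as
   0xE9 reads back as a negative value, so ">= 0x80" is never true and the
   count silently drops to zero. */
static size_t count_non_ascii_c(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; ++s)
        if (*s >= 0x80)
            ++n;
    return n;
}
```

Passing the same string to both versions gives 1 from `count_non_ascii_u`, and 0 or 1 from `count_non_ascii_c` depending on whether plain `char` is signed.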

What you need to do therefore comes down to what you can assume about the target machine and its char and unsigned char types, and what the function does with its argument. The most general approach (in the sense that it works for all character sets, and regardless of whether char is signed or unsigned) is to create a helper function that copies the array of char into a separate array of unsigned char. How that helper function works will depend on how (and whether) you need to handle the conversion of negative signed char values.
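One way such a helper could look, as a sketch: the name `copy_to_uchar` and its truncating behaviour are assumptions, and it makes one specific choice for negative `char` values, namely the standard conversion to `unsigned char` (reduction modulo `UCHAR_MAX + 1`), which keeps the original byte pattern on two's-complement targets.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies a NUL-terminated char string into a separate
   unsigned char buffer of dstsize bytes, truncating if needed, and always
   NUL-terminates the result (dstsize must be > 0). Each element goes
   through the well-defined char -> unsigned char conversion.
   Returns the number of bytes copied, excluding the terminator. */
static size_t copy_to_uchar(unsigned char *dst, size_t dstsize, const char *src)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL && dstsize > 0);
    while (i + 1 < dstsize && src[i] != '\0') {
        dst[i] = (unsigned char)src[i];
        ++i;
    }
    dst[i] = '\0';
    return i;
}
```

The caller then passes the `unsigned char` buffer to the SDK function directly, so no pointer conversion is involved at all; a different policy (rejecting or replacing negative values, for instance) would only change the loop body.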

Peter
  • This isn't really accurate, `char` and `unsigned char` are still different types even if plain `char` is unsigned. The standard mandates that the compiler has to issue a diagnostic message. – M.M May 30 '15 at 07:57
  • There's what the standard says, and what compilers do, Matt. For whatever reason, not all C compilers where `char` is `unsigned` diagnose (beyond a warning, which is often optional and disabled by default) passing a `char *` where an `unsigned char *` is expected. – Peter May 30 '15 at 08:06
  • @MattMcNabb: What do you say then? I will check CHAR_MIN when I get access to the device, to see whether char is signed. But as I said, if I am not wrong, in many of their functions they pass char* to that message box directly, and I need to know for sure because I have used that SDK for my software. They (and I) even pass unsigned char pointers to string functions like strcpy, sprintf, etc. without casts; also to atoi, see my other question. So is it possible that all these operations are OK and don't crash the program? –  May 30 '15 at 08:14
  • @Peter Some compilers (e.g. borland c++) allow `char *` to be passed to a function expecting `unsigned char *` with no warnings – M.M May 30 '15 at 09:06
  • @User30015 well, the fact of the matter is that the SDK has given you a lame API so you have to work with it. You'll have to have some ugly casts or macros in your code. – M.M May 30 '15 at 09:07