3

When I have a char holding some small integer (say 23) and want to convert it to a larger integer type (int), I heard there may be some issues, since the compiler must decide whether to treat char as signed or unsigned. Is this true? Can there be problems because of this? And how do I avoid them?

In other words (I am not sure if the formulation below is equivalent to the one above), what problems can occur from a conversion like this:

    char someCharVal = //...
    int x = someCharVal;

and how to avoid them?

PS: explanation for "dummies" welcome

  • 1
    The answer to this question might be system-dependent. Take a look at this question: http://stackoverflow.com/questions/2054939/is-char-signed-or-unsigned-by-default – merlin2011 Apr 07 '14 at 06:39
  • 1
    `char` may be signed or unsigned. `CHAR_MAX` is guaranteed to be at least `127`, so `23` is always positive. To avoid any issues, just bear this possibility in mind when writing code. Prefer to use `unsigned char` if your intent is to store a small non-negative number. – M.M Apr 07 '14 at 06:41
  • @merlin2011: that link says that `char` may be signed or unsigned - it depends on the implementation. But I think it does not fully answer this question. How to convert `char` to `int` safely then? –  Apr 07 '14 at 06:42
  • @dmcr_code, If by "safe" he means he wants to preserve the sign of the integer, then he may have to check first to see whether it is signed, and decide which type of integer to place it in? – merlin2011 Apr 07 '14 at 06:44

2 Answers

6

The problem is, plain and simple, sign extension when incorrectly treating unsigned values as signed ones.

Let's examine the bit patterns for 5 and -5 in both 8-bit and 16-bit two's complement numbers:

      8-bit          16-bit
    =========  ===================
+5  0000 0101  0000 0000 0000 0101
-5  1111 1011  1111 1111 1111 1011

When converting a number from 8-bit to 16-bit, the top bit is extended to the left. In other words, a zero-bit at the left of an 8-bit number will extend to the top half of the 16-bit number.

Similarly, a one-bit in that top bit will extend to the left.

This is the way C widens its signed numbers (for two's complement anyway; the ones' complement and sign-magnitude encodings are a different matter, but few implementations use them nowadays).
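
To make the table above concrete, here is a minimal sketch (not part of the original answer) that prints the bit patterns before and after widening; it assumes the fixed-width types int8_t/int16_t from <stdint.h> are available:

#include <stdio.h>
#include <stdint.h>

int main (void) {
    int8_t  small = -5;
    int16_t wide  = small;   /* widening a negative signed value sign-extends */

    /* The casts to unsigned types are only there so printf shows the raw bits. */
    printf ("int8_t  -5 -> 0x%02X\n", (unsigned)(uint8_t)small);   /* 0xFB   */
    printf ("int16_t -5 -> 0x%04X\n", (unsigned)(uint16_t)wide);   /* 0xFFFB */

    int8_t  pos  = 5;
    int16_t wpos = pos;      /* top bit is 0, so zero bits extend to the left */
    printf ("int8_t   5 -> 0x%02X\n", (unsigned)(uint8_t)pos);     /* 0x05   */
    printf ("int16_t  5 -> 0x%04X\n", (unsigned)(uint16_t)wpos);   /* 0x0005 */

    return 0;
}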

So, if you are converting signed char to signed int, or unsigned char to unsigned int, there is no problem. C will give you the correct value.

The problem exists when you convert from a signed type to an unsigned one, or vice versa: the underlying data may be treated differently from what you expect.

See, for example, the following code, with 8-bit char and 32-bit int types:

#include <stdio.h>

int main (void) {
    printf ("unsigned char  50 -> unsigned int %11u\n", (unsigned char)50);
    printf ("unsigned char -50 -> unsigned int %11u\n", (unsigned char)-50);
    printf ("unsigned char  50 ->   signed int %11d\n", (unsigned char)50);
    printf ("unsigned char -50 ->   signed int %11d\n", (unsigned char)-50);

    printf ("  signed char  50 -> unsigned int %11u\n", (  signed char)50);
    printf ("  signed char -50 -> unsigned int %11u\n", (  signed char)-50);
    printf ("  signed char  50 ->   signed int %11d\n", (  signed char)50);
    printf ("  signed char -50 ->   signed int %11d\n", (  signed char)-50);

    return 0;
}

The output of this shows the various transformations, with my annotations:

unsigned char  50 -> unsigned int          50
unsigned char -50 -> unsigned int         206 # -50 unsigned is 256-50
unsigned char  50 ->   signed int          50
unsigned char -50 ->   signed int         206 # same as above
  signed char  50 -> unsigned int          50
  signed char -50 -> unsigned int  4294967246 # sign extend, treat as unsigned (2^32 - 50)
  signed char  50 ->   signed int          50
  signed char -50 ->   signed int         -50

The first unusual case there is the second line. It actually takes the signed char -50 bit value, treats that as an unsigned char, and widens that to an unsigned int, correctly preserving its unsigned value of 206.

The second case does the same thing since a signed int is more than capable of holding the full range of unsigned char values (in this implementation).

The third unusual case widens -50 to a signed int and then treats the underlying bit pattern as an unsigned int, giving you the large positive value.

Note that there are no issues when the "signedness" of the value does not change.

The C standard doesn't mandate what signedness the plain char type has; it could be signed or unsigned. So, if you want truly portable code, it shouldn't contain any "naked" char types.

If you want to work with signed values, use signed types. That includes explicitly using signed char instead of char. Likewise, if you want to work with unsigned values, use unsigned everywhere (including explicitly using unsigned char). Don't promote from signed to unsigned or vice versa unless you absolutely know what will happen.
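
As a rough illustration of that advice (the variable names here are mine, not from the answer), pick one signedness and stick with it:

#include <stdio.h>

int main (void) {
    /* Small non-negative "byte" values: store them explicitly as unsigned char. */
    unsigned char byte = 206;   /* unsigned char always covers at least 0..255   */
    int as_int = byte;          /* zero-extends, so as_int == 206 everywhere     */

    /* Genuinely signed small values: store them explicitly as signed char. */
    signed char temperature = -23;
    int t = temperature;        /* sign-extends, so t == -23 everywhere          */

    printf ("%d %d\n", as_int, t);   /* prints: 206 -23 */

    return 0;
}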

paxdiablo
  • @paxdiablo: if my char holds `-23`. When I do `(unsigned char) charVal` and assign it to `int`, the int will hold a positive value right? So the original (negative) value has been lost. Isn't it? Or am I missing something –  Apr 07 '14 at 06:46
  • 1
    I am quoting a book from which I heard about it, and got a bit confused about what the problem was: "The issue becomes important only when converting a char quantity to a larger integer. Going the other way, the results are well-defined: excess bits are simply discarded. But a compiler converting a char to an int has a choice: should it treat the char as a signed or an unsigned quantity? If the former, it should expand the char to an int by replicating the sign bit; if the latter, it should fill the extra bit positions with zeroes." –  Apr 07 '14 at 06:52
  • @dmcr_code, you're missing nothing. If you want to preserve sign, don't cast. C will sign-extend. The only problem is if you want to (for example) use the 8-bit char to populate the lower 8 bits of an int and nothing else. If the char is signed and negative, it will sign extend into the upper parts of the int (see the masking sketch after these comments). – paxdiablo Apr 07 '14 at 06:54
  • I am not sure I exactly understood *what kind of problem* does the book refer to. PS. Here is some additional quote if it makes things more clear: "The results of this decision are important to virtually anyone who deals with characters with their high-order bits turned on. It determines whether 8-bit characters are going to be considered to range from –128 through 127 or from 0 through 255. This, in turn, affects the way the programmer will design things like hash tables and translate tables." –  Apr 07 '14 at 06:56
  • does the book say it is `not` OK to do this kind of thing: `int x = someCharVal;`? –  Apr 07 '14 at 06:58
  • @dmcr_code, I've updated the answer with some more detail on the problem. Bottom line, it's only a problem if you're switching between signed and unsigned types. – paxdiablo Apr 07 '14 at 07:10
  • Here is also another quote: "If you care whether a character value with the high-order bit on is treated as a negative number, you should probably declare it as unsigned char. Such values are guaranteed to be zero-extended when converted to integer, whereas ordinary char variables may be signed in one implementation and unsigned in another.". To make it more clear what problem I mean. Ok paxdiablo so far I think I am not following very well what you mean - maybe I lack some background .... –  Apr 07 '14 at 07:12
  • maybe you can simplify your explanation, and first say what is the *problem* –  Apr 07 '14 at 07:15
  • I don't want to promote. Sometimes I might need something like `int x = charVal;` in code. I am curious what kind of problems this may result in? And how to avoid them? That's it. –  Apr 07 '14 at 07:21
  • this is getting confusing. I want to convert usually to `signed int`. Of course if source char is `signed`, I will not store it as `unsigned char`. But then it depends if `char` is `signed` or `unsigned` on the system right? I think it will be nice if you can simplify your explanation, I have also updated my question –  Apr 07 '14 at 07:34
  • @dmcr_code: I've made it about as simple and explanatory as I can in this latest iteration. I suggest you input the code and examine the output. And, if you want to be portable, don't use `char` on its own. – paxdiablo Apr 07 '14 at 07:34
  • I've updated the question such that answering it might be easier, in terms of explanation - maybe you can look at it - otherwise thanks anyway –  Apr 07 '14 at 07:36
  • If my underlying `char` is `unsigned` and I want to convert it to `signed int` -> is this problem? (assuming I don't store negative values inside). –  Apr 07 '14 at 07:49
  • Say `char` is unsigned on my machine. And I say: `int t = charVal;` Then if `charVal` contains positive value, there will be NO issues right? (assuming I will not - which of course I won't in case char is unsigned - assign negative value to `charVal`) –  Apr 10 '14 at 10:52
  • @dmcr_code, assuming int is wider than char, there will be no issue, no. – paxdiablo Apr 10 '14 at 11:02
  • the issue has to be somewhere else. Say char is signed on my machine. If I assign negative value to it - then reassigning this to int(as in question) is OK right? Of course I should never assign such negative value to unsigned int. And, if char is unsigned I should never assign negative value to it in the first place - and in such case doing assignment like in my question afterwards should be OK - is it right judgement? –  Apr 15 '14 at 07:18
  • The only time there is a problem in the second part of your code is when you assign -50 to an unsigned int - but why on earth would one do that and expect a correct result? –  Apr 15 '14 at 10:35
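
As a side note to the comment above about filling only the low 8 bits of an int: the usual way to stop the sign extension is to mask. This is only a sketch (not from the discussion itself), and it assumes an 8-bit, two's complement char:

#include <stdio.h>

int main (void) {
    char c = -50;            /* assume char happens to be signed here            */

    int naive  = c;          /* sign-extends: naive  == -50                      */
    int masked = c & 0xFF;   /* keeps only the low 8 bits: masked == 206         */

    printf ("naive  = %d\n", naive);
    printf ("masked = %d\n", masked);

    return 0;
}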
0

For signed char, the range of an int is always equal to or larger than the range of a signed char and conversion from signed char to an int is always safe.

For unsigned char, in theory UCHAR_MAX can be equal to UINT_MAX and less than INT_MAX; and it's possible for conversion from unsigned char to an int to be unsafe. For this to happen UCHAR_MAX must be 32767 or larger (which is very rare in practice); therefore the conversion is almost always safe.
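
If you want to guard against that theoretical case explicitly, one option (not something the answer itself suggests) is a compile-time check using the limits from <limits.h>:

#include <limits.h>

/* Refuse to compile on the (very rare) implementations where an
   unsigned char value might not fit in a signed int. */
#if UCHAR_MAX > INT_MAX
#error "unsigned char does not fit in int; review char-to-int conversions"
#endif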

Because char can be either signed or unsigned, conversion from char to int is only almost always safe (and not guaranteed to be safe in theory).

However...

All of the above assumes you're using the full range of a (signed or unsigned) char. This is extremely rare. Typically if you're using char you only use values from 0 to 127 to avoid portability problems, and if you need to store negative values or larger positive values you use a different data type to begin with (e.g. signed char, uint8_t, int, etc). If a char is only used to store values from 0 to 127 then converting char to an int is always safe regardless of what values CHAR_MIN and CHAR_MAX have.
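
For example (a trivial sketch of that last point, not from the answer), a value in the 0 to 127 range converts the same way whether char is signed or unsigned:

#include <stdio.h>

int main (void) {
    char ascii = 'A';   /* 65 on ASCII systems; fits in 0..127 either way */
    int  code  = ascii; /* always 65, no sign extension can occur         */

    printf ("%d\n", code);

    return 0;
}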

Brendan