So lately, I read up on an issue regarding the three distinct character types in C: char, unsigned char and signed char. The problem I now face is not something I have experienced until now: my program works correctly on all tested computers and only targets little-endian machines (basically all modern desktops and servers running Windows/Linux, right?). I frequently reuse a char array that I defined for holding a "string" (not a real string, of course) as temporary variables. E.g. instead of adding another char to the stack, I just reuse one of the members, like array[0]. However, I based this tactic on the assumption that a char would always be signed, until I read today that it actually depends on the implementation. What will happen if I now have a char and I assign a negative value to it?

char unknownsignedness = -1;

If I wrote

unsigned char A = -1;

I think that the C-style cast will simply reinterpret the bits, so the value that A represents as an unsigned type becomes different. Am I right that these C-style casts are simply a reinterpretation of bits? I am referring to signed <-> unsigned conversions here.

So if an implementation has char as unsigned, would my program stop working as intended? Take the last variable: if I now do

if (A == -1)

I am now comparing an unsigned char to a signed char value, so will this simply compare the bits, not caring about the signedness, or will this return false because obviously A cannot be -1? I am confused about what happens in this case. This is also my greatest concern, as I use chars like this frequently.

user209347
  • What casting? There's no casting anywhere. And if you want to store binary (non-character) data use the better (IMO) `int8_t` or `uint8_t` standard types. – Some programmer dude Jan 01 '15 at 18:16
  • My implementation does not support C99, and I thought `int8_t` was introduced in that version – user209347 Jan 01 '15 at 18:18
  • And if I assign a signed value to an unsigned value, will there be no casting? Or does casting only happen when the data sizes do not match? – user209347 Jan 01 '15 at 18:19
  • There's automatic conversion, not casting. Casting is when you write something like `(unsigned char) -1` – Barmar Jan 01 '15 at 18:19
  • If indeed as in the example above I assign a negative value to an unsigned data type, what happens? Are the bits reinterpreted? Or are the bits themselves changed? And also what happens in the comparison (my last example in the question). – user209347 Jan 01 '15 at 18:21
  • "C-Style casts". What other casts are there in C? – n. m. could be an AI Jan 01 '15 at 18:25
  • Yes, by C-style casts I just mean casts in C in general ;p, so in the examples I gave in the question, there should be casts, right? – user209347 Jan 01 '15 at 18:31

4 Answers

4

The following code prints No:

#include <stdio.h>

int
main()
{
    unsigned char a;

    a = -1;

    if(a == -1)
        printf("Yes\n");
    else
        printf("No\n");

    return 0;
}

The code a = -1 converts -1 to unsigned char, which is well defined: a becomes UCHAR_MAX, which is 255 on machines with 8-bit chars. The test a == -1 compares an unsigned char to an int, so the integer promotions apply; hence, it is interpreted as

`(int)a == -1`

Since a is 255, (int)a is still 255, and the test yields false.

jch
  • "implementation-defined value" That would be `UCHAR_MAX` in all conforming implementations. – n. m. could be an AI Jan 01 '15 at 18:38
  • I think I see it: I have to go back to my code and change all the `char somevariable = negative value` assignments and similar comparisons, because if char is implemented as unsigned, the program will stop working correctly in some cases. Am I right? – user209347 Jan 01 '15 at 18:38
4
unsigned char a = -1;

ISO/IEC 9899:1999 says in 6.3.1.3/2:

if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type

We add (UCHAR_MAX+1) to -1 once, and the result is UCHAR_MAX, which is obviously in range for unsigned char.

if (a == -1)

There's a long passage in 6.3.1.8/1:

If both operands have the same type, then no further conversion is needed.

Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.

Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

The rank of unsigned char is less than that of int, so the integer promotions (6.3.1.1/2) are applied to a before the rules above are considered.

If int can represent all the values that unsigned char can (which is usually the case), then both operands are converted to int, and the comparison returns false.

If int cannot represent all values in unsigned char, which can happen on rare machines with sizeof(int)==sizeof(char), then both are converted to unsigned int, -1 gets converted to UINT_MAX which happens to be the same as UCHAR_MAX, and the comparison returns true.

jch
n. m. could be an AI
3
unsigned char A = -1;

results in 255 (assuming an 8-bit char; in general the result is UCHAR_MAX). There is no reinterpretation upon assignment or initialization. On a two's complement machine, -1 is just a bunch of 1 bits, and the low 8 of them are copied verbatim.

Comparisons are a bit different, as the literal -1 is of int type.

if (A == -1)

will do a promotion (implicit conversion) of A to int before the comparison, so you end up comparing 255 with -1. Not equal.

And yes, you have to be careful with plain char.

SzG
  • "results in 255". That's if your `char` is 8 bits. Not guaranteed. "8-bit machine, where `sizeof(int) == sizeof(char)`". `int` is at least 16 bits. – n. m. could be an AI Jan 01 '15 at 18:35
  • Agreed. I assumed char is 8 bit. Has anyone ever seen something different? Edited. – SzG Jan 01 '15 at 18:42
  • "Has anyone ever seen something different?" Possible duplicate: http://stackoverflow.com/questions/2098149/what-platforms-have-something-other-than-8-bit-char – n. m. could be an AI Jan 01 '15 at 19:13
-1

I think this question is best answered by a quick example (warning: C++, but see explanation for my reasoning):

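#include <cstdio>

int main()
{
    char c = -1;            // signedness of plain char is implementation-defined
    unsigned char u = -1;   // becomes UCHAR_MAX
    signed char s = -1;
    if (c == u)
        std::printf("c == u\n");
    if (s == u)
        std::printf("s == u\n");
    if (s == c)
        std::printf("s == c\n");
    if (static_cast<unsigned char>(s) == u)
        std::printf("(unsigned char)s == u\n");
    if (c == static_cast<char>(u))
        std::printf("c == (char)u\n");
    return 0;
}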

The output (on an implementation where plain char is signed):

s == c
(unsigned char)s == u
c == (char)u

C treats the values differently when they are used as-is. On a two's complement machine, casting between the signed and unsigned char types leaves the bit pattern unchanged, so in that sense you are correct that the cast reinterprets the bits. I used a C++ static_cast here to show that the compiler is okay with doing this cast. In C, you would just cast by prefixing the type in parentheses. There is no compiler checking to ensure that the cast is safe in C.

greg