12

For the last day I have had a nasty bug in my code which after some searching appears to be related to comparison between char values and hex. My compiler is gcc 4.4.1 running on Windows. I replicated the problem in the simple code below:

char c1 = 0xFF; char c2 = 0xFE;
if(c1 == 0xFF && c2 == 0xFE)
{
    //do something
}

Surprisingly the code above does not get in the loop. I have absolutely no idea why and would really appreciate some help on this. It is so absurd that the solution must be (as always) a huge mistake on my part that I totally overlooked.

If I replace the above with unsigned chars it works, but only in some cases. I am struggling to find out what's going on. In addition if I cast the hex values to char in comparison it enters the loop correctly like so:

if(c1 == (char)0xFF && c2 == (char)0xFE)
{
    //do something
}

What does that mean? Why can it be happening? Isn't the raw hex value interpreted as a char by default? For the curious the point in my code where I first noticed it is comparison of first 2 bytes of a stream with the above hex value and their reverse to idenity the Byte Order Mark.

Any help is appreciated

Lefteris
  • 3,196
  • 5
  • 31
  • 52

5 Answers5

12

Plain char can be signed or unsigned. If the type is unsigned, then all works as you'd expect. If the type is signed, then assigning 0xFF to c1 means that the value will be promoted to -1 when the comparison is executed, but the 0xFF is a regular positive integer, so the comparison of -1 == 0xFF fails.

Note that the types char, signed char and unsigned char are distinct, but two of them have the same representation (and one of the two is char).

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • I see, yeah thought about that but what would then be the best(Safest) way to be comparing bytes from within a char array to hex values? Casting every char to unsigned? Or something more secure? – Lefteris Jan 13 '12 at 07:28
  • You have a few options. You could cast the hex values to `(char)`, and the compiler would then convert them the same way as it converts the `char` variable on the other side of the comparison. Or you can cast the `char` variables to `(unsigned char)`. Note that converting to `(unsigned)` would be ill-advised; `-1` converted to `unsigned` on a machine with 4-byte `int` would generate 0xFFFFFFFF and not 0xFF. Another option would be to have an array `char charmap[256]` initialized with `charmap[i] = i;`, and then comparing `c1 == charmap[0xFE]`. Or you could just use `unsigned char`. – Jonathan Leffler Jan 13 '12 at 07:34
  • I see, thanks for your detailed answer Jonathan. Has been helpfull. Time to scour my code for any such comparison and correct it, before it comes back to bite me in the ass several months down the line. – Lefteris Jan 13 '12 at 07:41
  • @JonathanLeffler are you sure that -1 (signed 0xFF) expands to 0xFFFFFFFF on a 4-byte machine when converted to unsigned? Try `signed char x = 0xFF; printf("%08x\n", x); printf("%08x\n", (unsigned char)x);`. I think @paxdiablo explains it the correct way in his comment. – abdus_salam Sep 04 '15 at 11:32
  • @abdus_salam: Yes, I'm sure. Further, when I try your code fragment, I get two lines of output: `ffffffff` and `000000ff`, which is what I'd expect. (What do you expect? What do you see when you try it?) Note that the argument to `printf()` is converted to `int` (because all variants of `char` are promoted to `int` in a call to a variadic function like `printf()`), and then internally to `printf()`, the formats indicate that the argument should be treated as `unsigned int`. – Jonathan Leffler Sep 04 '15 at 13:56
  • @JonathanLeffler If my Friday afternoon square eyes are missing something apologies but yes, that is what I see and expect too and also what seems to disagree with your original comment. The second printf statement casts x to 'unsigned int' and we get `000000FF` instead of all `FF`s as you suggest. My point is, it is safe to cast a signed char to unsigned as long as you don't care about the sign of the original value. Would you disagree? – abdus_salam Sep 04 '15 at 14:41
  • @abdus_salam: I think we have a misunderstanding about the meaning of the words, then. I confess I'm not sure what the problem is, nor which statements to exegesize. Maybe you can send me an email and you can annotate which of my statements you're having problems with (copy'n'paste), and I can explain whatever it is when I know more precisely where you think I've misstated something. In your first comment, you mention 'converted to unsigned'; if you're addressing my first comment (to Lefteris), then there I am referring to `(unsigned)-1` with no `char` types involved. – Jonathan Leffler Sep 04 '15 at 14:51
7

When comparing char to hex you must be careful:

Using the == operator to compare a char to 0x80 always results in false?

I would recommend this syntax introduced in C99 to be sure

if(c1 == '\xFF' && c2 == '\xFE')
{
    // do something
}

Avoid the cast, it's unnecessary and isn't type safe.

It tells the compiler that the 0xFF is a char rather than an int, this will solve your issue.

clang compiler will warn you about this also:

comparison of constant 128 with expression of type 'char' is always false [-Werror,-Wtautological-constant-out-of-range-compare]

ericcurtin
  • 1,499
  • 17
  • 20
4

The literal 0xff is not a char, it's a int (signed). When you shoehorn that into a char variable, it will fit okay but, depending on whether your char type is signed or unsigned, this will affect how it's upgraded in expressions (see below).

In an expression like if (c1 == 0xff), the c1 variable will be promoted to an integer since that's what 0xff is. And what it's promoted to depends on whether it's signed or not.

Bottom line, you can do one of two things.

  1. Ensure that you use signed char so that it "sign-extends" to the correct int. By that I mean an unsigned char 0xff would become (for a 4-byte int) 0x000000ff (so it's still 255) but a signed one would become 0xffffffff (so it's still -1).

  2. Shoehorn the literal into the same type as the variable, which you're already doing with (char)oxff.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • Thanks paxdiablo, nice answer. I was wondering what is in your opinion the safest method to follow, guarranteed to be working everywhere? I will need to change various places in my code now in the light of the realization of this mistake and I just don't want to come back to it in the future and notice that it breaks all of my code in some special cases – Lefteris Jan 13 '12 at 07:36
  • 1
    @Lefteris: in terms of the code we've seen _either_ method will work fine. That's not to say you don't have some assumptions _elsewhere_ that may be affected by your choice. Personally, I prefer my `char` variables to be unsigned then use constants like 255U (to make sure they match in signedness). But it's six of one and half a dozen of the other. – paxdiablo Jan 13 '12 at 07:40
  • Thanks. Unfortunately I did not catch this error early in the project so I have lots of fixing to do. – Lefteris Jan 13 '12 at 07:55
  • @Lefteris: yes you do. Good luck with that :-) – paxdiablo Jan 13 '12 at 07:57
0

I solved it by casting my variables to UINT16 (optimized for my compiler). In your case, you would be casting c1 and c2 as INT

char c1 = 0xFF; char c2 = 0xFE;
if((int)c1 == 0xFF && (int)c2 == 0xFE)
{
    //do something
}
Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
siggyt
  • 9
0

The character 0xFE will he translated into a negative integer. The constants in the expression will be translated into positive integers.

Ed Heal
  • 59,252
  • 17
  • 87
  • 127
  • 0xFE is a positive number. If it assigned to a `char` and if `char` is a signed type, then when the assigned value is promoted to an `int` for the comparison, then...but it all hinges on `char` being signed, which is not automatically the case (though it is, indeed, the case to the OP). – Jonathan Leffler Jan 13 '12 at 07:31