4

Does the following code rely on undefined behavior if the platform's char type is signed and some of the parameters are in the negative range (for example, char_bitmatch('\xf0', '\xc0', '\x20'))?

static constexpr bool char_bitmatch(char c, char pos, char neg)
{
    return (c & pos) == pos
        && !(c & neg);
}

Context

The reason I am asking this is because in GCC 8.1.0 with -O3, I am seeing a behavior which can only be caused by char_bitmatch('\xf0', '\xc0', '\x20') erroneously returning true. This code behaves as expected:

static constexpr bool char_bitmatch(char c_in, char pos_in, char neg_in)
{
    auto c   = static_cast<unsigned char>(c_in);
    auto pos = static_cast<unsigned char>(pos_in);
    auto neg = static_cast<unsigned char>(neg_in);

    return (c & pos) == pos
        && !(c & neg);
}

From my understanding, this should not have fixed the issue -- & should work the same between signed char and unsigned char.

This leads me to a few conclusions (but I don't know which is correct):

  1. Use of unsigned char fixes an undefined behavior.
  2. I am still relying on undefined behavior -- the "fix" is luck and the actual bug lies elsewhere in my code.
  3. There is a bug in GCC 8.1.0 optimization -- the "fix" is a voodoo incantation that makes GCC do the right thing.
Travis Gockel
  • 26,877
  • 14
  • 89
  • 116
  • So every time you use `&` the operands get promoted -- in this code, they become `int`. Promotion from signed `char` to `int` and `unsigned char` to `int` behave quite differently. – Ben Voigt Jun 03 '18 at 22:41
  • @BenVoigt Looking at the code, does that affect the results here? – Paul Sanders Jun 03 '18 at 23:10
  • @Paul: I don't think so, but it is a counterpoint to the question's claim that "`&` should work the same between signed `char` and `unsigned char`". – Ben Voigt Jun 04 '18 at 00:33
  • @Ben OK, understood. I found your comment educational (I've never stopped to think about it), and it led me to [this](https://stackoverflow.com/questions/17489696/bitwise-shift-promotes-unsigned-char-to-int). – Paul Sanders Jun 04 '18 at 05:04

1 Answers1

1

Interesting. I think your assumption that char_bitmatch is returning true is, erm, false.

When I run this code:

#include "stdio.h"

static constexpr bool char_bitmatch(char c, char pos, char neg)
{
    return (c & pos) == pos
        && !(c & neg);
}

int main (void)
{
    constexpr bool b = char_bitmatch ('\xf0', '\xc0', '\x20');
    printf ("%d\n", b);
}

I get:

0

So I think the problem lies elsewhere in your code.

I used the same compiler as you - run it at Wandbox (choice of compilers available).

Also, == pos is redundant, no?

Paul Sanders
  • 24,133
  • 4
  • 26
  • 48
  • 1
    `== pos` is not redundant if the test is "all of `pos` bits are set (and none of `neg` bits). Leaving out the `== pos` would result in a rule of "any of `pos` bits are set ...." – Ben Voigt Jun 04 '18 at 05:10
  • I also see this on independent recreations. This is tricky, since the issue only happens in release builds and doing things like adding print statements also makes the problem go away. My assumption has only been verified through GDB -- `length == 2` [here](https://github.com/tgockel/json-voorhees/blob/master/src/jsonv/char_convert.cpp#L227). – Travis Gockel Jun 04 '18 at 05:13
  • @Ben Ah yes, you right, I was thinking of _one_ bit, sorry. Too early in the morning for me obviously, I regularly use that construct myself. – Paul Sanders Jun 04 '18 at 05:14
  • @Travis Hmmm, lots of code there. Enough to crash my tablet, but I have now satisfied myself that the `char_bitmatch` works as intended with signed chars, even though I would not personally code it that way and even though I appear not to be fully awake yet! – Paul Sanders Jun 04 '18 at 05:23