
Is the following implementation-defined:

char *cp = "\x96\xA0\xB4\xBE\xC8";

and this as well:

std::string s = "\x96\xA0\xB4\xBE\xC8";

On my compiler (MSVC 2015), plain char is signed.

I found that I can't do the following (it does not compile):

unsigned char *cp = "\x96\xA0\xB4\xBE\xC8";

The values in "\x96\xA0\xB4\xBE\xC8" are bytes in the range 0 to 255, hence my question:

Does the above depend on the compiler?

463035818_is_not_an_ai
user963241

1 Answer

Is the following implementation-defined:

char *cp = "\x96\xA0\xB4\xBE\xC8";

and this as well:

std::string s = "\x96\xA0\xB4\xBE\xC8";

On systems where char is signed and 8 bits wide, yes. A hex escape sequence in a narrow string literal has an implementation-defined value if it falls outside the implementation-defined range of char. Assuming an 8-bit signed char, whose range is -128 to 127, any hex value greater than 0x7F is outside the range of representable values.

Whether that literal is used to initialise a std::string or a pointer to character is irrelevant in this regard.


You can use an array of unsigned char instead of a string literal:

// One element per byte of the original literal, including 0xBE.
static constexpr unsigned char cp[] = {
    0x96,
    0xA0,
    0xB4,
    0xBE,
    0xC8,
};

You can use this array to initialise a std::basic_string<unsigned char> if you need it:

 std::basic_string<unsigned char> s = {std::begin(cp), std::end(cp)};

P.S. Conversion from a string literal to a non-const char pointer is ill-formed (since C++11; before that, the conversion was well-formed but deprecated).

P.P.S. char, unsigned char and signed char are always three distinct types, whether or not char is signed.

eerorika
  • Is it possible to make this well-defined and not dependent on the compiler? E.g. I couldn't do this: `unsigned char *cp = "\x96\xA0\xB4\xBE\xC8";` Is it necessary? – user963241 May 28 '19 at 18:25
  • @user963241 that doesn't help with making the value of the string literal independent of the compiler. You can use an array instead, as shown in my edit. – eerorika May 28 '19 at 18:32
  • Yes, what I mean is that signedness is implementation-defined, and how a value such as '\x96' is interpreted is also implementation-defined, but if we do `unsigned char c = '\x96'` then it is well-defined. – user963241 May 28 '19 at 18:33
  • @user963241 Yes, to each statement. Although the correct syntax is `unsigned char c = 0x96;` – eerorika May 28 '19 at 18:34
  • There exists no computer system where a `char` is less than 8 bits or is not two's complement. It looks like C++20 will finally [acknowledge that](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r0.html). – rustyx May 28 '19 at 18:36
  • So when people use `char *c` with such a hex string, aren't they making their code dependent on the compiler? Should I try to use `unsigned char` with such a hex string? – user963241 May 28 '19 at 18:37
  • @user963241 Yes. You should use an array of `unsigned char`, rather than a string literal if you wish to use values greater than 0x7F and be independent of implementation. – eerorika May 28 '19 at 18:39
  • @rustyx "There exists no computer system where a char is less than 8 bits" - That's plain *not true*: https://stackoverflow.com/a/2215694/5910058 - Uncommon, yes. Nonexistent, no. – Jesper Juhl May 28 '19 at 18:42
  • @rustyx curiously they did not propose to make the hex escape sequences outside the range of `char` well defined, which I suppose they could have proposed given that 2's complement is guaranteed. At least the linked proposal doesn't, nor the latest draft. – eerorika May 28 '19 at 18:43
  • It seems it's not possible to do that with `std::string`, i.e. be independent of the implementation, right? Because it will not use `unsigned char`? – user963241 May 28 '19 at 18:45
  • @user963241 I'm not 100% sure about `std::string`. Possibly in C++20 given the guarantee of 2's complement. In that case the values may be in different range than you expect though. `std::basic_string` is fine for sure; I've added an example. – eerorika May 28 '19 at 18:56