20

I use UTF-8 and need to store a constant in a char array:

const char s[] = {0xE2, 0x82, 0xAC, 0}; // the euro sign

However, it gives me an error:

test.cpp:15:40: error: narrowing conversion of ‘226’ from ‘int’ to ‘const char’ inside { } [-fpermissive]

I could cast all the hex numbers to char, but that feels tedious and doesn't smell good. Is there a proper way of doing this?

SwiftMango

4 Answers

36

char may be signed or unsigned (and the default is implementation-specific). You probably want

  const unsigned char s[] = {0xE2, 0x82, 0xAC, 0};

or

  const char s[] = "\xe2\x82\xac";

or with many recent compilers (including GCC)

  const char s[] = "€";

(a string literal is an array of char unless you give it a prefix such as L or u8)
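
All three spellings produce the same four bytes. Here is a minimal check (my sketch, not part of the answer; the last form assumes the source file is saved as UTF-8):

  #include <cassert>
  #include <cstring>

  int main() {
      const unsigned char a[] = {0xE2, 0x82, 0xAC, 0};
      const char b[] = "\xe2\x82\xac";
      const char c[] = "€"; // relies on a UTF-8 source/execution character set
      assert(std::memcmp(a, b, sizeof b) == 0); // same bytes, different element type
      assert(std::memcmp(b, c, sizeof c) == 0);
  }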

See GCC's -funsigned-char (and -fsigned-char) options.

On some implementations a char is unsigned, so CHAR_MAX is 255 (and CHAR_MIN is 0). On others chars are signed, so CHAR_MIN is -128 and CHAR_MAX is 127 (and things are different, e.g., on Linux/PowerPC/32 bits versus Linux/x86/32 bits). AFAIK nothing in the standard prohibits 19-bit signed chars.
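
A quick way to see which choice your implementation made (my illustration, not from the answer):

  #include <climits>
  #include <iostream>

  int main() {
      std::cout << "char is " << (CHAR_MIN < 0 ? "signed" : "unsigned")
                << " (CHAR_MIN = " << CHAR_MIN
                << ", CHAR_MAX = " << CHAR_MAX << ")\n";
  }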

Basile Starynkevitch
  • Why does the second one not need to specify the sign? – SwiftMango Oct 31 '13 at 19:58
  • also the second one gives me weird sign... I tried `\xe2\x82\xac` and it works... – SwiftMango Oct 31 '13 at 20:00
  • I would prefer a way that doesn't specify the signedness of the `char`s, but just uses the implementation's default. The second one appeals to me more than the first. – John Oct 31 '13 at 20:00
  • Trivial: Not only may `char` be `signed` or `unsigned` -- it also may be neither, and just be `char`. That there are *three* variants of `char` rather than just the usual two bit me quite recently. – John Dibling Oct 31 '13 at 20:05
  • @JohnDibling: are you sure of that? I have always thought that `char` may be `signed` or not, but that there is no 3rd possibility? – Basile Starynkevitch Oct 31 '13 at 20:07
  • @John If you do not specify the signedness of `char`, you are using the compiler's default ... which can (and likely will) change between different compiler vendors (or even different versions of the same compiler). When you need a `char` to be a `byte`, you should declare it as such (and not make assumptions about what the compiler may or may not do). – Zac Howland Oct 31 '13 at 20:10
  • @BasileStarynkevitch: Yes, just a few days ago I spent a good while in the depths of the Standard to figure out why my code wasn't working, and I came across this gem from which I realized I needed three overloads, not two. Reference from C++03: 3.9.1 Fundamental types "1/ [...] Plain char, signed char, and unsigned char are three distinct types. [...]" – John Dibling Oct 31 '13 at 20:10
  • @JohnDibling Wouldn't a "plain char" be *either* a `signed char` or an `unsigned char` depending on the compiler's options? – Zac Howland Oct 31 '13 at 20:13
  • Can anyone explain why the second one does not need to specify the sign? – SwiftMango Oct 31 '13 at 20:14
  • @JohnDibling There are three distinct types, but plain `char` must have the same representation as either `signed char` or `unsigned char`. Which sometimes makes for ambiguities in discussions: when one says, `char` may be either signed or unsigned, I would interpret that as meaning "the representation of a plain `char` may be the same as either that of a `signed char` or that of an `unsigned char`. – James Kanze Oct 31 '13 at 20:14
  • @ZacHowland: The same clause goes on to say that, "In any particular implementation, a plain char object can take on either the same values as a signed char or an unsigned char; which one is implementation-defined." So `char` isn't the same as `signed char` or `unsigned char`, but they are so close on a fundamental level that in 15 years of programming C++ professionally I only needed to distinguish between them *once*. – John Dibling Oct 31 '13 at 20:15
  • Just my personal opinion, but from a stylistic point of view, if it is text, use `char`. I've tried to use `unsigned char` in the past (because I often have to deal with accented characters): it just doesn't work (because so many functions expect `char*` or `std::string`, and string literals are `char[]`), and it confuses the reader. – James Kanze Oct 31 '13 at 20:18
  • @JohnDibling Interesting. I have to say that in my almost 13 years of professional experience, I have never seen the need to distinguish between a "plain char" and a "(un)signed char". So in the case you mentioned (your overloads), you had to have a `void f(char)`, `void f(signed char)`, *and* `void f(unsigned char)`? – Zac Howland Oct 31 '13 at 20:18
  • @ZacHowland: In other words, a "plain char" would be internally represented in the same way as either `signed char` or `unsigned char`, and so be in almost every way the *same* as `signed char` or `unsigned char`, but from the compiler's point of view they are distinct types. Like I said originally, this is really little more than an observation of trivia. – John Dibling Oct 31 '13 at 20:19
  • @JamesKanze That is actually what the `std::wstring` and `wchar_t` types fix :) – Zac Howland Oct 31 '13 at 20:19
  • @ZacHowland: I predict in two years you'll have to write a third overload for something. But then you'll be good for another 15 years. :) – John Dibling Oct 31 '13 at 20:20
  • @JohnDibling Understood. I'm just curious what the use case for needing 3 overloads was. With any luck, in 2 years I won't be writing much code (at least not code that I don't want to write). I've started the transition into PM ... – Zac Howland Oct 31 '13 at 20:20
  • Also: while I like the `char const s[] = "\xe2\x82\xac";` solution, you need to be careful. Something like `char const s[] = "\xc3\xa5con";` will _not_ do what you want (and may cause a compiler error) because the second character will be interpreted as `\xa5c`. You can avoid this by breaking up the string (`"\xc3\xa5" "con"`), or by using `"\u00c3\u00a5con"`; `\u` is always followed by exactly four hex digits. (See the sketch after these comments.) – James Kanze Oct 31 '13 at 20:22
  • @ZacHowland: It was some super-complicated space-age template stuff I was working on. I had some explicit instantiations for `char` and `signed char` but none for `unsigned char`, and something didn't compile correctly for something declared explicitly as `unsigned char`. I doubt I could replicate it if I tried. – John Dibling Oct 31 '13 at 20:23
  • @ZacHowland If you have `void f( char const* )`, you cannot pass it a `unsigned char[]` without a `reinterpret_cast`. And if you have to interface any legacy code, that's likely to be a problem. – James Kanze Oct 31 '13 at 20:23
  • @ZacHowland Re `wstring` et al: but at what cost? I tend to use UTF-8 in plain `char`, and it works fine for what I do, modulo a few precautions (like remembering to cast the characters to `unsigned char` if I'm passing them as an argument to one of the functions in `<cctype>`). But then, I was dealing with this sort of problem before there was a `wchar_t`, so I sort of know what I have to look out for. – James Kanze Oct 31 '13 at 20:26
  • @JamesKanze In this example, in order to store `0xE2` in a `signed char`, you would need to do `-1 * (256 - 0xE2)` (i.e. -30), which would match the bit pattern, but would just be weird to create (unless you are always doing `"\xe2"`?). – Zac Howland Oct 31 '13 at 20:33
  • @ZacHowland Even that's not guaranteed; it will only work on a 2's complement machine. (Admittedly, there aren't that many machines left that aren't 2's complement.) – James Kanze Nov 01 '13 at 10:16
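
A compilable sketch of the hex-escape pitfall James Kanze describes above (mine, not from the comments):

  // const char bad[] = "\xc3\xa5con";  // ill-formed: \xa5c is parsed as ONE
  //                                    // hex escape, whose value overflows char
  const char good[] = "\xc3\xa5" "con"; // splitting the literal ends the escape
                                        // after two hex digits; the pieces are
                                        // concatenated afterwards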
0

The short answer to your question is that you are overflowing a char. On your implementation, char is signed and has the range [-128, 127]; 0xE2 = 226 > 127. What you need to use is an unsigned char, which has a range of [0, 255].

unsigned char s[] = {0xE2, 0x82, 0xAC, 0};
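
A usage sketch (mine, not from the answer): because the array is now unsigned char, passing it to functions that expect char* requires a cast, a point James Kanze also raises in the comments above.

  #include <cstdio>

  int main() {
      const unsigned char s[] = {0xE2, 0x82, 0xAC, 0};
      std::puts(reinterpret_cast<const char*>(s)); // prints € on a UTF-8 terminal
  }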
Zac Howland
  • So by default, if there is no specifier, is a char signed? – SwiftMango Oct 31 '13 at 20:09
  • No, on some implementations a `char` is unsigned and `CHAR_MAX` is 255 (and `CHAR_MIN` is 0). On others `char` is `signed`, so `CHAR_MIN` is -128 and `CHAR_MAX` is 127 (and e.g. things are different on Linux/PowerPC/32 bits and Linux/x86/32 bits). – Basile Starynkevitch Oct 31 '13 at 20:09
  • @texasbruce It is up to the compiler. On many compilers, the default is `signed`. If you *need* an `unsigned`, you should always specify it explicitly. – Zac Howland Oct 31 '13 at 20:11
0

While it may well be tedious to put lots of casts in your code, to me it actually smells extremely GOOD to use the strongest typing possible.

As noted above, when you specify the type "char" you are inviting the compiler to choose whatever the compiler writer preferred (signed or unsigned). I'm no expert on UTF-8, but there is no reason to make your code non-portable if you don't need to.

As for your constants, I've used compilers that treat constants written that way as signed ints, as well as compilers that consider the context and interpret them accordingly. Note that converting between signed and unsigned can overflow EITHER WAY: for the same number of bits, a negative value overflows an unsigned type (obviously), and an unsigned value with the top bit set overflows a signed type, because the top bit means negative.

In this case, your compiler is taking your constants as unsigned 8-bit values--OR LARGER--which means they don't fit in a signed 8-bit char. And we are all grateful that the compiler complains (at least I am).

My perspective is that there is nothing at all bad about casting to show exactly what you intend to happen. And if a compiler lets you assign between signed and unsigned, it should require that you cast regardless of whether variables or constants are involved, e.g.

const int8_t a = (int8_t) 0xFF; // will be -1

although in my example, it would be better to assign -1 directly. When you find yourself adding extra casts, either they make sense, or you should write your constants so they make sense for the type you are assigning to.
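
Applied to the question's array, the fully cast version this answer advocates (my sketch of the casts the OP found tedious) would look like:

  const char s[] = {static_cast<char>(0xE2),
                    static_cast<char>(0x82),
                    static_cast<char>(0xAC), 0}; // each narrowing made explicit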

Mike Layton
  • While the stronger type checking is probably good for catching bugs, it causes a lot of hurt for projects which have to deal with legacy code. Initializing `char` arrays from hex constants spanning `0x00-0xFF` is quite common, cases in point: the [X Bitmap (XBM) file format](https://en.wikipedia.org/wiki/X_BitMap) (which is actually a snippet of C source code with precisely such an initialization; see the sketch below), along with many X library functions dealing with gradients, color maps, etc. which expect arrays of `char`s, not arrays of `unsigned char`s. – ack Jan 02 '17 at 14:27
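
For reference, an XBM file is literally a C snippet containing exactly this kind of initialization. An illustrative fragment (mine, not taken from the comment); compiled as C++ where char is signed, every value above 0x7F triggers the question's narrowing error:

  #define test_width 8
  #define test_height 2
  static char test_bits[] = {0xFF, 0x81}; // two 8-pixel rows; 0xFF > CHAR_MAX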
0

Is there a way to mix these? I want a #define macro FX_RGB(R,G,B) that makes a const string "\x01\xRR\xGG\xBB" so I can do the following: const char* LED_text = "Hello " FX_RGB(0xff, 0xff, 0x80) "World"; and get the string: const char* LED_text = "Hello \x01\xff\xff\x80World";
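
As far as I know, the preprocessor cannot turn a numeric argument like 0xff into a \xff escape (stringizing it yields "0xff", and "\x" cannot be pasted onto digits across literal boundaries), so a purely literal-concatenating FX_RGB is not possible. A run-time workaround instead (a hypothetical helper, my sketch, not from the post):

  #include <string>

  std::string fx_rgb(unsigned char r, unsigned char g, unsigned char b) {
      std::string s = "\x01"; // control/marker byte
      s += static_cast<char>(r);
      s += static_cast<char>(g);
      s += static_cast<char>(b);
      return s;
  }

  // usage:
  // const std::string LED_text = "Hello " + fx_rgb(0xff, 0xff, 0x80) + "World";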

KungPhoo