21

I am using C++ in native mode with Visual Studio 2017. That compiler compiles the statement below without complaint:

const char * AnArrayOfStrings[]  = {"z1y2x3w4", "Aname"};

However, if I change the statement above to specify that char is signed or unsigned, the compiler emits error C2440. For instance, the statements below do not compile:

const signed   char * AnArrayOfStrings2[] = {"z1y2x3w4", "Aname"};

const unsigned char * AnArrayOfStrings2[] = {"z1y2x3w4", "Aname"};

I fail to see the reason for the compiler refusing to compile the statement when the sign of char is made explicit.

My question is: is there a good reason, one that I have failed to see, for the compiler refusing to compile those statements?

Thank you for your help. (I researched this on StackOverflow and in the C++ documentation, used Google, and consulted about a dozen C/C++ books in an effort to find the answer myself, but a reason still eludes me.)

Marco Bonelli
ScienceAmateur
  • 4
    Fyi, [you may find this interesting](https://stackoverflow.com/questions/436513/char-signed-char-char-unsigned-char). – WhozCraig Mar 07 '18 at 06:10
  • fyi, with `gcc` and `clang` I get a bunch of warnings about this, with `g++` and `clang++` I get a bunch of errors. – yano Mar 07 '18 at 06:10
  • 7
    Remember that `char`, `signed char` and `unsigned char` are three distinct types not just 3 variations of the same type. – Jesper Juhl Mar 07 '18 at 06:38
  • @WhozCraig: yes, it was interesting. It's one of the posts I read before asking the question. It's obvious why signed char and unsigned char are different types. They will be used differently. What's a mystery to me is why a third "type", neither signed nor unsigned is necessary, what is gained/avoided, if anything by the "plain type" ?. It seems to only cause problems without solving any. – ScienceAmateur Mar 07 '18 at 06:47
  • 6
    Your question is purely about the C++ language, which is a totally separate language from C. And C/C++ does not exist. It's either C or C++. If a book really uses "C/C++" in the title, you should probably dump it immediately. Please do not spam by adding unrelated language tags. – Gerhardh Mar 07 '18 at 07:04
  • @Gerhardh: it certainly isn't my intention to spam. I included the C tag because most C programmers are using C++ compilers and will be affected by the C++ compiler behavior. – ScienceAmateur Mar 07 '18 at 07:07
  • 5
    Really? I don't know any C programmer using a C++ compiler. If you use a compiler in C mode, it is no longer a C++ compiler. If you use a C++ compiler the laws of C++ apply, and you are programming C++. – Gerhardh Mar 07 '18 at 07:10
  • @Gerhardh: the fact that you don't know any doesn't mean there aren't any. I know a number of them. – ScienceAmateur Mar 07 '18 at 07:14
  • 1
    Neither does the fact that you know a number of them make them "most of them". And strictly speaking, as soon as you use a C++ compiler you must use C++ language rules. Try `char *new = malloc (100);` in a C++ compiler. If you use C++ language rules with a C++ compiler, you are typically not called a C programmer any longer. – Gerhardh Mar 07 '18 at 07:19
  • @Ajay Brahmakshatriya: often, "good" C programmers take advantage of the stricter type checking afforded by C++. Avoiding bugs is one of the characteristics of good programmers regardless of language used. – ScienceAmateur Mar 07 '18 at 07:21
  • 3
    @ScienceAmateur: Things will make a hell of a lot more sense if you realize that this is just a matter of awful naming, and the character types have nothing to do with the integer types: `signed char` is a signed *byte*, and `unsigned char` is an unsigned *byte*, `char` is a narrow *"character"*, `wchar_t` is a wide *"character"*. – user541686 Mar 07 '18 at 09:38
  • It is true that many well-written C programs are also valid C++ programs and can be compiled with a C++ compiler. The danger lies in C programs that are also valid C++ programs *but silently have different semantics*. It is therefore necessary to rigorously approach these languages as fully separate and distinct. – n. m. could be an AI Mar 07 '18 at 19:41

2 Answers

15

"z1y2x3w4" is const char[9] and there is no implicit conversion from const char* to const signed char*.

You could use reinterpret_cast to make the conversion explicit:

const signed char * AnArrayOfStrings[]  = {reinterpret_cast<const signed char *>("z1y2x3w4"),
                                           reinterpret_cast<const signed char *>("Aname")};
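
If the cast has to be repeated for many literals, it can be wrapped in a small helper. The following is only a sketch: `as_signed`, `same_bytes`, `check_arrays` and `AnArrayOfSignedStrings` are hypothetical names introduced for illustration and are not part of the original answer. It also checks that the cast changes only the pointer type, not the stored bytes.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: view a narrow string as const signed char*.
// The cast changes only the static pointer type, not the bytes.
inline const signed char* as_signed(const char* s) {
    return reinterpret_cast<const signed char*>(s);
}

// Same data as in the question, under a new (assumed) name.
const signed char* AnArrayOfSignedStrings[] = {
    as_signed("z1y2x3w4"),
    as_signed("Aname"),
};

// Compare the raw bytes (including the terminating '\0') with the
// original narrow string.
inline bool same_bytes(const signed char* p, const char* expected) {
    return std::memcmp(p, expected, std::strlen(expected) + 1) == 0;
}

// Sanity check: both array entries still hold the original text.
inline void check_arrays() {
    assert(same_bytes(AnArrayOfSignedStrings[0], "z1y2x3w4"));
    assert(same_bytes(AnArrayOfSignedStrings[1], "Aname"));
}
```

A helper like this keeps the `reinterpret_cast` in one place, which makes it easier to find later if the array is changed back to plain `const char *`.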
Gaurav Sehgal
  • 13
    "`"z1y2x3w4"` is `const char *`" actually, it's a `const char[9]`. – Matteo Italia Mar 07 '18 at 06:10
  • What troubles me the most is that, whether a char is signed or unsigned, it still uses the same amount of space in memory which is the first thing the compiler needs to know. Whether the char is signed or unsigned depends on how the programmer wishes to interpret it and based on the intended interpretation, the compiler can decide if later statements are valid or not. The compiler should be "happy" that I'm telling it how I'm going to use/view the characters. I don't see that a "conversion" is required from pointer to char to pointer to signed or unsigned char, in all cases they are char. – ScienceAmateur Mar 07 '18 at 06:22
  • @ScienceAmateur `char`, `signed char` and `unsigned char` occupy the same amount of storage. – haccks Mar 07 '18 at 06:29
  • 4
    @ScienceAmateur In your comment you write "in all cases they are char". That's not correct. They are different types and represent different things. Read https://stackoverflow.com/a/436561/4386427 While C may silently allow the conversions you are doing, C++ is stricter and requires an explicit cast – Support Ukraine Mar 07 '18 at 06:29
  • @haccks: you are correct, they occupy the same amount of storage, which is what I meant when I stated "uses the same amount of space in memory". – ScienceAmateur Mar 07 '18 at 06:36
  • @ScienceAmateur; Yes. Standard also says that *"Plain `char`, `signed char`, and `unsigned char` are three distinct types.*" – haccks Mar 07 '18 at 06:40
  • @4386427: I read a number of posts that referred to the "fundamental types" and how there are three types of char. I'm trying to find what could possibly be the reason to have a third type of char, "plain char", as stated in the documentation. It seems completely unnecessary and a source of problems. I'd like to know if there is a good reason for the compiler to consider groups of characters to be "magical" (that is, neither signed nor unsigned). Would specifying the sign cause problems that I am failing to see ? – ScienceAmateur Mar 07 '18 at 06:42
  • 3
    @ScienceAmateur Well, a "signed char" may not use all bits. It's a historical thing (look up representation of signed numbers and trap representations). Anyway, I think this is starting to become a moving question. The answer to the original question is "Because the types differ". The question in your comment above is more like "Why does the standard define three different char types". To me, that would be another and different answer. – Support Ukraine Mar 07 '18 at 07:05
  • @4386427: you are correct that inherently I am asking if there is a good reason for the standard to define three different char types. I presume there must have been a good reason and, I'd certainly like to know what it is. – ScienceAmateur Mar 07 '18 at 07:10
  • @Gaurav Sehgal: I marked your post as the answer because you provided a way to "convince" the compiler to accept the declaration. The assembly code generated is as expected, therefore it solves the problem. Thank you. – ScienceAmateur Mar 07 '18 at 07:53
  • 1
    @ScienceAmateur - Plain char isn't "a third type", it is the *first* type. Originally C only had `char`, and it happened to be signed (because that worked the best on the hardware where it was first implemented). When the C compiler was ported to new hardware, it was more efficient to make `char` unsigned. Who cares, it is only for text? Turned out some people had used `char` as a small integer, and they *did* care. So they were given `signed char` to be able to port their programs. Soon people using the newer machine needed `unsigned char` to move their programs in the other direction. – Bo Persson Mar 07 '18 at 10:41
3

If you compile the above code

const signed   char * AnArrayOfStrings2[] = {"z1y2x3w4", "Aname"};  

as C with gcc using the -Wall option, it gives warnings like the following (the output shown here is for the unsigned char variant):

test.c:5:49: warning: pointer targets in initialization differ in signedness [-Wpointer-sign]
  const unsigned   char * AnArrayOfStrings2[] = {"z1y2x3w4", "Aname"};
                                                 ^
test.c:5:49: note: (near initialization for 'AnArrayOfStrings2[0]')
test.c:5:61: warning: pointer targets in initialization differ in signedness [-Wpointer-sign]
  const unsigned   char * AnArrayOfStrings2[] = {"z1y2x3w4", "Aname"};  

The element type of AnArrayOfStrings2 and the type of "z1y2x3w4" are different: AnArrayOfStrings2[0] is of type const signed char *, while "z1y2x3w4" is of type const char[9].
The same code is an error in C++; you need an explicit cast to make it compile there.
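
The type of the literal can be checked at compile time. This is a minimal sketch using the standard `<type_traits>` header; the variable names are made up for illustration. Since a string literal is an lvalue, `decltype` yields a reference to the array type.

```cpp
#include <type_traits>

// A string literal is an lvalue of type "array of N const char",
// so decltype reports a reference to const char[9]
// (8 visible characters plus the terminating '\0').
constexpr bool literal_is_char_array =
    std::is_same<decltype("z1y2x3w4"), const char (&)[9]>::value;

// After array-to-pointer decay the type is const char*,
// not const signed char* or const unsigned char*.
constexpr bool literal_decays_to_char_ptr =
    std::is_same<std::decay<decltype("z1y2x3w4")>::type, const char*>::value;

static_assert(literal_is_char_array, "literal has type const char[9]");
static_assert(literal_decays_to_char_ptr, "literal decays to const char*");
static_assert(sizeof("z1y2x3w4") == 9, "8 characters + null terminator");
```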


To explain why

const char * AnArrayOfStrings[]  = {"z1y2x3w4", "Aname"}; 

works, I will take a simple example

const char c[] = "asc";
const char *p1 = c;           // OK
signed const char *p2 = c;    // Error
unsigned const char *p3 = c;  // Error

In the second line of the above snippet, c converts to const char *, so the initializer matches the type of p1.
In the third line, the types of p2 and c are incompatible, so the compiler raises an error in C++ (a warning in C). The same happens on the fourth line.
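
The same rules can be expressed with `std::is_convertible`, which reports whether an implicit conversion exists. This is a sketch for illustration; the constant names are assumptions, not anything from the original snippet.

```cpp
#include <type_traits>

// const char[4] (the type of c above) decays implicitly to const char*.
constexpr bool array_to_plain =
    std::is_convertible<const char (&)[4], const char*>::value;

// There is no implicit conversion between pointers to the
// three distinct character types.
constexpr bool plain_to_signed =
    std::is_convertible<const char*, const signed char*>::value;
constexpr bool plain_to_unsigned =
    std::is_convertible<const char*, const unsigned char*>::value;

static_assert(array_to_plain, "p1 = c is accepted");
static_assert(!plain_to_signed, "p2 = c is rejected");
static_assert(!plain_to_unsigned, "p3 = c is rejected");
```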

If we take another example for int type

const int i[] = {1,2,3};
const int *ii = i;           // OK
signed const int *si = i;    // OK
unsigned const int *usi = i; // Error  

The first two pointer initializations work because int without any specifier is equivalent to signed int (this is not true for char), so the types are compatible. Initialization fails in the last case because const int * (equivalently, signed const int *) is incompatible with unsigned const int *.
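
The difference between int and char described above can be verified with `std::is_same`. This is a small sketch; the constant names are chosen here for illustration only.

```cpp
#include <type_traits>

// int and signed int name the same type...
constexpr bool int_is_signed_int = std::is_same<int, signed int>::value;

// ...but plain char is a distinct type from both signed char and
// unsigned char, whichever signedness the implementation gives it.
constexpr bool char_is_signed_char = std::is_same<char, signed char>::value;
constexpr bool char_is_unsigned_char = std::is_same<char, unsigned char>::value;

static_assert(int_is_signed_int, "int == signed int");
static_assert(!char_is_signed_char, "char != signed char");
static_assert(!char_is_unsigned_char, "char != unsigned char");

// All three character types nevertheless occupy one byte.
static_assert(sizeof(char) == 1 && sizeof(signed char) == 1 &&
              sizeof(unsigned char) == 1, "same storage size");
```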

haccks
  • if the sign is specified the compiler complains that signed (or unsigned) char * is not of type const char[9] yet is perfectly "happy" when the declaration is const char *. The fact that the programmer intends to interpret the characters as signed or unsigned does not change how the compiler allocates storage for it. It just seems completely arbitrary for the compiler to complain about the characters being interpreted as signed or unsigned. Imagine the compiler complaining about an array of signed or unsigned int and force the programmer to only have arrays of int. – ScienceAmateur Mar 07 '18 at 06:59
  • 2
    *Imagine the compiler complaining about an array of signed or unsigned int and force the programmer to only have arrays of int* -- except that `int` is actually the same as `signed int`. And it has got nothing to do with the compiler being able to figure out the space etc. It is the semantics of the `=` operator (initialization here) as defined by the standard that prevents this. – Ajay Brahmakshatriya Mar 07 '18 at 07:23
  • 1
    @ScienceAmateur; In fact compiler will complain about assigning any incompatible pointer type. `int *` and `signed int *` are compatible while these two are incompatible with `unsigned int *`. – haccks Mar 07 '18 at 07:38
  • 1
    @haccks: I just wanted to let you know that I very much appreciate your providing such a complete explanation along with examples. I gave the "answered" to Gaurav Sehgal because he provided a way to work around the problem but wanted to make it a point to express my appreciation for your work. Thank you. – ScienceAmateur Mar 07 '18 at 07:57