54

When I compile this program, I get the warning "array subscript has type 'char'". Please help me understand what is going wrong. I am using the Code::Blocks IDE.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string.h>
void NoFive()
{
    long long int cal;
    char alpha[25];
    char given[100] = "the quick brown fox jumped over the cow";
    int num[25];
    int i, k;
    char j;
    j = 'a';
    k = 26;
    cal = 1;
    for(i = 0; i <= 25; i++)
    {
        alpha[i] = j++;
        num[i] = k--;
      //  printf("%c = %d \n", alpha[i], num[i]);
    }
    for(i = 0; i <= (strlen(given) - 1); i++)
    {
        for(j = 0; j <= 25; j++)
        {
         if(given[i] == alpha[j]) ***//Warning array subscript has type char***
         {
            cal = cal * num [j]; ***//Warning array subscript has type char***
         }
         else
         {

         }
        }
    }
printf(" The value of cal is %I64u ", cal);
}

main()
{
NoFive();
}
Rasmi Ranjan Nayak
  • 4
    http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html Will shed some light on why this is a warning. – ta.speot.is Apr 02 '12 at 07:27
  • `for(i = 0; i <= 25; i++)` is also wrong (twice). Should be `for(i = 0; i < 25; i++) {...}` The array has 25 elements. And `for(i = 0; i <= (strlen(given) - 1); i++)` is debatable. – wildplasser Apr 26 '12 at 18:15
  • @ta.speot.is unfortunately the GCC documentation does not shed _any_ light on the _why_. It does not even try to explain the situation. – Roland Illig Mar 15 '20 at 18:43
  • 1
    @RolandIllig it says *Warn if an array subscript has type char. This is a common cause of error, as programmers often forget that this type is signed on some machines. This warning is enabled by -Wall.* Why would you want a negative subscript? – ta.speot.is Mar 16 '20 at 01:00
  • @ta.speot.is I don't _want_ a negative subscript, I get it implicitly without doing anything about it. That's the problem. – Roland Illig Mar 16 '20 at 03:25
  • @RolandIllig But that's what it says in the link ... it tells you why it's giving you the warning. cf. *unfortunately the GCC documentation does not shed any light on the why. It does not even try to explain the situation.* – ta.speot.is Mar 16 '20 at 07:55
  • To help with this warning in the future, I have asked that the [GCC documentation explains this problem in more detail](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94182). – Roland Illig Mar 24 '20 at 05:12

3 Answers

87

Simple, change

char j;

to

unsigned char j;

or to just a plain (u)int

unsigned int j;
int j;

From GCC Warnings

-Wchar-subscripts Warn if an array subscript has type char. This is a common cause of error, as programmers often forget that this type is signed on some machines. This warning is enabled by -Wall.

The compiler doesn't want you to inadvertently specify a negative array index, hence the warning.
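
Applied to the loops from the question, the change could look like the sketch below. The function name `weighted_product` is made up for illustration and is not part of the original program. The sketch also sizes both arrays to 26 elements so the loops stay in bounds, and it uses `unsigned long long` because the product grows quickly and unsigned arithmetic wraps around instead of overflowing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: multiplies the weights (a = 26, ..., z = 1) of all
   lowercase letters in text. Unsigned arithmetic wraps around on overflow. */
unsigned long long weighted_product(const char *text)
{
    char alpha[26];               /* 26 letters, so 26 elements */
    int num[26];
    unsigned long long cal = 1;
    size_t len = strlen(text);
    size_t i;
    int j;                        /* int index: no -Wchar-subscripts warning */

    for (j = 0; j < 26; j++) {
        alpha[j] = (char)('a' + j);   /* assumes contiguous letters, as in ASCII */
        num[j] = 26 - j;
    }
    for (i = 0; i < len; i++) {
        for (j = 0; j < 26; j++) {
            if (text[i] == alpha[j]) {
                cal *= (unsigned long long)num[j];
            }
        }
    }
    return cal;
}
```

For example, `weighted_product("the")` multiplies 7 (t), 19 (h) and 22 (e) and returns 2926.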

Pavan Manjunath
  • 11
    Using an array index of type `int` does **not** lead to any warning, although it also would allow negative indexes ... @Pavan Manjunath – alk Apr 26 '12 at 17:04
  • @alk Ahh. It was a typo. I meant `unsigned char` than just `unsigned`. Anyways, I was just making the point of negative indexes. Nevertheless I edited my post to be clear to future visitors :) – Pavan Manjunath Apr 26 '12 at 18:03
  • 1
    @alk: A couple of differences between `int` and `char`: (1) There aren't any compilers (at least none that wouldn't be considered even remotely "normal") where `int` might reasonably be expected to be unsigned; (2) Code which uses type `char` as an array subscript is more likely than code which uses type `int`, to assume that all character literals, or all the characters in string literals, represent positive values. I'm not certain if all characters in the "C character set" are required to be positive, but I know characters outside that set are not. – supercat Apr 26 '12 at 18:14
  • I have got this warning with the following code: context->ptr[0] = (char)toupper(c); where "ptr" is of type "char *". Is the compiler thinking that 0 is a signed char and hence might be negative? – AlastairG Nov 30 '13 at 14:43
  • 1
    Simply changing the type from `char` to `int` or `unsigned int` is wrong. Read any good manual about the `<ctype.h>` functions to learn the details. – Roland Illig Feb 07 '21 at 21:44
  • This doesn't even answer the question (asking for help knowing where the error is occurring), and it almost implies there's something wrong with negative array indices. Going either forward or backward from a mid-point in an array is perfectly reasonable. If the compiler were worried about inadvertent negative values, it would produce warnings for `int`s, `short`s, etc., but most do not. Roland's answer is more accurate. – SO_fix_the_vote_sorting_bug Feb 03 '23 at 23:52
19

This is a typical case where GCC uses overly bureaucratic and indirect wording in its diagnostics, which makes it difficult to understand the real issue behind this useful warning.

// Bad code example
int demo(char ch, int *data) {
    return data[ch];
}

The root problem is that the C programming language defines several data types for "characters":

  • char can hold a "character from the basic execution character set" (which includes at least A-Z, a-z, 0-9 and several punctuation characters).
  • unsigned char can hold values from at least the range 0 to 255.
  • signed char can hold values from at least the range -127 to 127.

The C standard defines that the type char behaves in the same way as either signed char or unsigned char. Which of these types is actually chosen depends on the compiler and the operating system and must be documented by them.
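
To find out which choice a particular compiler made, the `CHAR_MIN` macro from `<limits.h>` can be inspected. The following is a minimal sketch; the function name `plain_char_is_signed` is only illustrative.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise.
   CHAR_MIN is 0 when char is unsigned and negative when char is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

GCC also offers the options `-fsigned-char` and `-funsigned-char` to override the default, which can be handy for testing code under both variants.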

When an element of an array is accessed by the arr[index] expression, GCC calls the index a subscript. In most situations, this array index is non-negative. This is common programming style, and languages like Java or Go raise a run-time error if the array index is negative.

In C, out-of-bounds array indices are simply defined as invoking undefined behavior. The compiler cannot reject negative array indices in all cases since the following code is perfectly valid:

const char *hello = "hello, world";
const char *world = hello + 7;
char comma = world[-2];   // negative array index

There is one place in the C standard library that is difficult to use correctly, and that is the character classification functions from the header <ctype.h>, such as isspace. The expression isspace(ch) looks as if it would take a character as its argument:

isspace(' ');
isspace('!');
isspace('ä');

The first two cases are ok since the space and the exclamation mark come from the basic execution character set and are thus defined to be represented the same, no matter whether the compiler defines char as signed or as unsigned.

But the last case, the umlaut 'ä', is different. It typically lies outside the basic execution character set. In the character encoding ISO 8859-1, which was popular in the 1990s, the character 'ä' is represented like this:

unsigned char auml_unsigned = 'ä';   // == 228
signed   char auml_signed   = 'ä';   // == -28

Now imagine that the isspace function is implemented using an array:

static const int isspace_table[256] = {
    0, 0, 0, 0, 0, 0, 0, 0,   // 0 to 7
    0, 1, 1, 1, 1, 1, 0, 0,   // 8 to 15; '\t', '\n', '\v', '\f', '\r' are 9 to 13
    // and so on
};

int isspace(int ch)
{
    return isspace_table[ch];
}

This implementation technique is typical.

Let's get back to the call isspace('ä') and assume that the compiler has defined char to be signed and that the encoding is ISO 8859-1. When the function is called, the value of the character is -28, and this value is converted to an int, preserving the value.

This results in the expression isspace_table[-28], which accesses the table outside the bounds of the array. This invokes undefined behavior.

It is exactly this scenario that is described by the compiler warning.

The correct way to call the functions from the <ctype.h> header is either:

// Correct example: reading bytes from a file
int ch;
while ((ch = getchar()) != EOF) {
    isspace(ch);
}

// Correct example: checking the bytes of a string
const char *str = "hello, Ümläute";
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((unsigned char) str[i]);
}
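
The string variant is easy to wrap in a small helper so that the cast cannot be forgotten at the call site. This is only a sketch; the name `count_spaces` is made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: counts the whitespace bytes in a string. */
size_t count_spaces(const char *str)
{
    size_t count = 0;
    for (size_t i = 0; str[i] != '\0'; i++) {
        /* Convert to unsigned char first, so the argument is never negative. */
        if (isspace((unsigned char) str[i])) {
            count++;
        }
    }
    return count;
}
```

In the default "C" locale, `count_spaces("a b\tc\n")` counts the space, the tab and the newline.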

There are also several ways that look very similar but are wrong.

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace(str[i]);   // WRONG: the cast to unsigned char is missing
}

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((int) str[i]);   // WRONG: the cast must be to unsigned char
}

The above examples convert the character value -28 directly to the int value -28, thereby leading to a negative array index.

// WRONG example: checking the bytes of a string
for (size_t i = 0; str[i] != '\0'; i++) {
    isspace((unsigned int) str[i]);   // WRONG: the cast must be to unsigned char
}

This example converts the character value -28 directly to unsigned int. Assuming a platform where unsigned int is 32 bits wide, the value -28 is converted by adding 2^32 until the value is in the range of unsigned int. In this case this results in the array index 4_294_967_268, which is much too large.
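
The three conversions can be compared side by side with a few one-line helpers. This is a sketch with made-up function names; the parameter is declared as `signed char` to model a platform where plain `char` is signed.

```c
#include <limits.h>

/* Each helper takes the byte as a signed char, the way it would be stored
   in a plain char on a platform where char is signed. */

/* Keeps the value: -28 stays -28. */
int to_int(signed char c) { return (int) c; }

/* Wraps modulo UINT_MAX + 1: -28 becomes UINT_MAX - 27. */
unsigned int to_unsigned_int(signed char c) { return (unsigned int) c; }

/* Wraps modulo UCHAR_MAX + 1: -28 becomes 228 when bytes have 8 bits. */
int to_unsigned_char(signed char c) { return (unsigned char) c; }
```

Only the last helper produces a value that is valid as an argument to the `<ctype.h>` functions.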

Roland Illig
  • 3
    "_the type `char` is equivalent to either `signed char` or to `unsigned char`_": `char` must behave as and have the same representation as either `signed char` or `unsigned char`, but `char`, `signed char`, and `unsigned char` are three distinct types in C. "_negative array indices are simply defined as invoking undefined behavior._": array indexing with negative values is perfectly well-defined in C since `arr[n]` is equivalent to `*(arr + n)`. One way this _can_ lead to undefined behavior is if the pointer arithmetic leads to an out-of-bounds access. – ad absurdum Mar 15 '20 at 19:04
  • 2
    [Reported as a GCC bug](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94182) – Roland Illig May 17 '20 at 19:57
  • I think that blindly casting the parameters to `unsigned char` might introduce regressions if the remaining code relies on the respective ctype function to check for EOF. At least in some hypothetical cases, depending on the value resulting from the cast, EOF might be erroneously deemed a member of one of the classes by the ctype functions. You are elegantly avoiding this possibility in your examples but I think it's a possible pitfall worth mentioning. Also, mentioning macro implementations of the ctype functions would make it clear why the compiler actually warns with this specific warning(?) – stefanct Nov 22 '21 at 14:39
0

Note that Roland Illig's explanation is slightly incomplete; these days, 'ä' might not even compile (or it might compile to something that doesn't fit in a byte, but that's very implementation-dependent or possibly even UB). If you're using UTF-8, then "ä" is the same as "\xc3\xa4".
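
The same care is needed with UTF-8, since both bytes of "\xc3\xa4" have the high bit set and are therefore negative wherever plain `char` is signed. A minimal sketch for reading such bytes safely follows; the helper name `byte_at` is made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: returns the byte at position i as a non-negative
   value, regardless of whether plain char is signed. */
int byte_at(const char *s, size_t i)
{
    return (unsigned char) s[i];
}
```

With 8-bit bytes, `byte_at("\xc3\xa4", 0)` yields 0xC3 and `byte_at("\xc3\xa4", 1)` yields 0xA4.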