1

Since it's not always clear what is and isn't undefined behaviour in C, I'm wondering whether accessing an array element using a char as the index is undefined behaviour. For example:

char c = 'A';
int a[3000];
printf("%i\n", a[c]);

I know that chars and ints are somewhat interchangeable, but I'm still not sure.

nbro
  • 1
    It depends on what the current encoded value of `'A'` is. If it's ASCII then the value is `65` which is a valid index, so then it's well-defined and well-behaved. – Some programmer dude Mar 04 '16 at 18:41
  • I've never tested this as it is terrible practice, but if it's ASCII it should work. – Taelsin Mar 04 '16 at 18:43
  • Oh, and `int` and `char` are not really "interchangeable", the compiler can [*promote*](http://en.cppreference.com/w/c/language/conversion#Integer_promotions) a `char` to an `int` in many cases, or [*convert*](http://en.cppreference.com/w/c/language/conversion#Integer_conversions) an `int` to `char` in other occasions. – Some programmer dude Mar 04 '16 at 18:44
  • a[c] is converted to *(&a+c), and in your example c is 65, so you would get the 66th, a[0] being the 1st, member of the array. However in your example the array is uninitialized, so you would get a random integer from the stack. – Arif Burhan Mar 04 '16 at 18:47
  • So using a `char` to index an array is not undefined behaviour, but accessing the uninitialized array at all is. – Rudy Velthuis Mar 04 '16 at 18:55
  • @RudyVelthuis Why do you say that? The content of the uninitialised array would be garbage, but why should that be undefined behaviour? – nbro Mar 04 '16 at 18:57
  • @ArifBurhan No, `a[c]` is not converted to `*(&a + c)`. Using `&a` is [definitely not correct](http://stackoverflow.com/questions/35771298/pointer-and-address-to-that-pointer-lead-to-the-same-thing/35771465#35771465). The expression `a[c]` is *equivalent* to `*(a + c)` (note lack of ampersand). – Some programmer dude Mar 04 '16 at 18:58
  • 1
    @nbro Unless the type was `unsigned char` or equivalent, accessing uninitialized data like `int a[3000]` can set off a trap representation. It is UB per spec. – chux - Reinstate Monica Mar 04 '16 at 19:01
  • @nbro: I'd say that accessing uninitialized variables causes undefined behaviour. – Rudy Velthuis Mar 04 '16 at 19:02
  • @RudyVelthuis What if I had initialized it as `int a[3000] = {0}`, in that case it would be defined behaviour accessing an element because all elements would be zero initialized, right? – nbro Mar 04 '16 at 19:05
  • @nbro: I guess this is a trick question, but yes, I would say it is initialized. – Rudy Velthuis Mar 04 '16 at 19:07

6 Answers

1

Syntactically, a[c] is a valid expression as long as c is an integer type or can be promoted to an integer type.

From the C99 Standard:

6.5.2.1 Array subscripting

1 One of the expressions shall have type “pointer to object type”, the other expression shall have integer type, and the result has type “type”.

If the value of c, after it is promoted to an int, is within the bounds of the array, then there should be no problem at run time.

R Sahu
1

Is accessing an array element using a char undefined behaviour?

It is not undefined behavior. It works like any other integer type. Yet the numeric value of a char may surprisingly be negative.


A char has the same range as either signed char or unsigned char. Which of the two is implementation-defined.

Using c as an index is fine if the promoted index plus the pointer results in a valid memory address. Detail: a char will be promoted to int, or possibly unsigned.

The following would potentially be a problem if c had a negative value. In OP's case, with ASCII encoding, 'A' has the value 65, so there is no problem, as 0 <= 65 < 3000. @Joachim Pileborg

char c = 'A';
int a[3000] = { 0 };
printf("%i\n", a[c]);  // OK here; in OP's code a[] was not initialized.
chux - Reinstate Monica
  • **char** and **unsigned char** can hold 0…255 , **signed char** can only hold -128…127 . The binary digits are the same, but assigning to other types or casting is done differently. Even arithmetic has to be considered carefully, as there is no overflow or underflow warning or correction, and this can happen easily with only 8 bits. u ± s where u is unsigned and s is signed is clearly defined, just follow the binary, but with surprises for the unwary. – Arif Burhan Mar 04 '16 at 19:10
  • 2
    @Arif Burhan `char, unsigned char, signed char` could have the ranges you mentioned. The C spec does **not agree** that they must have those ranges. Don't agree? Try posting a question on that. The rest of your comment is unclear on how it applies to this answer as it lacks context. – chux - Reinstate Monica Mar 04 '16 at 19:16
  • @Arif Burhan `char` has the range `[CHAR_MIN ... CHAR_MAX]`, `unsigned char` has the range `[0 ... UCHAR_MAX]`, `signed char` has the range `[SCHAR_MIN ... SCHAR_MAX]`, The _minimum_ value for the `SCHAR_MAX` is 127. The _minimum_ value for the `UCHAR_MAX` is 255. `UCHAR_MAX` could be 65,535 or `pow(2,32)-1` or others. It must be a `power-of-2 - 1`. – chux - Reinstate Monica Mar 04 '16 at 19:24
  • @Arif Burhan Concerning "Even arithmetic has to be considered carefully, as there is no overflow or underflow warning or correction, ". Unsigned math does have a "correction" in that the result of any operation is well defined, the answer being the modulo of "the max value of the type + 1" – chux - Reinstate Monica Mar 04 '16 at 19:28
  • if a *signed char* overflows by a small amount it becomes small and negative, ok for chars, but possibly fatal for *signed short* possibly stored in memory as such, which is later used as an array index or memory or structure offset. @chux – Arif Burhan Mar 04 '16 at 19:44
  • @Arif Perhaps you are thinking of another language? In C, narrow operands (`char`, `short`, etc.) are promoted to `int/unsigned` first and then the operation occurs. Adding any 2 characters in the range you described will **never** overflow. – chux - Reinstate Monica Mar 04 '16 at 20:06
  • @ArifBurhan a single `signed char` or `signed short` cannot _overflow_. Where is this _overflow_ that you are commenting on - in some other code? – chux - Reinstate Monica Mar 04 '16 at 20:10
0

From all I know I'd say it's not undefined, but rather well defined. The reason: A char may be promoted to an integer, which is a valid way to index an array (or better said: pointer, which the array decays into in that expression). Indexing is basically the same as addition:

pointer + index // same as &(pointer[index]) or &(index[pointer])

And, quoting http://en.cppreference.com/w/cpp/language/implicit_cast (under "Numeric promotions"):

[..] Prvalues of small integral types (such as char) may be converted to prvalues of larger integral types (such as int). In particular, arithmetic operators do not accept types smaller than int as arguments, [..]

AFAIK some compilers will emit a warning, though (GCC, for instance, has -Wchar-subscripts, which -Wall enables), because a plain char used as an index is often a mistake, so the compiler tries to provide an extra safety net.

Daniel Jour
0

It'll mostly work, but be careful with non-ASCII chars, whose values are > 127.

If char is signed, such a value is negative and stays negative when promoted to int, so the access lands outside of the array!

This is a common bug in naïve implementations of e.g. tolower()

Alnitak
0

This should be implicitly converted to int and go to that element of the array, so the behavior is not undefined. However, there is rarely a reason to do this. Even if you start at ' ' (ASCII decimal value 32) you aren't using the 32 elements before it.

I think you are probably trying to make a very basic hash table. This can easily be done with a struct and a few functions; it is usually bad practice to use anything but an integer type (even though a char can be converted to int) as an array subscript.

Jacob H
0

The short answer is: the code fragment does not compile.

The intermediary answer is: if part of a function definition, the code has undefined behavior because it accesses an uninitialized object.

The long answer is: with a properly initialized array, it still depends:

  • c in the expression a[c] will be promoted to int prior to computing the array index, and the C Standard mandates that 'A' have a positive value, regardless of whether type char is signed or unsigned. If the type char has 8 bits, the behavior would not be undefined, but implementation defined as the actual value of 'A' depends on the target architecture.

  • If the char type is larger than 11 bits, it would be possible for the value 'A' to exceed 3000 and thus for the expression to attempt an access beyond the end of the array, which has undefined behavior.

chqrlie