1

I have the following code:

int some_array[256] = { ... };

int do_stuff(const char* str)
{
   int index = *str;
   return some_array[index];
}

Apparently the above code causes a bug in some platforms, because *str can in fact be negative.

So I thought of two possible solutions:

  1. Casting the value on assignment (unsigned int index = (unsigned char)*str;).

  2. Passing const unsigned char* instead.

Edit: The rest of this question did not get a treatment, so I moved it to a new thread.

Rama
  • 3,222
  • 2
  • 11
  • 26
Elad Weiss
  • 3,662
  • 3
  • 22
  • 50
  • 6
    Why you are passing index as string? What are you trying to achieve? – qrdl Jan 24 '17 at 10:45
  • 2
    Really? Why do you want to pass a `char *` to a function, and use what it points at as an index into a static array? – Peter Jan 24 '17 at 10:45
  • 1
    ".. *str can in fact be negative" should not be a bug. If the calling code does not handle it correctly, the bug is there, not in this snippet. – Jongware Jan 24 '17 at 10:47
  • @Raw N - even a technique that is slightly decent would be an improvement. – Peter Jan 24 '17 at 10:47
  • @qrdl I try to reduce the question to minimum. The actual code does a deep packet inspection on uses the 1st payload byte as a key to a fixed 256 bytes dictionary. – Elad Weiss Jan 24 '17 at 10:47
  • @qrdl: a `char` _could_ hold a negative value, if said `char` was used to store a byte. The C99 standard mentions this as implementation-defined behaviour: _"An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type."_ – Elias Van Ootegem Jan 24 '17 at 10:47
  • 1
    @qrdl: Lookup tables for string operations are a very important implementation technique. – Kerrek SB Jan 24 '17 at 10:49

2 Answers2

5

The signedness of char is indeed platform-dependent, but what you do know is that there are as many values of char as there are of unsigned char, and the conversion is injective. So you can absolutely cast the value to associate a lookup index with each character:

unsigned char idx = *str;
return arr[idx];

You should of course make sure that the arr has at least UCHAR_MAX + 1 elements. (This may cause hilarious edge cases when sizeof(unsigned long long int) == 1, which is fortunately rare.)

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
  • Actually, the protocol uses octets, so you only need to make sure that the array has 256 elements. It doesn't matter if `UCHAR_MAX>255` because no possible input can contain such values. – MSalters Jan 24 '17 at 11:01
  • @MSalters: Yes, I was going to say that - if you have information on the expected input, you can restrict the lookup table to that (and assert). – Kerrek SB Jan 24 '17 at 11:21
4

Characters are allowed to be signed or unsigned, depending on the platform. An assumption of unsigned range is what causes your bug.

Your do_stuff code does not treat const char* as a string representation. It uses it as a sequence of byte-sized indexes into a look-up table. Therefore, there is nothing wrong with forcing unsigned char type on the characters of your string inside do_stuff (i.e. use your solution #1). This keeps re-interpretation of char as an index localized to the implementation of do_stuff function.

Of course, this assumes that other parts of your code do treat str as a C string.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523