2

I was writing production-ready C where I need to find the frequency of characters in a char array pretty fast. I was trying remove an assert call to check for positive values during a cast. Is my assert redundant code or is it necessary?

    char input[] = "Hello World";
    int inputLength = sizeof(input)/ sizeof(char);
    int *frequencies = calloc(256, sizeof(int));
    for(int i = 0; i < inputLength-1; i++)
    {
        int value = (int) input[i];
        assert(value > -1);//Is this line redundant?
        frequencies[value] += 1;
    }
    printf("(%d)", inputLength);
    PrintFrequencies(frequencies);
    free(frequencies);
murage kibicho
  • 546
  • 5
  • 14
  • 2
    The cast isn't necessary, and if the `char` is signed, you need the assertion. – Weather Vane Jun 09 '23 at 12:53
  • 1
    Please note that `sizeof(char)` is specified to *always* be equal to `1`. Also note that the size of the array `input` will include the null-terminator. And that the result of the `sizeof` operator is a value of type `size_t`, for which you need the `%zu` format to print. – Some programmer dude Jun 09 '23 at 12:54
  • 1
    Perhaps you needed `int value = (unsigned char) input[i];` and then no assertion. – Weather Vane Jun 09 '23 at 12:55
  • 2
    And for a small number like `256`, why not simply create a plain array? Like `unsigned frequencies[256] = { 0 };`? Note that I use the `unsigned` type since you can't have negative values in the array. – Some programmer dude Jun 09 '23 at 12:56
  • 1
    @WeatherVane Using a run-time `assert` that disappears entirely if compiled with `NDEBUG` defined is a horrible way to address this. Whether `char` is signed or not is known at compile time. Heck, just use `unsigned char`. – Andrew Henle Jun 09 '23 at 12:57
  • 1
    In your specific string, `"Hello World"`, all characters are non-negative, because any member of C’s basic character set has a nonnegative value, per C 2018 6.2.5 3. Other characters may have negative values. You can deal with that by casting an individual character value to `unsigned char`. Also, the C standard allows there to be more than 256 possible values. You can accommodate that by changing the array size to `UCHAR_MAX+1`. – Eric Postpischil Jun 09 '23 at 13:01
  • 1
    @EricPostpischil: Although there is the possibility of a machine having a 64 bit `char` type, such as the one I played with at University. Using `uint8_t` is probably the most sensible thing to do, although the machine I'm talking about probably wouldn't support it. – Bathsheba Jun 09 '23 at 14:18
  • @Bathsheba: Using `unsigned char` and sizing the array with `UCHAR_MAX+1` gives a simple fully portable solution. Why bother making a solution that is technically less portable, even if rarely so? – Eric Postpischil Jun 09 '23 at 17:17

3 Answers3

6

Does casting from Char to Int always give positive values in C

Generally speaking, no. char may be either a signed or an unsigned type, at the C implementation's discretion, but pretty frequently it is a signed type.

All char values representing members of the basic execution character set are guaranteed to be non-negative. This includes the upper- and lowercase Latin letters, the decimal digits, a variety of punctuation, the space character and a few control characters. The char values representing other characters may be negative, however. Also, the multiple char values constituting the representation of a multi-byte character can include some that, considered as individual chars, are negative.

I was writing production-ready C where I need to find the frequency of characters in a char array pretty fast. I was trying remove an assert call to check for positive values during a cast. Is my assert redundant code or is it necessary?

Your assert() is semantically wrong. If you're reading arbitrary text and you want your program to be robust, then you do need to be prepared for chars with negative values. But

  1. assertions are the wrong tool for this job. Assertions are for checking that the invariants your program assumes in fact hold. You might use an assertion if you (thought you) had a guarantee that char values were always non-negative, for example. If an assertion ever fails, it means your code is wrong.

    You must never use an assertion to validate input data or perform any other test that your program relies upon being performed, because depending on how you compile the program, the asserted expression might not be evaluated at all.

  2. It would be better for your program to handle negative char values if they are encountered than to fail. In this regard, do note that there's no particular use in converting your char explicitly to int. You can use a char directly anywhere where you want an integer. On the other hand, it might make sense to cast to unsigned char, as that will be cheap -- possibly free, even if char is signed -- and it will take care of your signedness problem.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
4

Formally, there is no requirement that the values of the character sets supports by a standard C compiler have any particular value, positive or negative.

The char type can either by signed or unsigned: Is char signed or unsigned by default?. In situations where it is signed and some "extended character set" is implemented (beyond classic "7 bit ASCII" for example), then strings could in theory hold negative values.

So depending on how portable you need to code to be, there might be a place for the assert. However as mentioned in comments, casting to an unsigned type instead removes the problem. Consider using this instead:

uint8_t value = input[i];

Now value is guaranteed to be in the range of 0 - 255 and the code is portable.

Lundin
  • 195,001
  • 40
  • 254
  • 396
1

For starters this statement

int inputLength = sizeof(input)/ sizeof(char);

is redundant. As the array contains a string when within the for loop you can just check whether a current character is the terminating zero character '\0'.

Also pay attention to that in general a character array can be much larger than the string stored in the array. So using this variable calculated such a way in general can be wrong. It would be more correctly to use standard C function strlen if the length of a stored string is required. . Using this casting

int value = (int) input[i];

is actually is equivalent to

int value = input[i];

due to the integer promotion.

The type char can behave either as the signed type signed char or as the unsigned type unsigned char. To make the code independent on the behavior of the type char you need to cast each character to the type unsigned char.

So this assert

assert(value > -1);

does not useful.

An array with 256 elements of the type int is not very large. So you could define it with automatic storage duration. Also it will be logically consistent to use the type unsigned int instead of the signed type int in the array declaration.

Your code snippet can look like

char input[] = "Hello World";

unsigned int frequencies[256] = { 0 };

for ( const char *p = input; *p != '\0'; ++p )
{
    ++frequencies[( unsigned char )*p];
}

PrintFrequencies(frequencies);
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335