What is the difference between char and unsigned char in this situation?

Question

This is a simple clone of the memchr function. It works with char as well as with unsigned char.

My question is: why does the man page say the bytes should be interpreted as unsigned char?

void    *ft_memchr(const void *s, int c, size_t n)
{
    size_t          i;
    unsigned char   *src;

    src = (unsigned char *)s;
    i = 0;
    while (i < n)
    {
        if (src[i] == (unsigned char)c)
            return (src + i);
        i++;
    }
    return (0);
}

What is the difference between that code and this one:

void    *ft_memchr(const void *s, int c, size_t n)
{
    size_t          i;
    char    *src;

    src = (char *)s;
    i = 0;
    while (i < n)
    {
        if (src[i] == (char)c)
            return (src + i);
        i++;
    }
    return (0);
}

They may be the same: [Is char signed or unsigned by default?](https://stackoverflow.com/a/2054941) — 001, Nov 25 '21 at 17:24
Note that in almost all situations, a value of any type smaller than `int` will be [*promoted*](https://en.cppreference.com/w/c/language/conversion#Integer_promotions) to `int`. This promotion (which happens with e.g. `src[i] == (char)c`) will also do *sign extension* when the type is signed. That can lead to problems when comparing a `char` value with an actual `int` value. — Some programmer dude, Nov 25 '21 at 17:30
Using signed char on a sign-magnitude system would confuse negative zero and positive zero. — Raymond Chen, Nov 25 '21 at 17:32
They may or may not be the same, but a version of `ft_memchr` that uses `signed char` will probably be different. — Ian Abbott, Nov 25 '21 at 17:32
Thanks for the answer, but can you please give me an example where using char actually leads to a problem? — Abdellah Maarifa, Nov 25 '21 at 17:37
@NateEldredge what will happen? Can you share your considerations? — 0___________, Nov 25 '21 at 17:55
@NateEldredge Just test it: both versions behave the same way! I see no difference! — Abdellah Maarifa, Nov 25 '21 at 17:58
@AbdellahMaarifa There is no difference. The only possible difference I can imagine is incorrect handling of zero on systems that support negative zero (no modern architecture does); there you would probably add an extra case so that both are treated as one zero. — 0___________, Nov 25 '21 at 17:59
The typical example where using `char` could lead to problems is when reading from input character by character, and comparing against the `int` value `EOF`. If `char` is unsigned then you compare `-1` against `255`, which will not be equal. That's the reason all character-reading functions return an `int` value. Unfortunately many beginners don't think about that and assign the result to a `char` variable. — Some programmer dude, Nov 25 '21 at 18:01
@Someprogrammerdude your example is a completely different case not related to this one. We are not comparing back to `int`. — 0___________, Nov 25 '21 at 18:04
@Someprogrammerdude I don't get it. What I can see is that -1 and 255 are the same once both are cast to unsigned char [ unsigned char a = (unsigned char)-1; unsigned char b = 255; ] — Abdellah Maarifa, Nov 25 '21 at 18:14
My mistake, I got one of the cases wrong. But the issue that remains is in the second version, if you pass `c == 255`, then the result of the conversion `(char)255` is implementation-defined. On most systems, it will be `-1`, but that is not guaranteed by the C standard. On the other hand, `(unsigned char)-1 == 255` is guaranteed, provided that `unsigned char` is 8 bits. So by using `unsigned char` we get reliable behavior everywhere. — Nate Eldredge, Nov 25 '21 at 18:57
@NateEldredge Yeah, I can see what you're talking about, but can you please explain why this happens in most cases? I mean, why does (char)-1 become 255 most of the time? — Abdellah Maarifa, Nov 25 '21 at 19:20
Because that's the behavior that most implementations choose to define. It's natural because on a [two's-complement](https://en.wikipedia.org/wiki/Two%27s_complement) machine, the bits representing the signed number -1 are the same bits as those representing the unsigned number 255 (namely all 1 bits), so this is the simplest conversion to do - just use the same bits and interpret them as the new type. — Nate Eldredge, Nov 25 '21 at 19:22
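
The conversions discussed in the comments above can be observed with a short sketch. The helper names `as_unsigned_char`, `as_plain_char`, and `show_conversions` are hypothetical and exist only for this illustration. The first conversion has a result fixed by the C standard; the second depends on the implementation whenever plain `char` is signed and the value does not fit, so no particular output is claimed for it.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: the value of c after conversion to unsigned char.
   On every implementation this is c reduced modulo UCHAR_MAX + 1. */
int as_unsigned_char(int c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: the value of c after conversion to plain char.
   If char is signed and c does not fit, the result is implementation-defined
   (or an implementation-defined signal is raised). */
int as_plain_char(int c)
{
    return (char)c;
}

/* Prints both conversions so the platform's choice can be inspected. */
void show_conversions(void)
{
    /* This equality is guaranteed by the standard. */
    assert(as_unsigned_char(-1) == UCHAR_MAX);

    printf("(unsigned char)-1 -> %d\n", as_unsigned_char(-1));
    printf("(char)UCHAR_MAX   -> %d\n", as_plain_char(UCHAR_MAX));
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
}
```

Calling `show_conversions()` from `main` on a typical two's-complement system with a signed `char` is expected to show the second conversion wrapping to a negative value, but only the first line is portable.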

score 2 · Answer 1 · answered Nov 26 '21 at 00:29

When you use unsigned char, the behavior is fully defined by the C standard:

C 2018 6.2.6.1 3 says unsigned char objects shall be represented with pure binary. A footnote makes it clear that all the bits of an unsigned char participate in this, so it has no padding bits.
C 2018 6.2.6.1 4 guarantees we can work with the bytes representing any object using unsigned char.
C 2018 6.2.6.1 1 and 2 allow integer types other than unsigned char to have padding bits.

For practical purposes, modern C implementations have largely eliminated the shortcomings this leaves in char and signed char types. But, in theory, they can have padding bits (and all the bits in the memory accessed as a char might not contribute to its value, so writing the same char value to another memory location might not reproduce all the bits) and they can have multiple representations (bit patterns) that represent the same value (two representations for zero, one with a positive sign bit and one with a negative sign bit). (Integer types generally can also have trap representations, but C 2018 6.2.6.1 5 does not allow for these to have effects with character types.) C 2018 6.2.6.2 3 says a “negative zero” is not necessarily preserved when it is stored in an object; it may become a “normal zero.”

So, in theory, when src[i] is a char, it might have a value equal to c even though the bits in the memory of src[i] differ from the bits of c, and therefore the memchr routine would return an incorrect result.

You are unlikely to encounter such a C implementation in practice, but the end result is that the C standard guarantees the behavior with unsigned char and does not with char.
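
As a minimal sketch of the point above, here is a const-correct variant that compares bytes strictly as `unsigned char`. The names `memchr_sketch` and `memchr_sketch_selfcheck` are hypothetical; this is not the asker's code or any library's implementation, and the self-check assumes 8-bit bytes so that `0xFF` equals `UCHAR_MAX`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: a memchr-style search over the first n bytes of s. */
void *memchr_sketch(const void *s, int c, size_t n)
{
    const unsigned char *src = s;                    /* view the object as raw bytes */
    const unsigned char target = (unsigned char)c;   /* well-defined for any int c */

    for (size_t i = 0; i < n; i++)
    {
        if (src[i] == target)
            return (void *)(src + i);   /* cast away const, as memchr's signature requires */
    }
    return NULL;
}

/* Small usage example; assumes CHAR_BIT == 8. */
void memchr_sketch_selfcheck(void)
{
    unsigned char buf[] = { 0x00, 0x7F, 0x80, 0xFF };

    assert(memchr_sketch(buf, 0xFF, sizeof buf) == buf + 3);
    assert(memchr_sketch(buf, -1, sizeof buf) == buf + 3);   /* -1 converts to UCHAR_MAX */
    assert(memchr_sketch(buf, 0x80, sizeof buf) == buf + 2);
    assert(memchr_sketch(buf, 0x01, sizeof buf) == NULL);    /* value not present */
}
```

Converting `c` once, before the loop, keeps the comparison between two `unsigned char` values, so no step relies on how the implementation converts out-of-range values to a signed type.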
