
I have a question. I have read some posts here on SO asking when to use char, signed char, or unsigned char. The answers said that we must use char to store characters and signed/unsigned char for small integer data. But as far as I know, the signedness of char is implementation-defined, so it can behave like either signed char or unsigned char.

The question is: can I use char, signed char, or unsigned char to store characters? If the answer is "No, you can't", my next question is: why? Can you explain the reason for strictly using plain char to store characters?

Thanks in advance!!

chqrlie
Cblue X
  • If you're working with plain text, you probably want to use plain `char`, because most of C's string-related functions are defined to work with `char` or `char *`. If you're working with raw bytes, and especially if you're doing some math to combine adjacent bytes into multibyte integers, it's often extremely useful to declare everything as `unsigned char` or `unsigned char *`, because you avoid lots of annoying issues having to do with sign extension. – Steve Summit Dec 12 '22 at 05:02
  • Use `char` for characters and thereby strings. Use `unsigned char` for low level raw data. Especially when using bitwise operators `unsigned char` should be used. Use `signed char` in (the seldom) case where you want signed calculations on a small data type. – Support Ukraine Dec 12 '22 at 07:16
  • You got some great answers below. Make sure to accept the best one, or comment on the answers that you feel fall short. – Allan Wind Dec 20 '22 at 06:27

3 Answers


Is there any difference between using char (plain char) or signed/unsigned char to store characters in C?

Yes, yet ...

Operators such as >>, *, /, and % give different results when the character object has a signed character type and a negative value than when the same bit pattern is held in an unsigned character type, where it is positive.

Assignment to int can have an unexpected sign extension with negative characters.

is...() invokes undefined behavior (UB) when the character argument is negative (and not EOF).

The cases for _Generic distinguish between char, signed char, unsigned char.

Pedantic: With archaic non-2's complement using signed characters, user code often does not properly distinguish between +0 and -0 for the null character.

Signed types have more implementation defined behaviors than unsigned types when converting to other types, thus reducing portability.

"signed char are at a disadvantage with respect to working with UTF-8 encoded text".

... Other issues too.

... not always

The str...() functions behave as if each character were an unsigned char, regardless of whether char is signed or unsigned. This matters in functions like strcmp() when the first differing character in the strings is a negative char.

"%c", "%s" in *scanf(), *printf() matches all 3 types (or pointers to them).

There are no padding bits with the 3 character types and they consume the same space, although the soon to be dropped non-2's complement encodings allow for a trap representation for signed character types.


... can i use char or signed char or unsigned char to store characters?

Yes.

For string manipulation, char has the advantage of matching the str...() function signatures.

For bitwise logic and raw-byte code, use unsigned char.

Use signed char when small signed values are needed.

@Steve Summit, @Support Ukraine


If the answer is "No, you can't", my next question is: why? Can you explain the reason for strictly using plain char to store characters?

N/A

chux - Reinstate Monica

char is guaranteed to always be large enough to store what's called the basic character set, which consists of the basic Latin letters and symbols used in English, for example 7-bit ASCII. So as long as you use it to store text only, you don't need to worry about its size and signedness.

Problems only appear when you use char for storing raw data, or when you use it in arithmetic. In that case, signedness might matter, and since we can't portably know the signedness of char, it is unsuitable for such purposes. See: Is char signed or unsigned by default?

The best type to use when dealing with raw data bytes is unsigned char/uint8_t. Similarly, use this when doing unsigned arithmetic on 8 bit types.

signed char/int8_t is pretty much only used when doing signed arithmetic on very resource-constrained systems such as 8 bit CPUs.

The reason why unsigned char isn't used universally is historical. C actually treats char, unsigned char and signed char as 3 different types. For example, unsigned char x; char*y = &x; isn't valid; we have to make an explicit cast in order for it to work. However, on the binary level all character types alias with each other. So if we were to pass an unsigned char array to, for example, strcpy, it would work just fine, but we would need to cast the argument to char*, which is a bit tedious. It is better, then, to keep all text in the char type and avoid such casting.

(Theoretically the character types can be larger than 8 bits and then the int8_t/uint8_t types aren't present. But this exotic scenario is only relevant when writing C for certain DSP systems.)

Lundin

It's implementation-defined whether char is signed or unsigned, as you said.

If you need a particular version, specify it. ASCII is 7 bits, so for that it doesn't matter. If you need 8 bits or more (like UTF-8), it depends on how you use the data. For example, left shift is well defined for unsigned values, but undefined if the left operand is negative (right shift of a negative value is implementation-defined). if(ch < 0) is always false for unsigned but may matter greatly for signed.

In the upcoming C 2023 standard, the new type char8_t (declared in <uchar.h>) may be of interest to you. It is unsigned and is the same type as unsigned char.

Allan Wind