If you’re going to be widening a lot of 8-bit values to wchar_t, unsigned char is what you want, because widening is then a no-op rather than a bitmask. If your char format is UTF-8, you also want to be able to use unsigned math for your shifts. If you’re using library functions, it’s most convenient to use the types your library expects.
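As a rough illustration of both points, here is a minimal sketch in C. The helper names widen_bytes and decode_utf8_2 are made up for this example, and the decoder assumes a well-formed two-byte sequence and does no validation.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: widen n bytes to wchar_t.
   Because the source type is unsigned char, each element converts by
   zero-extension, so bytes >= 0x80 keep their values 128..255. */
void widen_bytes(const unsigned char *src, wchar_t *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = (wchar_t)src[i];
}

/* Hypothetical helper: decode one two-byte UTF-8 sequence
   (110xxxxx 10xxxxxx). Assumes valid input; no error checking.
   All masking and shifting is done on unsigned values. */
uint32_t decode_utf8_2(const unsigned char *s)
{
    return ((uint32_t)(s[0] & 0x1Fu) << 6) | (uint32_t)(s[1] & 0x3Fu);
}
```

For instance, the bytes 0xC3 0xA9 (the UTF-8 encoding of U+00E9) decode to 0xE9, and widening them individually yields 0xC3 and 0xA9 rather than negative values.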
The RISC-V architecture has both an LB instruction that loads a sign-extended 8-bit value into a register, and an LBU instruction that zero-extends. Both are equally efficient. In C, any signed char used in an arithmetic operation is widened to int, and the C standard library functions specify widening char to int, so the load already puts the variable in the correct format to use.
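One place where the widening to int matters in practice is the <ctype.h> family, whose functions take an int that must be representable as unsigned char (or be EOF). A small sketch, with count_alpha being a hypothetical helper written for this example:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count alphabetic bytes in a buffer of plain char. */
size_t count_alpha(const char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Convert to unsigned char before the implicit widening to int:
           if plain char is signed, a byte >= 0x80 would otherwise widen
           to a negative int, which isalpha is not required to accept. */
        if (isalpha((unsigned char)s[i]))
            ++count;
    }
    return count;
}
```

In the default "C" locale, only the ASCII letters count, so a buffer such as "a1b2" followed by the byte 0xE9 contains two alphabetic bytes.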
Storing is a matter of truncation, and converting from any integral type to unsigned char is trivial (bitmask by 0xff). Converting from an unsigned char to a signed value can be done in no more than two instructions, without conditionals or register pressure (SLLI to put the sign bit of the char into the sign bit of the machine register, followed by SRAI to sign-extend the upper bits).
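The same conversions can be sketched in C on a 32-bit value. The function names are invented for this example. The shift version relies on behavior the C standard leaves implementation-defined (converting an out-of-range unsigned value to int32_t, and right-shifting a negative value); it assumes a compiler such as GCC or Clang, where the conversion wraps and the shift is arithmetic. The XOR-and-subtract version avoids both assumptions.

```c
#include <stdint.h>

/* Truncate to an unsigned 8-bit value: a single mask by 0xff. */
uint32_t to_u8(uint32_t x)
{
    return x & 0xFFu;
}

/* Sign-extend the low 8 bits with a shift pair: move bit 7 into the
   sign bit, then shift back arithmetically.
   ASSUMPTION: >> on a negative int32_t is an arithmetic shift and the
   cast to int32_t wraps (true for GCC and Clang, not guaranteed by C). */
int32_t sext8_shift(uint32_t x)
{
    return (int32_t)(x << 24) >> 24;
}

/* Portable alternative: flip bit 7, then subtract 128. */
int32_t sext8_portable(uint32_t x)
{
    return (int32_t)((x & 0xFFu) ^ 0x80u) - 0x80;
}
```

Both sign-extension variants map 0xFF to -1, 0x80 to -128, and 0x7F to 127; neither contains a branch.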
There is therefore no additional overhead in this architecture to working with either. The ABI specifies sign-extension rather than zero-extension of signed quantities.
Incidentally, RV64I does not architecturally prefer sign-extension. That is the ABI convention, but the instruction set adds an LWU instruction to load a 32-bit value from memory with zero-extension and an ADDIW that can sign-extend a zero-extended 32-bit result. (There is no corresponding ADDIB for 8-bit or ADDIH for 16-bit quantities.)
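In C terms, the two 32-to-64-bit widenings look roughly like this. The function names are hypothetical, and the comments only indicate which instruction a compiler would plausibly use; the narrowing cast to int32_t is implementation-defined in C and is assumed to wrap, as it does on GCC and Clang.

```c
#include <stdint.h>

/* Zero-extension of a 32-bit value, the result an LWU load would leave
   in a 64-bit register. */
uint64_t zext32(uint32_t x)
{
    return (uint64_t)x;
}

/* Sign-extension of the low 32 bits of a 64-bit value, the effect of
   ADDIW with an immediate of 0.
   ASSUMPTION: the cast to int32_t wraps modulo 2^32. */
int64_t sext32(uint64_t x)
{
    return (int64_t)(int32_t)(uint32_t)x;
}
```

With these definitions, zext32 leaves 0xFFFFFFFF as 4294967295, while sext32 turns the same low 32 bits into -1.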