2

On 64-bit RISC-V, when a 32-bit operand is loaded into a register, it is necessary to decide whether to sign-extend or zero-extend to 64 bits, and the architectural decision was made to prefer the former, presumably on the grounds that the most common int type in C family languages is a signed 32-bit integer. So sign extension is slightly faster than zero extension.

Is the same true of 8-bit operands? In other words, is signed char more efficient than unsigned char?

rwallace
  • 31,405
  • 40
  • 123
  • 242
  • 2
    What do you mean by more efficient? Do you mean the sign extension of 8-bit to a higher bit representation or just more general efficiencies between signed and unsigned chars. For the latter I found this: https://stackoverflow.com/questions/4712315/performance-of-unsigned-vs-signed-integers – FeelTheBurns Jul 31 '19 at 15:14

1 Answers1

2

If you’re going to be widening a lot of 8-bit values to wchar_t, unsigned char is what you want, because that’s a no-op rather than a bitmask. If your char format is UTF-8, you also want to be able to use unsigned math for your shifts. If you’re using library functions, it’s most convenient to use the types your library expects.

The RISC-V architecture has both a LB instruction that loads a sign-extended 8-bit value into a register, and a LBU instruction that zero-extends. Both are equally efficient. In C, any signed char used in an arithmetic operation is widened to int, and the C standard library functions specify widening char to int, so this puts the variable in the correct format to use.

Storing is a matter of truncation, and converting from any integral type to unsigned char is trivial (bitmask by 0xff). Converting from an unsigned char to a signed value can be done in no more than two instructions, without conditionals or register pressure (SLLI to put the sign bit of the char into the sign bit of the machine register, followed by SRLI to sign-extend the upper bits).

There is therefore no additional overhead in this architecture to working with either. The API specifies sign-extension rather than zero-extension of signed quantities.

Incidentally, RV64I does not architecturally prefer sign-extension. That is the ABI convention, but the instruction set adds a LWU instruction to load a 32-bit value from memory with zero-extension and an ADDIW that can sign-extend a zero-extended 32-bit result. (There is no corresponding ADDIB for 8-bit or ADDIH for 16-bit quantities.)

Davislor
  • 14,674
  • 2
  • 34
  • 49