Why is there no optimization for uint8?

Question

So I have been researching how the uint8 type works, and I have realized that it is actually not faster than int! In order to multiply, divide, add, or subtract, the program must first convert the uint8 to an int, which makes it about the same speed or slightly slower.

Why did C++ not implement multiplying, dividing, adding, or subtracting directly to uint8?

Optimizations are generally platform specific. In the majority of platforms, the CPU architecture is larger than 8 bits. It physically uses values larger than `uint8`. This lies way outside the purview of the c++ standard. — François Andrieux, Feb 27 '17 at 18:04
I suspect this is the CPU that works on 32/64bits that changes it, not C++. i.e. the compiler can't do anything but that when targeting your platform. — fwg, Feb 27 '17 at 18:06
@FrançoisAndrieux That is right, but I am wondering why C++ did not make a direct way to add, subtract, multiply, or divide uint8 together. — Kevin Duarte, Feb 27 '17 at 18:06
@KevinDuarte: The C++ standard did not forbid such operations; indeed, it explicitly *allows* them. But the standard does not define how the compiler *implements* that operation. It's up to the compiler to decide how to best generate the assembly for that operation. — Nicol Bolas, Feb 27 '17 at 18:07
IIRC, `int` is the optimal type for the CPU. Use of `uint8_t` is useful only when creating a large number of objects. — R Sahu, Feb 27 '17 at 18:10
There is a misconception that `uint8_t` is slower than an `int`. Depends on the processors. Processors like the ARM 32-bit series are optimized for both 32-bit and 8-bit fetches. Some architectures convert 8-bit value to 32-bit (usually during the fetch cycle) and thus are internally treated as 32-bit integers for the ALU and other CPU entities. — Thomas Matthews, Feb 27 '17 at 18:10
@NicolBolas: C++ states that `uint8 + uint8` results in `unsigned int`... — Jarod42, Feb 27 '17 at 18:12
In general, `uint8_t` would be used when the hardware requires it or for protocols (message and data formats). Sometimes `uint8_t` is used when memory capacity is constrained. It all depends on the platform architecture and the efficiency / memory space requirements. — Thomas Matthews, Feb 27 '17 at 18:13
@ThomasMatthews Correct, uint8_t may be changed due to the architecture just like a regular int would. So why is uint8_t still being treated differently than an int? They both are not guaranteed to be the same size on different architectures. So when uint8_t is actually a uint8_t on the architecture, shouldn't there be direct operations on uint8_t just like regular ints? — Kevin Duarte, Feb 27 '17 at 18:20
@Jarod42: "*C++ states that `uint8` + `uint8` results in `unsigned int`...*" Yes, it does. That doesn't mean that it isn't a "`uint8` operation". And compilers are allowed to make `uint8 = uint8 + uint8` operations not do an explicit conversion from `unsigned int`, so long as the result would be the same. — Nicol Bolas, Feb 27 '17 at 18:27

score 4 · Accepted Answer · edited May 23 '17 at 12:17

Why did C++ not implement multiplying, dividing, adding, or subtracting directly to uint8?

Because the optimal way of doing that is platform specific.

Most CPUs provide these operations as assembler instructions that work on integer values of a specific default size (e.g. 32 bits, or 64 bits like shown here for 16 bit instructions). They may or may not have such instructions for uint8 values.
The bit size is usually optimized for the CPU's cache line mechanisms.

So the optimal implementation is dependent on the available target CPU instructions and cannot be covered by the C++ standard.

@Kevin That would match the ARM 16 bit sample I've linked in my answer. — πάντα ῥεῖ, Feb 27 '17 at 19:22

score 1 · Answer 2 · edited Mar 01 '17 at 16:11

I'm not sure whether or not a compiler will produce 8-bit arithmetic operations for uint8_t when appropriate (quite unlikely, since it is unlikely to be faster).

As @harold mentioned, what I said before is no longer accurate for modern processors. The partial register update problem is no longer so serious for 8-bit operations, so the point is just that most 8-bit operations are not faster. 8-bit division is a little faster, though, and I'm trying to figure out why MS's compiler won't use it. (Not so sure: since the partial update problem is only mostly reduced, not completely removed, and is still present on AMD, the one-cycle benefit of 8-bit division may just not be worth it.)

Original: On modern x86 processors, 8-bit operations face a problem called partial register update: you change only part of the full register, which creates a false dependency that seriously impacts performance.

And FYI, at the language level there is no arithmetic for integral types smaller than int in C++. The usual integral promotion lifts the operands to int first.
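
To illustrate the promotion, here is a minimal sketch, assuming a platform where `std::uint8_t` exists (it is an optional type, typically an alias for `unsigned char`). The helper names `add_u8` and `add_promoted` are made up for this example. It shows that the expression `uint8_t + uint8_t` has type `int`, and that storing the result back into a `uint8_t` keeps only the low 8 bits. Because that final result is all that is observable, a compiler is free to emit an 8-bit add for `add_u8` if the target has one.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: adds two 8-bit values and stores the result
// back into 8 bits.
inline std::uint8_t add_u8(std::uint8_t a, std::uint8_t b) {
    // Both operands are promoted to int, so a + b is an int addition
    // at the language level. The cast keeps only the low 8 bits
    // (the value modulo 256).
    return static_cast<std::uint8_t>(a + b);
}

// Hypothetical helper: same addition, but keeping the promoted result.
inline int add_promoted(std::uint8_t a, std::uint8_t b) {
    return a + b;  // no truncation: the sum is already an int
}

// Compile-time check: the type of the expression is int, not uint8_t.
static_assert(
    std::is_same<decltype(std::uint8_t{} + std::uint8_t{}), int>::value,
    "uint8_t + uint8_t has type int after integral promotion");
```

For example, `add_promoted(200, 100)` gives 300, while `add_u8(200, 100)` gives 44, because 300 does not fit in 8 bits and wraps modulo 256.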

The false dependencies only really happen on AMD (pre-Zen anyway, who knows what Zen might do) and NetBurst, on P3 and Core2 and its descendants the 8bit registers can be renamed independently and issues only come up when there is a true dependency on a "split" register (so, using the full register after updating it partially), and *even that* is fast on Haswell and newer — harold, Feb 27 '17 at 18:33
Well it's still a thing on AMD for now, so it's not completely a thing of the past — harold, Feb 27 '17 at 19:04
