-1

I know that an unsigned character's size is 8 bits, and the size of an integer is 32 bits.

But I want to know: if I perform an operation between two integers whose values fit in an unsigned character (i.e. are at most 255), is it safe to say it is as fast as performing the same operation on two unsigned characters holding the same values?

Example:

int Int2 = 0x10;
int Int1 = 0xff;

unsigned char Char0 = 0x10;
unsigned char Char1 = 0xff;

Int1  + Int2; // Is calculating this
Char0 + Char1; // Faster than this??

Update: let's put this in context, as someone suggested:

for (unsigned char c=0; c!=256; c++){ // does this loop

   std::cout << c; // don't mind this line; it can be any statement
}

for (int i=0; i!=256; i++){ // perform faster than this one??

   std::cout << i; // this too
}
YoloWex
    given that all the values are known to the compiler, there might be no runtime calculation at all. Anyhow you are not using the result of the `+` operation, so any optimizer that emits code for those two lines can be considered broken – 463035818_is_not_an_ai May 09 '22 at 15:18
  • Handy reading: [Integer Promotion](https://en.cppreference.com/w/c/language/conversion#Integer_promotions) – user4581301 May 09 '22 at 15:19
  • @user4581301: That doesn't mean compilers actually *use* wider operations if they don't need to, though. The results have to be *as if* they widened, and then maybe narrowed again if you assign the result to another `unsigned char`. For most machines, that's equivalent to doing a narrow operation in the first place, like 1 element of a SIMD `paddb` that does 16 separate 1-byte additions in parallel. e.g. https://godbolt.org/z/YPxh14xG1 – Peter Cordes May 09 '22 at 15:23
  • True enough. Leads to a bit more handy reading: [The As-if rule](https://stackoverflow.com/questions/15718262/what-exactly-is-the-as-if-rule) TL;DR version: if you can't tell the difference, the compiler can do whatever it wants. – user4581301 May 09 '22 at 15:31
  • I suggest looking into `stdint.h`, which defines things like `uint8_t` (8-bit, unsigned type), and also things like, `uint_fast8_t`, which is the *fastest* size that can hold a `uint8` (it may be bigger, such as 32-bits, if the processor can handle 32-bit processing faster than 8-bit processing). – abelenky May 09 '22 at 15:33
  • @PeterCordes _Side note:_ In your godbolt link, the final `movdqu` [I expected this code] in one case was converted to `movups` [I didn't expect this code]. I realize they're both equivalent when writing back to memory for layout, but how/why did the compiler change this? – Craig Estey May 09 '22 at 15:36
  • @CraigEstey: `movups` has a shorter opcode by 1 byte; no mandatory prefix vs. `movupd` or `movqdu`, for the non-VEX (non-AVX) encoding. And no current CPUs have extra bypass latency for stores, whether they're SIMD-integer vs. SIMD-fp domains. (For stores specifically, maybe no CPUs ever cared, and/or store latency is generally not observable and very hard for it to be a bottleneck. GCC still uses `movups` there for `-march=nehalem` which was notoriously picky about SIMD domains and had 2-cycle bypass latency penalties, vs. 1 for most CPUs when they have any) – Peter Cordes May 09 '22 at 15:37
  • The first loop will run forever if `unsigned char` is 8 bits (which it almost certainly is); the value of `c` can never be 256. – Pete Becker May 09 '22 at 15:53
  • @abelenky Oh my fault, new keyboard layout – YoloWex May 09 '22 at 16:01
  • `c!=256` is always false for `unsigned char` on most systems, where CHAR_BIT is 8. Other than that, if your compiler makes one loop faster than the other, it's doing a bad job. (Assuming your loop body actually does equivalent things with the value, unlike here where `cout<<` behaviour depends on type, e.g. for a character it's like `putchar`, while for an integer it's like `printf("%d", i)` formatting it into a string of ASCII decimal digits. So you picked one of the worst possible examples, where the loop body does very different work. Also I/O function calls are *very* slow vs. looping.) – Peter Cordes May 09 '22 at 16:04
  • Since an answer can only be true for one set of compiler and target, why don't you simply benchmark? Other compilers or other systems generally can have other results. So there is no definitive answer. – the busybee May 09 '22 at 16:30
  • As a general rule, if you have an N-bit machine, any basic operation — addition, comparison, etc. — on two N-bit integers will be positively as fast as it can be. No operations on smaller integers will be any faster, and they might actually be somewhat slower. – Steve Summit May 09 '22 at 17:11

1 Answer

3

I know that an unsigned character's size is 8 bits

This is not necessarily the case in C++; the standard only guarantees that `unsigned char` is at least 8 bits (`CHAR_BIT >= 8`). It is, however, true on most implementations.

and the size of an integer is 32 bits.

There are several integer types in C++, and in fact character types are integer types as well. The standard only requires `int` to be at least 16 bits, though it is 32 bits on most modern platforms.

Int1  + Int2 ; // Is calculating this
Char0 + Char1; // Faster than this??

Integers of lower rank than int are promoted to int (or unsigned int in rare cases) when used as operands of most binary operators. Both additions in the example therefore operate on int after the promotion. You don't use the results at all, so the compiler need not produce any code for either line, so they should be equally fast in this trivial example.

Whether one piece of code is faster than the other depends on many factors. It's not possible to accurately guess which way it would go without context.

eerorika
  • Often operations of two chars have the result assigned back to another unsigned char, truncating it. The compiler can then elide the actual widening and truncation and just do an 8-bit operation if that's faster, while still giving the same result *as if* they widened and narrowed. e.g. this could become 1 element of a SIMD `paddb` that does 16 separate 1-byte additions in parallel. e.g. https://godbolt.org/z/YPxh14xG1 shows GCC and clang for x86-64 auto-vectorizing an `a[i] += b[i]` loop for `unsigned char` elements, doing 16 at a time without widening. – Peter Cordes May 09 '22 at 15:27
  • So the real key part of the answer is the last paragraph. What you do with the result (and where the inputs come from) matter critically. Also worth pointing out that almost all integer operations on modern CPUs (except division) have performance that's not data-dependent. e.g. `x * y` with runtime-variable x and y (so it compiles to an actual multiply, not shift/add for simple compile-time constants) is not slower for `0x12345678U` than for `2U`. – Peter Cordes May 09 '22 at 15:27
  • Basically I think the misconceptions evident in the way it was asked mean it needs a *much* bigger answer to get anything resembling a useful understanding of the performance implications. e.g. [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) re: how modern compilers optimize C to asm would be a reasonable *starting point* for doing a lot of reading. And [What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?](https://stackoverflow.com/q/51607391) re: CPUs. – Peter Cordes May 09 '22 at 15:31
  • The point about unused results not being computed at all is key, but other implications of the as-if rule are also critically important to understanding that widening doesn't *have* to happen. And that CPUs may do it for free. – Peter Cordes May 09 '22 at 15:35
  • Hey, I updated the Question, mind if you look again? – YoloWex May 09 '22 at 15:36