
When is it better (from a performance/execution speed/caching perspective) to use the default 32-bit integer type (unsigned if possible) versus making it 8-bit or 16-bit if we know for sure that the value will fit?

I'm quite sure this depends on the situation (maybe for a struct/class field it's better to use a smaller integer because the object will be smaller? Or maybe it's better to default to 32-bit so the instructions aren't "padded"?).

From my current understanding, in a data structure with many entries you would prefer a smaller type (like an 8-bit integer) so that cache prefetching is more effective (more values fit in the data cache). But I don't really know whether using smaller types is actually better in other situations.
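For example, this is roughly the trade-off I have in mind (the entry types below are made up just to show the size difference, assuming a typical 64-byte cache line):

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical record types, only to illustrate the density difference.
struct WideEntry   { std::uint32_t a, b, c, d; };  // 16 bytes per entry
struct NarrowEntry { std::uint8_t  a, b, c, d; };  //  4 bytes per entry

int main() {
    // Assuming a 64-byte cache line (typical on current x86/ARM cores),
    // four times as many NarrowEntry objects fit in one line, so a
    // linear scan touches fewer lines.
    std::printf("WideEntry:   %zu bytes, %zu per 64-byte line\n",
                sizeof(WideEntry), 64 / sizeof(WideEntry));
    std::printf("NarrowEntry: %zu bytes, %zu per 64-byte line\n",
                sizeof(NarrowEntry), 64 / sizeof(NarrowEntry));
}
```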

Thanks in advance.

Pker
  • You have more choice than you think - https://en.cppreference.com/w/cpp/types/integer Without a concrete example (in the question) we will either be guessing or offering an Opinion. Please read [ask] with a [mcve]. – Richard Critten Mar 16 '21 at 22:00
  • I don't think I can provide an example since it's a general question about how the processor works, not about a specific program. But thanks for your comment. – Pker Mar 16 '21 at 22:06
  • It really depends on the situation (the code, platform, compiler). In some cases having a smaller type can indeed hurt performance. Some operations are possible with 32-bit operands only, and the compiler must generate additional instructions to pad/unpad the operands. But wasting D-cache space hurts performance, too. Inspect the generated assembly and benchmark the hot path to be sure. – rustyx Mar 16 '21 at 22:15
  • In addition, if you care about performance, you might be interested in the "new" concept of *data oriented design*. There was a nice talk about it on youtube. – rustyx Mar 16 '21 at 22:24
  • In general, smaller data types, such as `uint16_t` and `uint8_t` are used for space optimization. A 32-bit processor can fetch 4 bytes (`uint8_t`) at once. Fetching one byte, especially on unaligned addresses takes more work. Although many processors are now designed to fetch one byte without any penalties. There is really no efficiency gain by using values smaller than the processor's word size. Most data buses are designed to accommodate the width of the processor's word size. The cost of transporting 32-bits vs. 8-bits on a 32-bit data bus is insignificant. – Thomas Matthews Mar 16 '21 at 22:47
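To make rustyx's point about padding/unpadding narrow operands concrete, here is a minimal sketch (my own illustration, not from the comments above). In C++, operands narrower than `int` are integer-promoted before arithmetic, and the result has to be converted back when it is stored in the narrow type; depending on the target, that narrowing may cost an extra zero-extension or masking instruction.

```cpp
#include <cstdint>

// Arithmetic on sub-int types goes through integer promotion: a and b are
// widened to int, added as int, and the result is converted back (mod 256)
// when stored in the uint8_t return value. On some targets that narrowing
// needs an extra zero-extension or masking instruction.
std::uint8_t add8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

// Native-width arithmetic: typically a single add, nothing to narrow.
std::uint32_t add32(std::uint32_t a, std::uint32_t b) {
    return a + b;
}
```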

1 Answer


In addition to occupying less cache, smaller types occupy less disk space, and utilize less network bandwidth.

In general, for computation the CPU likes its native size, but for I/O smaller types (higher information density) are desirable. Your thought about cache prefetching is merely the specific case of I/O between the processor cache and system RAM.
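A minimal sketch of how that might look in practice (my own illustration, not part of the original answer): keep bulk data that travels through the memory hierarchy in a narrow type, and let the scalars used for computation be the native width.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bulk data: many values moving between RAM and cache (or disk/network),
// so a narrow element type keeps the information density high.
std::uint64_t sum(const std::vector<std::uint8_t>& samples) {
    // Scalars used for computation stay at a comfortable native width;
    // each byte is widened on load, which is cheap.
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        total += samples[i];
    }
    return total;
}
```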

Ben Voigt
  • So a general rule of thumb (because yes, that's what I'm looking for here) is that for applications that don't have a lot of I/O operations I should simply use the native type to avoid useless padding? – Pker Mar 16 '21 at 22:08
  • @Pker: As long as you are including RAM and cache, which are often not thought of as I/O, yes. – Ben Voigt Mar 16 '21 at 22:09
  • @Pker It's not possible to give a _good rule of thumb_ about this, there are too many factors to consider for each and every use case. I wouldn't waste time in prematurely optimizing things, which may even turn out to be the worse choice in a specific situation. – πάντα ῥεῖ Mar 16 '21 at 22:11
  • Right, which makes this a lot more technical. My quick conclusion is that trying to find a rule of thumb isn't going to do me any good, since RAM/cache are pretty much always core components. I'll read a book about memory. Thanks – Pker Mar 16 '21 at 22:13
  • @πάνταῥεῖ Right, I came to the same conclusion. I didn't want to do premature optimization or anything evil of that sort; it's just that I'm trying to orient my design towards cache friendliness, for obvious reasons. Thanks for the feedback. – Pker Mar 16 '21 at 22:16
  • @Pker: Most processor internal buses are designed to handle the bit width of the processor. So a 32-bit processor's internal data bus would be at least 32 bits. Using only 8 of the 32 bits doesn't save any time, since all 32 bits are clocked at the same time. (Clocking other combinations is more complicated and takes up more circuitry.) Performance-wise, the gain is insignificant (we're talking nanoseconds, if any). Bottlenecks (of significant duration) lie elsewhere. – Thomas Matthews Mar 16 '21 at 22:52