How much overhead do integer type casts cause in Rust?

Question

In short:

If I repeatedly multiply two u8 values and the result can overflow u8's value range, is it more efficient (considering both memory usage and speed) to just use u16 values from the start instead or cast the values to u16s always before the multiplication to prevent overflow?

The whole story:

I'm writing a Rust+Wasm program where I have a grid of cells that I pass into JavaScript as a raw pointer to an array (very much like in the Rust and WebAssembly tutorial). The grid cannot ever be larger than 250 x 250 cells in size, so I have the properties width and height for the grid as u8s.

So, as a simplified example:

pub struct Grid {
    width: u8,
    height: u8,
    cells: Vec<Cell>
}

I then manipulate the cell values according to their neighboring values in the grid step by step. When doing this, I was getting the index of the cell in question in the grid with:

fn get_cell_index(&self, col: u8, row: u8) -> usize {
    (row * self.width + col) as usize
}

However, I realized that with large enough grids this results in an overflow because the multiplication is done with u8 values before the as usize part is reached (i.e. the result of the multiplication is stored in a u8 type already).

There's an easy workaround for this:

fn get_cell_index(&self, col: u8, row: u8) -> usize {
    (row as u16 * self.width as u16 + col as u16) as usize
}

But that - in addition to being ugly - raised the question if there's any sense to do this casting of 3 u8 values to u16 for each cell in the grid, repeated several times when the program is running. The most obvious option would be to just change the width and height properties to be u16 the whole time so no casting would be needed.

This is probably the wisest choice efficiency-wise considering that using u16s instead of u8s for two values will only take up 2 more bytes of memory from the start, and doing the casts will probably have a non-zero performance (speed) penalty. However, I'd still like to know exactly what kind of efficiency costs each approach has. And to be honest, I kind of like how the current u8 type documents the fact that the dimensional values will never be bigger than 250...

And just to state the obvious: Yes, I know that in this example this is of course a micro-optimization that makes absolutely no sense – but I'm eager to know the theoretical implications of different approaches for making sensible choices in bigger projects.

When in doubt, one can always refer to Compiler Explorer (https://rust.godbolt.org/) to check the generated assembly. My personal guess on this specific problem would be there being no noticeable difference: extending unsigned integers normally cost at most one CPU instruction (either e.g. an `xor` to clear the whole register before loading into the low bits of it, or some bit extension instruction after loading). — Ruifeng Xie, May 18 '21 at 17:22
Note that in C it is impossible to multiply two `u8` values, because they will always be integer-promoted, so C always has these cast, only they are hidden. What I mean is that even if you do not cast, some architectures will do the cast anyways because they cannot operate easily on smaller types (and that is what C does). — rodrigo, May 18 '21 at 18:06
In practice, integer promotion as part of an expression is a no-op. The CPU's registers will always be its native size (probably 32- or 64-bit) and anything less than that just affects what instructions are used to operate on those registers. — kmdreko, May 18 '21 at 18:25
@rodrigo there is no u8 value in C and what are you saying I never hear of that and I know the C standard by heart. — Stargateur, May 18 '21 at 19:23
@kmdreko not really copy can multiply more than 1 integer in the same time using specific instruction. for example use 64 bit instruction that is handle as 2 32 bit number. — Stargateur, May 18 '21 at 19:25
@Stargateur sure, needless promotion can de-optimize vectorized instructions. But OP is comparing *storing* vs *casting*; in both cases they want `u16`s to operate on. — kmdreko, May 18 '21 at 19:59
@Stargateur: The equivalent to `u8` in C is obviously `unsigned char`, and if you do `unsigned char a=255, b=255; printf("%d\n", a + b);` you'll get `510`. There is no way in C to add two `unsigned char` values without promoting them before to integer. Sure you can cast the result back to `unsigned char` and you'll get the expected value, but that is an extra operation. — rodrigo, May 19 '21 at 07:20
@rodrigo no, that totally wrong an equivalent is uint_8, `unsigned char` have specific attribute for example, it can alias everything, you don't know what you are talking about. Also relly an auto promoting rule of C is so wrong, C is broken sometime promote randomly to int... that misleading. — Stargateur, May 19 '21 at 12:36
@Stargateur: True, `uint8_t` is more like `u8` but for any architecture relevant to Rust, `CHAR_BITS` equals 8, SO `uint8_t` and `unsigned char` have the same representation. About your other comment, sorry but I don't understand you, [integer promotion](https://stackoverflow.com/a/46073296/865874) is a well known _feature_ of the C language. — rodrigo, May 19 '21 at 13:02
@rodrigo so known and simple that it need a very long answer, you didn't get my point. also, same number of bits != same type, char is very special in C — Stargateur, May 19 '21 at 13:08

score 3 · Accepted Answer · answered May 19 '21 at 14:30

Absolutely none for u8 -> u16 which appears to be the gist of your question.

It is important that it is macro-pessimization, and you mentioning it in your question won't stop me from repeating the common answers that such questions get. You want sensible choice? Use usize for container sizes, as simple as that. There's absolutely no reason to ever limit the size artificially because there's nothing saved, if anything, there's a lot of things being lost.

Your code is unreadable, you even said so yourself.
Your code theoretically may introduce overhead on CPU, for the first time width is read, it is irrelevant, but it may happen, because CPU register size is usize, not u8 or u16, and it may or may not have to bitshift that byte to properly write it into the register, would've been so much easier for it to just read usize which is its natural register size and already properly aligned in memory.
"The grid cannot ever be larger than 250 x 250" Oh yeah? What stops me from inputting 251u8? You will have to write code that validates that anyway if you wanted to enforce that.
Now, why does it stop me from inputting 257usize, 1456usize... Nusize?
Your time is spent on bikeshedding instead of things that matter. I imagine our goal is usually to get things done in least amount of time possible, not maximum.

General advice: do not optimize container wrappers, optimize the data inside them, that's the sensible choice in any project no matter the scale.

Besides, what kind of "optimization" is it to limit the maximum size of a Grid? You may not want a Grid bigger than 250x250, but someone else might, and now your code is unusable for no real reason. Someone else might be you, going back to your project 5 months down the line, and cursing your past self while you have to refactor the entire thing just to copy paste it into another project where you want 2500 x 2500 grid for something completely unrelated.

Lets briefly focus on Grid versus Vec<Cell> for a bit. How many Grids do you plan to have? I imagine only one, or at most, a dozen. But how many Cell's will there be? Thousands.

Think of what purpose there is of saving few bytes on Grid, versus few bytes on Cell, the generalization can almost universally boil down to O(1) vs O(n) of bytes saved when it comes to data structures like this.

When it comes to heavy data processing, you may end up having hundreds of Vec holding potentially hundreds of millions of elements. You should focus on the millions here. That's one of the many reason why Rust Vec itself is just, arbitrarily, 3 usizes. And I'm not aware of any projects that were worried about the fact that Vec stores its length and capacity as usize.

Above all else, remember: Premature pessimization is root of all evil.

Thanks! I could say the points 3-5 don't matter that much here as this is about a short-lived personal project for learning Rust and Wasm interplay, but they are valid points nonetheless. And your explanation convinced me to just go with `usize`s for the grid dimensions. (It also seems this question was a duplicate after all even though I couldn't find the older one when searching before asking it... I could delete it but I'll leave it be – as a closed question – to preserve your good answer.) — MJV, May 20 '21 at 11:04

How much overhead do integer type casts cause in Rust?

1 Answers1