In an attempt to determine whether I can/should use Rust instead of the default C/C++ I'm looking into various edge cases, mostly with this question in mind: In the 0.1% of cases where it does matter, can I always get compiler output as good as gcc's (with the appropriate optimization flags)? The answer is most likely no, but let's see...
On Reddit there is a rather idiosyncratic example that studies the compiler output of a subroutine for a branchless sort algorithm.
Here is the benchmark C code:
#include <stdint.h>
#include <stdlib.h>
int32_t* foo(int32_t* elements, int32_t* buffer, int32_t pivot)
{
size_t buffer_index = 0;
for (size_t i = 0; i < 64; ++i) {
buffer[buffer_index] = (int32_t)i;
buffer_index += (size_t)(elements[i] < pivot);
}
}
And here is the godbolt link with compiler output.
The first attempt with Rust looks like this:
pub fn foo0(elements: &Vec<i32>, mut buffer: [i32; 64], pivot: i32) -> () {
let mut buffer_index: usize = 0;
for i in 0..buffer.len() {
buffer[buffer_index] = i as i32;
buffer_index += (elements[i] < pivot) as usize;
}
}
There's quite a bit of bounds checking going on, see godbolt.
The next attempt eliminates the first bounds checking:
pub unsafe fn foo1(elements: &Vec<i32>, mut buffer: [i32; 64], pivot: i32) -> () {
let mut buffer_index: usize = 0;
for i in 0..buffer.len() {
unsafe {
buffer[buffer_index] = i as i32;
buffer_index += (elements.get_unchecked(i) < &pivot) as usize;
}
}
}
That's a little better (see the same godbolt link as above).
Finally, let's try to remove the bounds checks altogether:
use std::ptr;
pub unsafe fn foo2(elements: &Vec<i32>, mut buffer: [i32; 64], pivot: i32) -> () {
let mut buffer_index: usize = 0;
unsafe {
for i in 0..buffer.len() {
ptr::replace(&mut buffer[buffer_index], i as i32);
buffer_index += (elements.get_unchecked(i) < &pivot) as usize;
}
}
}
This produces the same output as foo1
, so ptr::replace
still performs bounds checking. I'm certainly out of my depth, here, with those unsafe
operations. That leads to my two questions:
- How can the bounds check be eliminated?
- Does it even make sense to analyze edge cases like this? Or would the Rust compiler see through all this if presented with the whole algorithm instead of only a small fraction thereof.
Regarding the last point, I'm curious, in general, whether Rust can be butchered to the point where it is as "literal", i.e. close to the metal, as C is. Seasoned Rust programmers will probably cringe at this line of investigation, but here it is...