TL;DR: Realistically, just use John Kugelman's solution; copying 4 bytes is not measurable.
The biggest "measured" difference is 0.09 ps (239.79 - 239.70). That's 90 femtoseconds, or 0.00009 nanoseconds. Running the benchmark again will yield wildly different results (in the picosecond range).
Benchmarking something as small as copying 4 bytes is not realistic. We're so far below nanoseconds that this is pure noise.
| test | #[bench] | criterion |
|---|---|---|
| try_into | 0 ns | 239.79 ps |
| reinterpret | 0 ns | 239.70 ps |
| bit unpack | 0 ns | 239.74 ps |
| b.iter(\|\| 1) | | 240.18 ps |
| b.iter(\|\| 1) | | 239.73 ps |
| b.iter(\|\| 1) | | 239.68 ps |
For fun, change all the tests to b.iter(|| 1), and you'll get similar results fluctuating by picoseconds.
The biggest difference between the b.iter(|| 1) tests is 0.5 ps (240.18 - 239.68). That's 500 femtoseconds, or 0.0005 nanoseconds.
That's literally a bigger difference than when we did "actual" "work". This is pure noise.
You're talking about copying 4 bytes. This isn't going to be measurable, even if "every µs matters". On its own it won't be measurable in microseconds, nor in nanoseconds.
(I'll avoid reiterating what's already been said in the comments.)
If you don't want to use TryInto, then you can use some good old bit unpacking and bit shifting. (Out-of-bounds access will cause a panic.)
let i = (buf[1] as i32) |
    (buf[2] as i32) << 8 |
    (buf[3] as i32) << 16 |
    (buf[4] as i32) << 24;
println!("{}", i);
// Prints `302055424`
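One detail worth spelling out: the shifts above assemble the bytes in little-endian order, which is the same thing `i32::from_le_bytes` does, whereas `from_ne_bytes` follows the byte order of the machine. The sketch below checks that equivalence; `unpack_le` is a hypothetical helper name introduced only for this illustration.

```rust
/// Hypothetical helper: assemble a little-endian i32 from four bytes by hand.
fn unpack_le(b: [u8; 4]) -> i32 {
    (b[0] as i32) | (b[1] as i32) << 8 | (b[2] as i32) << 16 | (b[3] as i32) << 24
}

fn main() {
    let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
    // Same four bytes as buf[1..5] in the snippet above.
    let bytes = [buf[1], buf[2], buf[3], buf[4]];

    let by_hand = unpack_le(bytes);
    let by_std = i32::from_le_bytes(bytes);

    // Both interpret the bytes as little-endian, regardless of the host.
    assert_eq!(by_hand, by_std);
    println!("{}", by_hand);
}
```

If the data has a defined byte order, `from_le_bytes` or `from_be_bytes` is probably the clearer way to say so.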
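Alternatively, you can also reinterpret buf as a *const i32 pointer and dereference it. However, dereferencing a raw pointer is unsafe. (Unlike indexing, an out-of-bounds read through a raw pointer does not panic; it is undefined behavior, and so is dereferencing a pointer that is not aligned for i32. This version is therefore only sound when the four bytes are in bounds and the offset pointer happens to be suitably aligned.)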
// let i = unsafe { &*((buf.as_ptr().add(1)) as *const i32) };
let i = unsafe { &*((buf.as_ptr().offset(1)) as *const i32) };
println!("{:?}", i);
// Prints `302055424`
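A byte buffer gives no alignment guarantee for an `i32` at offset 1, so a more defensive variant of the pointer approach could use `std::ptr::read_unaligned` together with an explicit bounds check. This is only a sketch; `read_i32_at` is a hypothetical helper name, not something from the question or the standard library.

```rust
use std::ptr;

/// Hypothetical helper: read a native-endian i32 starting at `offset`,
/// or return None if fewer than four bytes are available there.
fn read_i32_at(buf: &[u8], offset: usize) -> Option<i32> {
    let end = offset.checked_add(4)?;
    if end > buf.len() {
        return None;
    }
    // SAFETY: bytes offset..offset + 4 were just checked to be in bounds,
    // and read_unaligned places no alignment requirement on the pointer.
    Some(unsafe { ptr::read_unaligned(buf.as_ptr().add(offset) as *const i32) })
}

fn main() {
    let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
    // The value depends on the host's byte order, like from_ne_bytes.
    println!("{:?}", read_i32_at(&buf, 1));
    // Only three bytes remain at offset 7, so this yields None.
    println!("{:?}", read_i32_at(&buf, 7));
}
```

This returns the value by copy instead of a reference into the buffer, which also sidesteps the lifetime of the `&i32` produced by the `&*` version.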
So you want the best-performing solution for copying 4 bytes. Alright, let's take John Kugelman's solution and the previous two and benchmark them.
// benches/bench.rs
#![feature(test)]
extern crate test;

use test::Bencher;
use std::convert::TryInto;

#[bench]
fn bench_try_into(b: &mut Bencher) {
    b.iter(|| {
        let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
        i32::from_ne_bytes((&buf[1..5]).try_into().unwrap())
    });
}

#[bench]
fn bench_reinterpret(b: &mut Bencher) {
    b.iter(|| {
        let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
        unsafe { &*((buf.as_ptr().offset(1)) as *const i32) }
    });
}

#[bench]
fn bench_bit_unpack(b: &mut Bencher) {
    b.iter(|| {
        let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
        (buf[1] as i32) | (buf[2] as i32) << 8 | (buf[3] as i32) << 16 | (buf[4] as i32) << 24
    });
}
Now let's benchmark by executing cargo +nightly bench.
running 3 tests
test bench_bit_unpack ... bench: 0 ns/iter (+/- 0)
test bench_reinterpret ... bench: 0 ns/iter (+/- 0)
test bench_try_into ... bench: 0 ns/iter (+/- 0)
Like I presumed, copying 4 bytes isn't going to be measurable.
Now, let's try benchmarking with criterion. Maybe the test crate is (being realistic and) limited to nanoseconds, who knows.
// benches/bench.rs
use criterion::{criterion_group, criterion_main, Criterion};
use std::convert::TryInto;

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("try_into", |b| {
        b.iter(|| {
            let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
            i32::from_ne_bytes((&buf[1..5]).try_into().unwrap())
        })
    });
    c.bench_function("reinterpret", |b| {
        b.iter(|| {
            let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
            unsafe { &*((buf.as_ptr().offset(1)) as *const i32) }
        })
    });
    c.bench_function("bit_unpack", |b| {
        b.iter(|| {
            let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
            (buf[1] as i32) | (buf[2] as i32) << 8 | (buf[3] as i32) << 16 | (buf[4] as i32) << 24
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
# Cargo.toml
[dev-dependencies]
criterion = "0.3.3"
[[bench]]
name = "bench"
harness = false
Now, let's benchmark by executing cargo bench.
try_into                time:   [239.69 ps 239.79 ps 239.91 ps]
                        change: [+0.0101% +0.0700% +0.1316%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe
reinterpret             time:   [239.63 ps 239.70 ps 239.78 ps]
                        change: [-0.7006% -0.2163% +0.0525%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe
bit_unpack              time:   [239.65 ps 239.74 ps 239.84 ps]
                        change: [-0.0768% +0.0775% +0.2867%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe
| test | #[bench] | criterion |
|---|---|---|
| try_into | 0 ns | 239.79 ps |
| reinterpret | 0 ns | 239.70 ps |
| bit unpack | 0 ns | 239.74 ps |
So the mean measurements are 239.79 ps, 239.70 ps, and 239.74 ps, and the biggest "measured" difference is 0.09 ps. That's 90 femtoseconds, or 0.00009 nanoseconds. Running the benchmark again will yield different results. Benchmarking something as small as copying 4 bytes in isolation is not realistic.
Sure, in that run "reinterpret" was the "fastest", but we're so far below nanoseconds that this is pure noise.
Use the solution you prefer; there isn't any measurable or significant performance difference between them.
For fun, change all the tests to b.iter(|| 1), and you'll get similar results fluctuating by picoseconds.
c.bench_function("1", |b| b.iter(|| 1_i32));
c.bench_function("2", |b| b.iter(|| 1_i32));
c.bench_function("3", |b| b.iter(|| 1_i32));
Running the benchmark gives similar results. I ran it once and got 240.18 ps, 239.73 ps, and 239.68 ps. That's a "measured" difference of 0.5 ps, which is 500 femtoseconds, or 0.0005 nanoseconds.
That's literally a bigger difference than when we did "actual" "work". Again, this is pure noise. This isn't enough "work" to be measurable in any significant way.
Again, use the solution you prefer; there isn't any measurable or significant performance difference between them.
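For anyone who wants to check the arithmetic, here is a small sketch that recomputes both spreads from the mean times quoted above and converts them to femtoseconds and nanoseconds. `spread` is a hypothetical helper written just for this comparison.

```rust
/// Hypothetical helper: largest minus smallest value, in the input's unit.
fn spread(values: &[f64]) -> f64 {
    let max = values.iter().cloned().fold(f64::MIN, f64::max);
    let min = values.iter().cloned().fold(f64::MAX, f64::min);
    max - min
}

fn main() {
    // Mean times in picoseconds, copied from the criterion runs above.
    let work = [239.79, 239.70, 239.74];
    let no_op = [240.18, 239.73, 239.68];

    for (name, ps) in [("work", spread(&work)), ("no-op", spread(&no_op))] {
        // 1 ps = 1000 fs = 0.001 ns
        println!(
            "{}: {:.2} ps = {:.0} fs = {:.5} ns",
            name,
            ps,
            ps * 1000.0,
            ps / 1000.0
        );
    }

    // The no-op closures differ from each other more than the real ones do.
    assert!(spread(&no_op) > spread(&work));
}
```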