I'm writing a small collision-detection library in Rust. While trying to make a segment-segment intersection test as fast as possible, I've run into a strange discrepancy where seemingly useless code makes the function run faster.
My goal is to optimise this method as much as possible, and understanding why the generated code changes performance as significantly as it does would, I imagine, help with that. Whether it's a quirk of the Rust compiler or something else, if anyone is knowledgeable enough to explain it, here's what I have so far:
The two functions:
fn line_line(a1x: f64, a1y: f64, a2x: f64, a2y: f64, b1x: f64, b1y: f64, b2x: f64, b2y: f64) -> bool {
    let dax = a2x - a1x;
    let day = a2y - a1y;
    let dbx = b2x - b1x;
    let dby = b2y - b1y;
    let dot = dax * dby - dbx * day;
    let dd = dot * dot;
    let nd1x = a1x - b1x;
    let nd1y = a1y - b1y;
    let t = (dax * nd1y - day * nd1x) * dot;
    let u = (dbx * nd1y - dby * nd1x) * dot;
    u >= 0.0 && u <= dd && t >= 0.0 && t <= dd
}
fn line_line_if(a1x: f64, a1y: f64, a2x: f64, a2y: f64, b1x: f64, b1y: f64, b2x: f64, b2y: f64) -> bool {
    let dax = a2x - a1x;
    let day = a2y - a1y;
    let dbx = b2x - b1x;
    let dby = b2y - b1y;
    let dot = dax * dby - dbx * day;
    if dot == 0.0 { return false } // useless branch if dot isn't 0 ?
    let dd = dot * dot;
    let nd1x = a1x - b1x;
    let nd1y = a1y - b1y;
    let t = (dax * nd1y - day * nd1x) * dot;
    let u = (dbx * nd1y - dby * nd1x) * dot;
    u >= 0.0 && u <= dd && t >= 0.0 && t <= dd
}
Criterion-rs bench code:
fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("line-line", |b| b.iter(|| line_line(
        black_box(0.0), black_box(1.0), black_box(2.0), black_box(0.0),
        black_box(1.0), black_box(0.0), black_box(2.0), black_box(4.0))));
    c.bench_function("line-line-if", |b| b.iter(|| line_line_if(
        black_box(0.0), black_box(1.0), black_box(2.0), black_box(0.0),
        black_box(1.0), black_box(0.0), black_box(2.0), black_box(4.0))));
}
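(The imports and harness wiring around that function are the usual Criterion boilerplate, which I've left out above; for completeness they are roughly:)

use criterion::{black_box, criterion_group, criterion_main, Criterion};

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);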
Note that with these inputs the if condition does not evaluate to true, so the early return is never taken.
assert_eq!(line_line_if(0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 2.0, 4.0), true); // does not panic
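For reference, working through the benchmark inputs by hand (a1 = (0, 1), a2 = (2, 0), b1 = (1, 0), b2 = (2, 4)):

dax = 2.0, day = -1.0, dbx = 1.0, dby = 4.0
dot = 2.0 * 4.0 - 1.0 * (-1.0) = 9.0            // non-zero, so the branch is not taken
dd = 81.0, nd1x = -1.0, nd1y = 1.0
t = (2.0 * 1.0 - (-1.0) * (-1.0)) * 9.0 = 9.0
u = (1.0 * 1.0 - 4.0 * (-1.0)) * 9.0 = 45.0     // both t and u lie in [0, dd], so the result is true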
Criterion-rs bench results:
line-line time: [5.6718 ns 5.7203 ns 5.7640 ns]
line-line-if time: [5.1215 ns 5.1791 ns 5.2312 ns]
Godbolt.org rust compilation results for these two pieces of code.
Unfortunately I cannot read asm as well as I'd like. I also don't know how to ask rustc on that site to compile this without inlining the entire calculation, but if you can do that, I hope the link is of some help.
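(If it helps anyone look at the assembly in isolation: my understanding is that a public, non-inlinable wrapper keeps the calculation as a standalone symbol in the Godbolt output. The wrapper name below is just something I made up for illustration:)

// Hypothetical wrapper: `pub` keeps it from being stripped as dead code,
// #[inline(never)] keeps the call from being folded into its caller.
#[inline(never)]
pub fn line_line_asm(a1x: f64, a1y: f64, a2x: f64, a2y: f64,
                     b1x: f64, b1y: f64, b2x: f64, b2y: f64) -> bool {
    line_line(a1x, a1y, a2x, a2y, b1x, b1y, b2x, b2y)
}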
My current environment is:
-Windows 10 (latest standard)
-Ryzen 3200G cpu
-Standard Rust install (no nightly/beta), compiling for znver1 (last I checked)
Testing is done with the cargo bench command, which I assume builds with optimizations, but I haven't found confirmation of that anywhere. Criterion-rs is doing all of the benchmark handling.
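(As far as I can tell, cargo bench builds under the bench profile, which is optimized by default; this Cargo.toml snippet is only my sketch of how I believe those settings are spelled, not something the results above depend on:)

# Cargo.toml
[profile.bench]
opt-level = 3   # my understanding: this is already the default for `cargo bench`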
Thank you in advance if you're able to answer this.
EDIT:
Comparing the code in the Godbolt link with -C opt-level=3 --target x86_64-pc-windows-msvc
(the toolchain my installation of rustc currently uses) reveals that the compiler has inserted many unpcklpd and unpckhpd instructions into the function, which may be slowing the code down rather than optimizing it. Setting the opt-level parameter to 2 does remove the extra unpck instructions without otherwise significantly changing the compilation result. However, adding
[profile.release]
opt-level = 2
while it did cause a recompilation, left the reversed performance in place; the same happened at opt-level = 1. I do not have target-cpu=native specified.
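(In case it matters for reproducing this: the only ways I know of to pass a target CPU to the benchmark build are an environment variable or .cargo/config.toml; this is a sketch of what I would try, not what I'm currently using:)

# one-off, from a shell:
#   RUSTFLAGS="-C target-cpu=znver1" cargo bench

# or persistently, in .cargo/config.toml:
[build]
rustflags = ["-C", "target-cpu=znver1"]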