I want to avoid an allocation inside a loop:
use std::time::Instant;
/// calculate the sum of the squares of the elements in `numbers`
fn sum_of_squares(numbers: Vec<i32>) -> i32 {
numbers.iter().map(|x| x * x).sum()
}
fn main() {
// start the timer
let start = Instant::now();
// "benchmark" the function 'sum_of_squares' by executing it a couple of times.
for _ in 1..1000 {
// Here I want to avoid the alloction.
// This causes an ALLOCATION, wasting time the code could have spent calculating sum_of_squares D;
let testcase: Vec<i32> = vec![4; 16];
// here the actual work, I am interested in measuring, happens.
sum_of_squares(testcase);
}
let elapsed_time = start.elapsed();
println!("1000 sum_of_squaress in: {:?}", elapsed_time);
}
I want to spend the maximum of the time in the for loop calculating the sum of squares, so I would like to put the line let testcase: Vec<i32> = vec![4; 16]
outside of the loop such that the only thing happening in the loop is adding together squares.
The problem is that the variable testcase
is moved into the function sum_of_squares
. When I put let testcase = ...
outside of the loop, the borrow checker complains because after the
first iteration of the for loop, the variable testcases is no longer around.
Even if the time to allocate testcase
is negligible, I would really like to know if I can avoid to allocate in each iteration of the loop.
What I am not looking for
I could rewrite
sum_of_squares
to take a reference as an argument instead of taking ownership oftestcase
, but I wonder whether there is another way that does not require me to changesum_of_squares
.Another alternative would be to time the individual calls to
sum_of_squares
and then add up each elapsed time, but I am curious if there is a way where I can move the allocation out of the loop and, through some magic, passtestcase
tosum_of_squares
.
What I am looking for
fn main() {
// start the timer
let start = Instant::now();
// This where I want to put the definition of testcase
let testcase: Vec<i32> = vec![4; 16];
// "benchmark" the function 'sum_of_squares' by executing it a couple of times.
for _ in 1..1000 {
// * magic * to pass testcase to sum_of_squares
}
}
Sidenotes
- I am looking at the assembly in radare2 to see whether the program allocates in the loop.
- This question came up when I was working on making my MD5 implementation faster. I challenged myself to a race against
openssl speed md5
. As my actual code is too big too post here I have come up with this example.