0

I want to avoid an allocation inside a loop:

use std::time::Instant;

/// calculate the sum of the squares of the elements in `numbers`
fn sum_of_squares(numbers: Vec<i32>) -> i32 {
    numbers.iter().map(|x| x * x).sum()
}

fn main() {
    // start the timer
    let start = Instant::now();

    // "benchmark" the function 'sum_of_squares' by executing it a couple of times.
    for _ in 1..1000 {
        // Here I want to avoid the alloction.
        // This causes an ALLOCATION, wasting time the code could have spent calculating sum_of_squares D;
        let testcase: Vec<i32> = vec![4; 16];

        // here the actual work, I am interested in measuring, happens.
        sum_of_squares(testcase);
    }

    let elapsed_time = start.elapsed();
    println!("1000 sum_of_squaress in: {:?}", elapsed_time);
}

I want to spend the maximum of the time in the for loop calculating the sum of squares, so I would like to put the line let testcase: Vec<i32> = vec![4; 16] outside of the loop such that the only thing happening in the loop is adding together squares.

The problem is that the variable testcase is moved into the function sum_of_squares. When I put let testcase = ... outside of the loop, the borrow checker complains because after the first iteration of the for loop, the variable testcases is no longer around.

Even if the time to allocate testcase is negligible, I would really like to know if I can avoid to allocate in each iteration of the loop.

What I am not looking for

  • I could rewrite sum_of_squares to take a reference as an argument instead of taking ownership of testcase, but I wonder whether there is another way that does not require me to change sum_of_squares.

  • Another alternative would be to time the individual calls to sum_of_squares and then add up each elapsed time, but I am curious if there is a way where I can move the allocation out of the loop and, through some magic, pass testcase to sum_of_squares.

What I am looking for

fn main() {
    // start the timer
    let start = Instant::now();

    // This where I want to put the definition of testcase
    let testcase: Vec<i32> = vec![4; 16];

    // "benchmark" the function 'sum_of_squares' by executing it a couple of times.
    for _ in 1..1000 {

        // * magic * to pass testcase to sum_of_squares
    }
}

Sidenotes

  • I am looking at the assembly in radare2 to see whether the program allocates in the loop.
  • This question came up when I was working on making my MD5 implementation faster. I challenged myself to a race against openssl speed md5. As my actual code is too big too post here I have come up with this example.
Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
  • Can you or cant change `sum_of_squares` signature? – Netwave Jul 12 '21 at 18:36
  • *I could rewrite sum_of_squares to take a reference as an argument instead of taking ownership of testcase, but I wonder whether there is another way that does not require me to change sum_of_squares.* - Sorry, but no - with the hard constraints you've set out, it cannot be done. If `sum_of_squares` is dead-set on destroying the allocation it receives, how would the caller hope to reuse it? Note that `sum_of_squares` taking `&[i32]` would make it no less performant because it would receive a reference/length pair, **not** a reference to the `Vec` (that would be `&Vec`). – user4815162342 Jul 12 '21 at 18:38
  • I can but I am just curious wether there is an alternative way! That is why I am asking the question. – Louis Kronberg Jul 12 '21 at 18:39

2 Answers2

0

A simple way would be to pass and retake the owned testcase, but you would need to return it from the sum_of_squares function:

use std::time::Instant;

/// calculate the sum of the squares of the elements in `numbers` 
fn sum_of_squares(numbers: Vec<i32>) -> (i32, Vec<i32>) {
    (numbers.iter().map(|x| x*x).sum(), numbers)
}

fn main() {

    // start the timer 
    let start = Instant::now();

    let mut testcase: Vec<i32> = vec![4; 16];
    
    // "benchmark" the function 'sum_of_squares' by executing it a couple of times.
    for _ in 1..1000 {

        // here the actual work, I am interested in measuring, happens.
        let (_, nums) = sum_of_squares(testcase);
        testcase = nums;

    }

    let elapsed_time = start.elapsed();
    println!("1000 sum_of_squaress in: {:?}", elapsed_time);
}

Playground

Netwave
  • 40,134
  • 6
  • 50
  • 93
0

define the numbers: Vec<i32> as Singleton inside the function
there are two possibilities to do this: std::sync::Once and the lazy_static macro
I prefer the later, because the Singleton is capsulated (means not outside the function visible)
(a change from the Vec to eg. a RangeInclusive solution would bring about a further speed advantage)

extern crate lazy_static;

fn sum_of_squares() -> i32 {
    lazy_static! {
        static ref NUMBERS: Vec<i32> = vec![4; 16];
    }
    NUMBERS.iter().map(|x| x * x).sum()
}

(Your timing method is only suitable for an approximate assessment of the performance and does not replace a benchmark)

Kaplan
  • 2,572
  • 13
  • 14