3

I wrote a simple multithreaded application in Rust to add the numbers from 1 to x. (I know there is a formula for this, but the point was to write some multi threaded code in Rust, not to get the outcome.) It worked fine, but after I refactored it to a more functional style instead of imperative, there was no more speedup from multithreading. When inspecting the CPU usage, it appears that only one core is used of my 4 core / 8 thread CPU. The original code has 790% CPU usage, while the refactored version only has 99%.

The original code:

use std::thread;

fn main() {
    let mut handles: Vec<thread::JoinHandle<u64>> = Vec::with_capacity(8);

    const thread_count: u64 = 8;
    const batch_size: u64 = 20000000;

    for thread_id in 0..thread_count {
        handles.push(thread::spawn(move || {
            let mut sum = 0_u64;

            for i in thread_id * batch_size + 1_u64..(thread_id + 1) * batch_size + 1_u64 {
                sum += i;
            }

            sum
        }));
    }

    let mut total_sum = 0_u64;

    for handle in handles.into_iter() {
        total_sum += handle.join().unwrap();
    }
    println!("{}", total_sum);
}

The refactored code:

use std::thread;

fn main() {
    const THREAD_COUNT: u64 = 8;
    const BATCH_SIZE: u64 = 20000000;

    // spawn threads that calculate a part of the sum
    let handles = (0..THREAD_COUNT).map(|thread_id| {
        thread::spawn(move ||
            // calculate the sum of all numbers from assigned to this thread
            (thread_id * BATCH_SIZE + 1 .. (thread_id + 1) * BATCH_SIZE + 1)
                .fold(0_u64,|sum, number| sum + number))
    });

    // add the parts of the sum together to get the total sum
    let sum = handles.fold(0_u64, |sum, handle| sum + handle.join().unwrap());

    println!("{}", sum);
}

the outputs of the programs are the same (12800000080000000), but the refactored version is 5-6 times slower.

It appears that iterators are lazily evaluated. How can I force the entire iterator to be evaluated? I tried to collect it into an array of type [thread::JoinHandle<u64>; THREAD_COUNT as usize], but I then I get the following error:

  --> src/main.rs:14:7
   |
14 |     ).collect::<[thread::JoinHandle<u64>; THREAD_COUNT as usize]>();
   |       ^^^^^^^ a collection of type `[std::thread::JoinHandle<u64>; 8]` cannot be built from `std::iter::Iterator<Item=std::thread::JoinHandle<u64>>`
   |
   = help: the trait `std::iter::FromIterator<std::thread::JoinHandle<u64>>` is not implemented for `[std::thread::JoinHandle<u64>; 8]`

Collecting into a vector does work, but that seems like a weird solution, because the size is known up front. Is there a better way then using a vector?

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Jesse Maas
  • 51
  • 6
  • [How do I collect into an array?](https://stackoverflow.com/questions/26757355/how-do-i-collect-into-an-array) — TL;DR you cant easily. Use `ArrayVec` or just stick with a `Vec`, which will only do one allocation anyway since the length of the iterator is known. – Shepmaster Apr 03 '19 at 12:30

1 Answers1

7

Iterators in Rust are lazy, so your threads are not started until handles.fold tries to access the corresponding element of the iterator. Basically what happens is:

  1. handles.fold tries to access the first element of the iterator.
  2. The first thread is started.
  3. handles.fold calls its closure, which calls handle.join() for the first thread.
  4. handle.join waits for the first thread to finish.
  5. handles.fold tries to access the second element of the iterator.
  6. The second thread is started.
  7. and so on.

You should collect the handles into a vector before folding the result:

let handles: Vec<_> = (0..THREAD_COUNT)
    .map(|thread_id| {
        thread::spawn(move ||
            // calculate the sum of all numbers from assigned to this thread
            (thread_id * BATCH_SIZE + 1 .. (thread_id + 1) * BATCH_SIZE + 1)
                .fold(0_u64,|sum, number| sum + number))
    })
    .collect();

Or you could use a crate like Rayon which provides parallel iterators.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Jmb
  • 18,893
  • 2
  • 28
  • 55
  • Ah thanks. This is exactly what I thought. (I was writing the edit while you posted this.) I think I'll have a look at rayon. – Jesse Maas Apr 03 '19 at 09:19