I wrote a simple multithreaded application in Rust to add the numbers from 1 to x. (I know there is a formula for this, but the point was to write some multi threaded code in Rust, not to get the outcome.) It worked fine, but after I refactored it to a more functional style instead of imperative, there was no more speedup from multithreading. When inspecting the CPU usage, it appears that only one core is used of my 4 core / 8 thread CPU. The original code has 790% CPU usage, while the refactored version only has 99%.
The original code:
use std::thread;
fn main() {
let mut handles: Vec<thread::JoinHandle<u64>> = Vec::with_capacity(8);
const thread_count: u64 = 8;
const batch_size: u64 = 20000000;
for thread_id in 0..thread_count {
handles.push(thread::spawn(move || {
let mut sum = 0_u64;
for i in thread_id * batch_size + 1_u64..(thread_id + 1) * batch_size + 1_u64 {
sum += i;
}
sum
}));
}
let mut total_sum = 0_u64;
for handle in handles.into_iter() {
total_sum += handle.join().unwrap();
}
println!("{}", total_sum);
}
The refactored code:
use std::thread;
fn main() {
const THREAD_COUNT: u64 = 8;
const BATCH_SIZE: u64 = 20000000;
// spawn threads that calculate a part of the sum
let handles = (0..THREAD_COUNT).map(|thread_id| {
thread::spawn(move ||
// calculate the sum of all numbers from assigned to this thread
(thread_id * BATCH_SIZE + 1 .. (thread_id + 1) * BATCH_SIZE + 1)
.fold(0_u64,|sum, number| sum + number))
});
// add the parts of the sum together to get the total sum
let sum = handles.fold(0_u64, |sum, handle| sum + handle.join().unwrap());
println!("{}", sum);
}
the outputs of the programs are the same (12800000080000000), but the refactored version is 5-6 times slower.
It appears that iterators are lazily evaluated. How can I force the entire iterator to be evaluated? I tried to collect it into an array of type [thread::JoinHandle<u64>; THREAD_COUNT as usize]
, but I then I get the following error:
--> src/main.rs:14:7
|
14 | ).collect::<[thread::JoinHandle<u64>; THREAD_COUNT as usize]>();
| ^^^^^^^ a collection of type `[std::thread::JoinHandle<u64>; 8]` cannot be built from `std::iter::Iterator<Item=std::thread::JoinHandle<u64>>`
|
= help: the trait `std::iter::FromIterator<std::thread::JoinHandle<u64>>` is not implemented for `[std::thread::JoinHandle<u64>; 8]`
Collecting into a vector does work, but that seems like a weird solution, because the size is known up front. Is there a better way then using a vector?