I have a `&[&str]` and need to calculate character frequencies using `worker_count` threads.
This is an educational exercise, and it has many alternative solutions. I found some of them limiting for improving my understanding of Rust:
- Solution using channels
- Solution using `thread::scope`: the scope forces every spawned thread to be joined (automatically, if I have not joined it myself) before the scope exits.
- Solution using the `rayon` or `crossbeam` libs, like in this post
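For comparison, a minimal sketch of the `thread::scope` approach from the list above, assuming Rust 1.63 or later (where `std::thread::scope` is stable). Because the scope joins every thread before it returns, the workers can borrow `input` directly, with no `Arc` and no conversion to owned strings. The helper `count_chunk` and the chunking strategy are illustrative choices of mine, not part of the original exercise.

```rust
use std::collections::HashMap;
use std::thread;

// Counts alphabetic characters in one chunk of strings, lowercasing with
// `to_ascii_lowercase` as the question's own counting loop does.
fn count_chunk(chunk: &[&str]) -> HashMap<char, usize> {
    let mut counts = HashMap::new();
    for s in chunk {
        for c in s.chars().filter(|c| c.is_alphabetic()) {
            *counts.entry(c.to_ascii_lowercase()).or_insert(0) += 1;
        }
    }
    counts
}

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let mut acc = HashMap::new();
    if input.is_empty() {
        return acc;
    }
    // Split the input into at most `worker_count` contiguous chunks;
    // `input` is non-empty here, so `chunk_size` is at least 1.
    let workers = worker_count.max(1);
    let chunk_size = (input.len() + workers - 1) / workers;
    thread::scope(|s| {
        // Scoped threads may borrow `input`: `thread::scope` joins every
        // thread spawned on `s` before returning, so no `'static` bound applies.
        let handles: Vec<_> = input
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || count_chunk(chunk)))
            .collect();
        // Merge each thread's local map into the result; no Mutex needed.
        for handle in handles {
            for (c, n) in handle.join().unwrap() {
                *acc.entry(c).or_insert(0) += n;
            }
        }
    });
    acc
}
```

Each thread returns its own `HashMap` through its join handle, so the merge happens on the calling thread and nothing is shared mutably.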
My first iteration:

```rust
use std::{collections::HashMap, sync::{Mutex, Arc}, thread};
pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let input_len = input.len();
    let acc = Arc::new(Mutex::new(HashMap::<char, usize>::new()));
    let counter: Arc<Mutex<usize>> = Arc::new(Mutex::new(input_len));
    let mut handles: Vec<thread::JoinHandle<()>> = vec![];
    let input_vec = input.to_vec();
    let shared_strings = Arc::new(input_vec);
    for _ in 0..worker_count {
        let counter = Arc::clone(&counter);
        let acc: Arc<Mutex<HashMap<char, usize>>> = Arc::clone(&acc);
        let shared_strings = Arc::clone(&shared_strings);
        let handle = thread::spawn(move || {
            let target_idx: usize;
            {
                let mut i = counter.lock().unwrap();
                if *i == 0 {
                    return;
                }
                *i -= 1;
                target_idx = *i;
            }
            let d = HashMap::<char, usize>::new();
            {
                let input_string = shared_strings[target_idx];
                // populate d based on input string
                // merge d and acc hashmaps together
            }
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    *acc.lock().unwrap()
}
```
With the code above I cannot pass `input: &[&str]` into the threads, because Rust gives me an error message I don't understand: it says that `input` escapes the function body here. (Why is `input` still relevant if I made an `input_vec: Vec<&str>` that can outlive it?) The compiler output is:
```text
borrowed data escapes outside of function
`input` escapes the function body here
borrowed data escapes outside of function
argument requires that `'1` must outlive `'static`
borrowed data escapes outside of function
argument requires that `'2` must outlive `'static`
```
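A minimal sketch of why `input` still matters: `input.to_vec()` copies the slice of *references*, not the string data, so `input_vec` is a `Vec<&str>` whose elements still borrow the caller's strings, while `thread::spawn` requires its closure (and everything it captures) to be `'static`. The two functions below are hypothetical names chosen only to contrast the borrowed and owned cases.

```rust
use std::sync::Arc;
use std::thread;

// `to_vec()` on a `&[&'a str]` copies the references, not the string data,
// so every element of the new `Vec` is still a `&'a str` borrowed from the
// caller. Unless `'a` is `'static`, moving this `Vec` into `thread::spawn`
// fails to compile, just like the first iteration.
fn still_borrowed<'a>(input: &[&'a str]) -> Vec<&'a str> {
    input.to_vec()
}

// Converting each `&str` into an owned `String` removes the borrow:
// `Arc<Vec<String>>` is `Send + 'static`, which is what `thread::spawn` needs.
fn total_len_in_thread(input: &[&str]) -> usize {
    let shared: Arc<Vec<String>> = Arc::new(input.iter().map(|s| s.to_string()).collect());
    let handle = thread::spawn(move || shared.iter().map(|s| s.len()).sum::<usize>());
    handle.join().unwrap()
}
```

The lifetime `'a` in the return type of `still_borrowed` is exactly what the compiler reports as `'2` in the error above: the element lifetime of the input slice.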
EDIT 1
My first solution: pass the `&[&str]` by converting it to an owned `Vec<String>`:
```rust
use std::{collections::HashMap, sync::{Arc, Mutex}, thread};

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let input_len = input.len();
    let acc = Arc::new(Mutex::new(HashMap::<char, usize>::new()));
    // Number of strings not yet claimed by any worker.
    let counter: Arc<Mutex<usize>> = Arc::new(Mutex::new(input_len));
    let mut handles: Vec<thread::JoinHandle<()>> = vec![];
    // Owned copies of the strings, so the spawned closures are `'static`.
    let vec_input: Vec<String> = input.iter().map(|s| s.to_string()).collect();
    let shared_strings = Arc::new(vec_input);
    for _ in 0..worker_count {
        let counter = Arc::clone(&counter);
        let acc: Arc<Mutex<HashMap<char, usize>>> = Arc::clone(&acc);
        let shared_strings = Arc::clone(&shared_strings);
        let handle = thread::spawn(move || {
            loop {
                // Claim the next unprocessed index; stop when none are left.
                let target_idx: usize;
                {
                    let mut i = counter.lock().unwrap();
                    if *i == 0 {
                        return;
                    }
                    *i -= 1;
                    target_idx = *i;
                }
                let input_string = shared_strings.get(target_idx).unwrap();
                // populate d based on input string
                let mut d = HashMap::<char, usize>::new();
                for c in input_string.chars() {
                    if !c.is_alphabetic() {
                        continue;
                    }
                    let key = c.to_ascii_lowercase();
                    let fq: &mut usize = d.entry(key).or_insert(0);
                    *fq += 1;
                }
                // merge d and acc hashmaps together
                let mut acc = acc.lock().unwrap();
                for (c, fq) in &d {
                    let fq_acc: &mut usize = acc.entry(*c).or_insert(0);
                    *fq_acc += fq;
                }
            }
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    // Bind to a local first so the MutexGuard is dropped before `acc`.
    let result = acc.lock().unwrap().clone();
    result
}
```
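A possible variant of the EDIT 1 code, sketched under the same assumptions (owned `Vec<String>`, ASCII lowercasing): the `Mutex<usize>` counter is replaced by an `AtomicUsize` work index, and each thread fills a local map and locks `acc` only once, after it runs out of work. It uses `let ... else`, so it assumes Rust 1.65 or later. This is an illustration of the idea, not a measured improvement.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let shared_strings: Arc<Vec<String>> =
        Arc::new(input.iter().map(|s| s.to_string()).collect());
    // Index of the next string to process; replaces the `Mutex<usize>` counter.
    let next = Arc::new(AtomicUsize::new(0));
    let acc = Arc::new(Mutex::new(HashMap::<char, usize>::new()));

    let handles: Vec<_> = (0..worker_count)
        .map(|_| {
            let (shared_strings, next, acc) =
                (Arc::clone(&shared_strings), Arc::clone(&next), Arc::clone(&acc));
            thread::spawn(move || {
                // Per-thread counts; the shared `acc` is locked only once below.
                let mut local = HashMap::<char, usize>::new();
                loop {
                    // `fetch_add` hands out each index exactly once, without a lock.
                    let idx = next.fetch_add(1, Ordering::Relaxed);
                    let Some(s) = shared_strings.get(idx) else { break };
                    for c in s.chars().filter(|c| c.is_alphabetic()) {
                        *local.entry(c.to_ascii_lowercase()).or_insert(0) += 1;
                    }
                }
                let mut acc = acc.lock().unwrap();
                for (c, n) in local {
                    *acc.entry(c).or_insert(0) += n;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    // Every worker has been joined, so this is the last `Arc` to the map.
    Arc::try_unwrap(acc).unwrap().into_inner().unwrap()
}
```

`Arc::try_unwrap` followed by `Mutex::into_inner` moves the map out instead of cloning it, which is possible only because all worker threads (and their `Arc` clones) are gone by then.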
This solution raises a question: can I lock only the keys of `acc: HashMap<char, usize>` that I am currently editing, instead of the whole map, to improve performance?
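One common way to approximate per-key locking is lock striping: split the map into several shards, each behind its own `Mutex`, and pick the shard from the key, so threads contend only when their keys land in the same shard. A minimal sketch, where `ShardedCounter`, `add` and `into_map` are hypothetical names and the shard is chosen by a simple modulo of the character's code point:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Lock striping: the accumulator is split into several shards, each behind
// its own `Mutex`. A key always maps to the same shard, so two threads
// contend only when their keys fall into the same shard.
pub struct ShardedCounter {
    shards: Vec<Mutex<HashMap<char, usize>>>,
}

impl ShardedCounter {
    pub fn new(shard_count: usize) -> Self {
        let shards: Vec<_> = (0..shard_count.max(1))
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        ShardedCounter { shards }
    }

    // Picks a shard from the key's code point.
    fn shard_for(&self, key: char) -> &Mutex<HashMap<char, usize>> {
        &self.shards[(key as usize) % self.shards.len()]
    }

    // Locks only the shard that owns `key`; the other shards stay available.
    pub fn add(&self, key: char, n: usize) {
        *self.shard_for(key).lock().unwrap().entry(key).or_insert(0) += n;
    }

    // Consumes the counter and merges the (disjoint) shards into one map.
    pub fn into_map(self) -> HashMap<char, usize> {
        self.shards
            .into_iter()
            .flat_map(|shard| shard.into_inner().unwrap())
            .collect()
    }
}
```

In the EDIT 1 code, `acc` would then be an `Arc<ShardedCounter>` and the merge loop would call `acc.add(*c, *fq)`. With only a few dozen distinct letters and one merge per input string, contention on a single `Mutex` may already be low, so whether striping pays off here would need measuring.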