I have a &[&str] and need to count the frequencies of its characters using worker_count threads.

This example is for educational purposes and has many alternative solutions. I found some of them limiting for improving my understanding of Rust:

  1. Solution using channels
  2. Solution using thread::scope, which requires that any threads not joined earlier are forcibly joined before the scope exits.
  3. Solution using rayon or crossbeam libs, like in this post

My first iteration

use std::{collections::HashMap, sync::{Mutex, Arc}, thread};

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let input_len = input.len();
    let acc = Arc::new(Mutex::new(HashMap::<char, usize>::new()));
    let counter: Arc<Mutex<usize>> = Arc::new(Mutex::new(input_len));
    let mut handles: Vec<thread::JoinHandle<()>> = vec![];
    let input_vec = input.to_vec();
    let shared_strings =  Arc::new(input_vec);
    for _ in 0..worker_count {
        let counter = Arc::clone(&counter);
        let acc: Arc<Mutex<HashMap<char, usize>>> = Arc::clone(&acc);
        let shared_strings = Arc::clone(&shared_strings);
        let handle = thread::spawn(move || {
            let target_idx: usize;
            {
                let mut i = counter.lock().unwrap();
                if *i == 0 {
                    return;
                }
                *i -= 1;
                target_idx = *i;
            }
            let d = HashMap::<char, usize>::new();
            {
                let input_string = shared_strings[target_idx];
                // populate d based on input string
                // merge d and acc hashmaps together
            }
            
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }
    *acc.lock().unwrap()
}

As of now, with the code above I cannot pass input: &[&str] into every thread because Rust gives me a message I find unclear: input escapes the function body here. (Why is input still relevant if I made input_vec: Vec<&str>, which can outlive it?)

borrowed data escapes outside of function
`input` escapes the function body here

borrowed data escapes outside of function
argument requires that `'1` must outlive `'static`

borrowed data escapes outside of function
argument requires that `'2` must outlive `'static`

EDIT 1

My first solution: pass the &[&str] by converting it to an owned Vec<String>

pub fn frequency(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    let input_len = input.len();
    let acc = Arc::new(Mutex::new(HashMap::<char, usize>::new()));
    let counter: Arc<Mutex<usize>> = Arc::new(Mutex::new(input_len));
    let mut handles: Vec<thread::JoinHandle<()>> = vec![];
    let vec_input: Vec<String> = input.iter().map(|s| s.to_string()).collect();
    let shared_strings =  Arc::new(vec_input);
    for _ in 0..worker_count {
        let counter = Arc::clone(&counter);
        let acc: Arc<Mutex<HashMap<char, usize>>> = Arc::clone(&acc);
        let shared_strings = Arc::clone(&shared_strings);
   
        let handle = thread::spawn(move || {
            loop {
                let target_idx: usize;
                {
                    let mut i = counter.lock().unwrap();
                    if *i == 0 {
                        return;
                    }
                    *i -= 1;
                    target_idx = *i;
                }

                let input_string = shared_strings.get(target_idx).unwrap();
                
                // populate d based on input string
                let mut d = HashMap::<char, usize>::new();
                for c in input_string.chars() {
                    if !c.is_alphabetic() {
                        continue;
                    }
                    let key = c.to_ascii_lowercase();
                    let fq: &mut usize = d.entry(key).or_insert(0);
                    *fq += 1;
                }

                // merge d and acc hashmaps together
                let mut acc = acc.lock().unwrap();
                for (c, fq) in &d {
                    let fq_acc: &mut usize = acc.entry(*c).or_insert(0);
                    *fq_acc += fq;
                }
            }
        });
        handles.push(handle);
    }
    for handle in handles {
        handle.join().unwrap();
    }

    let result = acc.lock().unwrap().clone();
    result
}

This solution raises a question: can I lock acc: HashMap<char, usize> only partially, for the specific keys I am currently editing, to improve performance?
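
To make the partial-locking idea concrete: std's HashMap has no per-key locks, so one possible approximation is to split the accumulator into several independently locked maps ("shards") and choose the shard from the character. The sketch below is only an assumption about how that could look. The names frequency_sharded and SHARDS, and the shard count of 8, are hypothetical, and whether it is actually faster than a single Mutex would need measuring.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical shard count; any small positive number works.
const SHARDS: usize = 8;

pub fn frequency_sharded(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    // One independently locked map per shard instead of a single global lock.
    let shards: Arc<Vec<Mutex<HashMap<char, usize>>>> =
        Arc::new((0..SHARDS).map(|_| Mutex::new(HashMap::new())).collect());
    // Owned copies of the strings, so the spawned threads borrow nothing from `input`.
    let owned: Arc<Vec<String>> = Arc::new(input.iter().map(|s| s.to_string()).collect());
    // Shared countdown of strings that still need processing.
    let counter = Arc::new(Mutex::new(owned.len()));

    let mut handles = Vec::new();
    for _ in 0..worker_count {
        let shards = Arc::clone(&shards);
        let owned = Arc::clone(&owned);
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || loop {
            // Claim the next index; the counter lock is released at the end of this block.
            let target_idx;
            {
                let mut i = counter.lock().unwrap();
                if *i == 0 {
                    return;
                }
                *i -= 1;
                target_idx = *i;
            }

            // Count into a thread-local map first, without holding any shard lock.
            let mut local = HashMap::<char, usize>::new();
            for c in owned[target_idx].chars().filter(|c| c.is_alphabetic()) {
                *local.entry(c.to_ascii_lowercase()).or_insert(0) += 1;
            }

            // Merge: lock only the shard that owns each key.
            for (c, n) in local {
                let mut shard = shards[c as u32 as usize % SHARDS].lock().unwrap();
                *shard.entry(c).or_insert(0) += n;
            }
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }

    // Every key lives in exactly one shard, so the shards can simply be unioned.
    let mut result = HashMap::new();
    for shard in shards.iter() {
        for (c, n) in shard.lock().unwrap().iter() {
            result.insert(*c, *n);
        }
    }
    result
}
```

Calling frequency_sharded(&["aAb", "b!c"], 3) should count a and b twice each and c once. Two threads block each other here only when their keys fall into the same shard, which is about as close to "locking specific keys" as plain Mutex and HashMap get.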

Ievgen
    `input` is still relevant because the `&str` in `input_vec` still refer to the original `input`. You can fix the issue using [scoped threads](https://doc.rust-lang.org/stable/std/thread/fn.scope.html) (at which point you no longer need `input_vec` or the `Arc`s since you can pass the references directly to the threads). – Jmb Aug 01 '23 at 06:38
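
For reference, the scoped-thread approach mentioned in the comment could look roughly like the sketch below. It is a minimal sketch, not a definitive implementation: the name frequency_scoped is hypothetical, and the work-claiming logic is simply carried over from the question. Because std::thread::scope joins every spawned thread before it returns, the closures may borrow input, counter and acc directly, so no Arc and no owned copies of the strings are needed.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

pub fn frequency_scoped(input: &[&str], worker_count: usize) -> HashMap<char, usize> {
    // Plain Mutexes on the stack; the scoped threads borrow them.
    let acc = Mutex::new(HashMap::<char, usize>::new());
    let counter = Mutex::new(input.len());

    thread::scope(|s| {
        for _ in 0..worker_count {
            s.spawn(|| loop {
                // Claim the next index; the lock is released at the end of this block.
                let target_idx;
                {
                    let mut i = counter.lock().unwrap();
                    if *i == 0 {
                        return;
                    }
                    *i -= 1;
                    target_idx = *i;
                }

                // Populate a local map from the borrowed input string.
                let mut d = HashMap::<char, usize>::new();
                for c in input[target_idx].chars().filter(|c| c.is_alphabetic()) {
                    *d.entry(c.to_ascii_lowercase()).or_insert(0) += 1;
                }

                // Merge the local map into the shared accumulator.
                let mut total = acc.lock().unwrap();
                for (c, fq) in d {
                    *total.entry(c).or_insert(0) += fq;
                }
            });
        }
    }); // all spawned threads have been joined at this point

    // No thread holds a borrow any more, so the map can be moved out without cloning.
    acc.into_inner().unwrap()
}
```

The signature stays the same as in the question, so frequency_scoped(&["aAb", "b!c"], 3) can be called exactly like the original frequency function.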

0 Answers