I'm trying to get started with Rust threads. In my example (contrived but based on a real problem), I want to accept a read-only HashMap
as an argument to a function and then supply it to a number of threads, each of which reads from its own partition of it.
use std::{
    collections::HashMap,
    sync::{mpsc::channel, Arc},
    thread,
};

const THREADS: u32 = 10;

// Concurrently add the lengths of values.
pub fn concurrent_lens(inputs: &HashMap<u32, String>) -> usize {
    let inputs_arc = Arc::new(inputs);
    let (tx, rx) = channel();

    // Count length of all strings in parallel.
    // Each thread takes a partition of the data.
    for thread_i in 0..THREADS {
        let tx = tx.clone();
        let inputs_clone = inputs_arc.clone();
        thread::spawn(move || {
            for (i, content) in inputs_clone.iter() {
                // Only look at my partition's keys.
                if (i % THREADS) == thread_i {
                    // Something expensive with the string.
                    let expensive_operation_result = content.len();
                    tx.send(expensive_operation_result).unwrap();
                }
            }
        });
    }

    // Join and sum results.
    let mut result = 0;
    for len in rx.iter() {
        result += len;
    }
    result
}
However, the compiler says:
error[E0621]: explicit lifetime required in the type of `inputs`
  --> src/main.rs:21:9
   |
10 | pub fn concurrent_lens(inputs: &HashMap<u32, String>) -> usize {
   |                                ------ consider changing the type of `inputs` to `&'static std::collections::HashMap<u32, std::string::String>`
...
21 |         thread::spawn(move || {
   |         ^^^^^^^^^^^^^ lifetime `'static` required
My options are, as I understand:
- Make `inputs` static. This isn't possible, as it's not static data.
- Let the function take ownership of `inputs` (not take a ref). So my function would be `pub fn concurrent_lens(inputs: HashMap<u32, String>) -> usize`. This makes the compiler happy about its lifetime, but the data lives outside the function, and has a longer lifetime outside.
- Ditto, but pass in a copy. Not ideal, it's a lot of data.
- Let the function take an Arc as an argument, i.e. `pub fn concurrent_lens(inputs: Arc<HashMap<u32, String>>) -> usize`. This works fine, but seems like a really leaky abstraction, as the calling code shouldn't have to know that it's calling a function that uses concurrency.
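To make the last option concrete, here is a minimal sketch of roughly what it might look like, including the calling side. The names `concurrent_lens_arc` and `example_caller` are hypothetical and only for illustration, the sample data is made up, and the `drop(tx)` is something I added in this sketch so that the receiving loop can end once all workers have finished.

```rust
use std::{
    collections::HashMap,
    sync::{mpsc::channel, Arc},
    thread,
};

const THREADS: u32 = 10;

// Hypothetical variant of the function that takes an Arc directly.
pub fn concurrent_lens_arc(inputs: Arc<HashMap<u32, String>>) -> usize {
    let (tx, rx) = channel();

    for thread_i in 0..THREADS {
        let tx = tx.clone();
        // Each thread gets its own handle to the shared, owned map.
        let inputs = Arc::clone(&inputs);
        thread::spawn(move || {
            for (i, content) in inputs.iter() {
                // Only look at my partition's keys.
                if (i % THREADS) == thread_i {
                    tx.send(content.len()).unwrap();
                }
            }
        });
    }

    // Drop the original sender so that rx.iter() stops once every
    // worker's clone has been dropped.
    drop(tx);
    rx.iter().sum()
}

// Hypothetical caller: it has to wrap its data in an Arc itself,
// which is the leak in the abstraction I'm unhappy about.
pub fn example_caller() -> usize {
    let mut data = HashMap::new();
    data.insert(1, "alpha".to_string());
    data.insert(2, "be".to_string());
    data.insert(13, "gam".to_string());

    let shared = Arc::new(data);
    let total = concurrent_lens_arc(Arc::clone(&shared));

    // The caller can still read the data afterwards via its own handle.
    assert_eq!(shared.len(), 3);
    total
}
```

The caller keeps its own `Arc` handle, so the data is not copied and stays usable after the call, but the caller's data now has to live in an `Arc` purely because of how this function is implemented.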
None of these seems quite right. Am I missing something?