-1

Take this simple example where we're using an immutable list of vectors to calculate new values.

Given this working, single threaded example:

use std::collections::LinkedList;

fn calculate_vec(v: &Vec<i32>) -> i32 {
    let mut result: i32 = 0;
    for i in v {
        result += *i;
    }
    return result;
}

fn calculate_from_list(list: &LinkedList<Vec<i32>>) -> LinkedList<i32> {
    let mut result: LinkedList<i32> = LinkedList::new();
    for v in list {
        result.push_back(calculate_vec(v));
    }
    return result;
}

fn main() {
    let mut list: LinkedList<Vec<i32>> = LinkedList::new();
    // some arbitrary values
    list.push_back(vec![0, -2, 3]);
    list.push_back(vec![3, -4, 3]);
    list.push_back(vec![7, -10, 6]);

    let result = calculate_from_list(&list);
    println!("Here's the result!");
    for i in result {
        println!("{}", i);
    }
}

Assuming calculate_vec is a processor intensive function, we may want to use multiple threads to run this, the following example works but requires (what I think is) an unnecessary vector clone.

use std::collections::LinkedList;

fn calculate_vec(v: &Vec<i32>) -> i32 {
    let mut result: i32 = 0;
    for i in v {
        result += *i;
    }
    return result;
}

fn calculate_from_list(list: &LinkedList<Vec<i32>>) -> LinkedList<i32> {
    use std::thread;
    let mut result: LinkedList<i32> = LinkedList::new();
    let mut join_handles = LinkedList::new();
    for v in list {
        let v_clone = v.clone(); // <-- how to avoid this clone?
        join_handles.push_back(thread::spawn(move || calculate_vec(&v_clone)));
    }
    for j in join_handles {
        result.push_back(j.join().unwrap());
    }
    return result;
}

fn main() {
    let mut list: LinkedList<Vec<i32>> = LinkedList::new();
    // some arbitrary values
    list.push_back(vec![0, -2, 3]);
    list.push_back(vec![3, -4, 3]);
    list.push_back(vec![7, -10, 6]);

    let result = calculate_from_list(&list);
    println!("Here's the result!");
    for i in result {
        println!("{}", i);
    }
}

This example works, but it only when the vector is cloned, however logically, I don't think this should be needed since the vector is immutable.

There is no reason each call to calculate_vec should need to allocate a new vector.

How could this simple example be multi-threaded without needing to clone the data before its passed to the closure?


Update, heres a working example that uses Arc based on @ker's suggestion, although it does need to take ownership.


Note 1) I'm aware there are 3rd party libraries to handle threading, but would be interested to know if this is possible using Rust's standard library.

Note 2) There are quite a few similar questions on threading but examples often involves threads writing to data, which isn't the case here.

ideasman42
  • 42,413
  • 44
  • 197
  • 320
  • Not sure this is a duplicate, its definitely related, but this questions isn't concerned with lifetimes specifically - just in how to avoid duplicating read-only data. – ideasman42 Sep 13 '16 at 06:51
  • Your non-cloning code would have yielded exactly that error, so I think it's a duplicate, because the answer to the other question is basically the same as the one to this question. It's ok to have different questions for the same answers, but then the question is a duplicate in my opinion. I might be misinterpreting the stackoverflow rules though. – oli_obk Sep 13 '16 at 07:48
  • The intent may be important here, with this question the data is read-only and cloning the data may incur an undesirable cost. One of the points in the answer for example - suggests to copy the data. I'm not asking how to avoid the error, I'm asking how to avoid the `clone`. – ideasman42 Sep 13 '16 at 08:03
  • I'm probably thinking in circles... But avoiding a `clone` is done in Rust by using a reference, an `Rc` or an `Arc`. Which would then have the answer of the linked question. – oli_obk Sep 13 '16 at 08:46

1 Answers1

1

There are multiple ways to solve your problem.

  1. move the Vector into an Arc<LinkedList<Vec<i32>>> and clone that. After the calculation, you can use try_unwrap to get your LinkedList<Vec<i32>> back. This works with just the Rust standard library.

    Heres a working example that uses Arc, though LinkedList was replaced by Vec to allow indexing.
    Also note that the function needs to own the argument being passed to it in this case.

  2. Use the crossbeam crate to create threads that can reference their scope, freeing you from the need to do all that join_handles code by hand. This will have a minimal impact on your code, since it work exactly like you want.

    crossbeam::scope(|scope| {
        for v in list {
            scope.spawn(|| calculate_vec(&v))
        }
    });
    
  3. Use the scoped_threadpool crate. It works just like crossbeam but doesn't create one thread per task, instead it spreads out the tasks over a limited number of threads. (thanks @delnan)

  4. use the rayon crate for direct data parallelism

    use rayon::prelude::*;
    list.par_iter().map(|v| calculate_vec(&v)).collect()
    
oli_obk
  • 28,729
  • 6
  • 82
  • 98
  • 1
    Good answer! For those learning Rust, it might be helpful to provide a little info on the reason the `clone` OP is trying to get rid of is necessary in the first place. In other words, what rule of Rust requires the `clone` or the solutions you present? – Jimmy Sep 13 '16 at 06:49
  • While a great idea in general, I think that's out of scope for an answer to this exact question. If the clone is left out, then there are already multiple questions and answers related to references and threads. But it would certainly be good to add that reasoning to the question. – oli_obk Sep 13 '16 at 07:50
  • 1
    RE: #2 - `crossbeam` will launch an honest to god thread for every `v`, for large data sets it's usually better to use a thread pool such as the `scoped_threadpool` crate. –  Sep 13 '16 at 08:01
  • Added working example using `Arc`, hopefully this is in keeping with the answer. – ideasman42 Sep 13 '16 at 08:33