0

I have a struct in Rust representing a container for a heavy data array. I need a method to perform some calculations over this data:

pub struct MultithreadTest {
    data: Vec<usize>,
}

impl MultithreadTest {
    const WORKERS_COUNT: usize = 2;

    pub fn new(data: Vec<usize>) -> Self {
        Self { data }
    }

    pub fn calculate(&self) {
        let data = std::sync::Arc::new(&self.data);

        let (tx, rx) = std::sync::mpsc::channel();

        (0..Self::WORKERS_COUNT).for_each(|thread_index| {
            let data = data.clone();
            let tx = tx.clone();

            std::thread::spawn(move || {
                // Do some stuff with data
                let sum: usize = data.iter().skip(thread_index).sum();

                tx.send(sum).unwrap();
            });
        });

        let result: usize = rx.iter().take(Self::WORKERS_COUNT).sum();

        println!("{:}", result);
    }
}

If I try to provide self by reference, Rust stops me:

error[E0495]: cannot infer an appropriate lifetime for borrow expression due to conflicting requirements
  --> src/lib.rs:13:40
   |
13 |         let data = std::sync::Arc::new(&self.data);
   |                                        ^^^^^^^^^^
   |
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 12:5...
  --> src/lib.rs:12:5
   |
12 | /     pub fn calculate(&self) {
13 | |         let data = std::sync::Arc::new(&self.data);
14 | |
15 | |         let (tx, rx) = std::sync::mpsc::channel();
...  |
31 | |         println!("{:}", result);
32 | |     }
   | |_____^
note: ...so that reference does not outlive borrowed content
  --> src/lib.rs:13:40
   |
13 |         let data = std::sync::Arc::new(&self.data);
   |                                        ^^^^^^^^^^
   = note: but, the lifetime must be valid for the static lifetime...
note: ...so that the type `[closure@src/lib.rs:21:32: 26:14 data:std::sync::Arc<&std::vec::Vec<usize>>, thread_index:usize, tx:std::sync::mpsc::Sender<usize>]` will meet its required lifetime bounds
  --> src/lib.rs:21:13
   |
21 |             std::thread::spawn(move || {
   |             ^^^^^^^^^^^^^^^^^^```

I totally understand where it's coming from, but I can't understand how to solve it.

I can't clone `data` because it might be huge and I don't want to consume `self` by value.
Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Dmitry
  • 1,426
  • 5
  • 11
  • It's not clear what you're trying to achieve, are you just trying to perform a computation in a multithreaded manner, or is this something more complicated? – Masklinn Aug 25 '20 at 12:22
  • 1
    The fundamental problem is that the stdlib threads are not *scoped* (and you're not joining them anyway), so the compiler can not know that `self` is valid for the thread's lifetime, as far as it's concerned the thread has an unbounded lifetime and thus can either use `'static` data or *own* the data it needs, borrows are not acceptable, even if they're thread-safe. – Masklinn Aug 25 '20 at 12:25
  • 2
    Now one solution is to use crossbeam's scoped threads (and joining the threads properly), this way you can borrow for that duration. However for the example here you could just use Rayon's parallel iterator, performing this sort of operations / accumulations is its bread and butter. – Masklinn Aug 25 '20 at 12:27
  • 1
    Incidentally, note that using channels is complete overkill here: Rust threads can *return values*, the "thread owner" retrieves the value (or the error if the thread failed) from the `join` method on the handle. Channels are for cases where you have multiple consumers or threads need to produce multiple values, which is not the case here. – Masklinn Aug 25 '20 at 12:28
  • I just want some heavy computations over the different parts of `data` for many threads. The example above is very simplistic. Each thread should take some part of data and make computations (some times parts will overlap, so I cant just split `data` on many `Vec`s parts). In the real task the `data` isn't even a simple `Vec`. My problem is that I don't know how to tell to Rust that I am really sure that `self` and data inside it will outlive these computations. – Dmitry Aug 25 '20 at 12:37
  • with the std threads you can't, they can't express "this thread terminates within the execution of this method" at all. Crossbeam's scoped threads can though. I think you could also use rayon's `par_chunks`, which however large you want each chunk to be (each thread would then process an entire chunk at once, you can have more chunks than thread of course rayon will just feed new work to however many threads it decides to spawn behind the scene). – Masklinn Aug 25 '20 at 13:08

1 Answers1

1

Thanks to Masklinn's replies, I found a working example using the crossbeam crate:

pub struct MultithreadTest {
    data: Vec<usize>,
}

impl MultithreadTest {
    const WORKERS_COUNT: usize = 2;

    pub fn new(data: Vec<usize>) -> Self {
        Self { data }
    }

    pub fn calculate(&self) {
        crossbeam::scope(|scope| {
            let workers: Vec<_> = (0..Self::WORKERS_COUNT)
                .map(|thread_index| {
                    scope.spawn(move || {
                        // Do some stuff with data
                        let sum: usize = self.data.iter().skip(thread_index).sum();

                        sum
                    })
                })
                .collect();

            let result: usize = workers.into_iter().map(|w| w.join()).sum();
            println!("{:}", result);
        });
    }
}
Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Dmitry
  • 1,426
  • 5
  • 11
  • You can also use the standard library's read/write lock and Arc. For example https://gist.github.com/prithvisingh18/993b185b648ac5faeed24d141ed642a3 – Prithvi Singh Jun 19 '23 at 09:04