
On page 465 of Programming Rust you can find this code and explanation:

```rust
use std::sync::Arc;

fn process_files_in_parallel(filenames: Vec<String>,
                             glossary: Arc<GigabyteMap>)
    -> io::Result<()>
{
    ...
    for worklist in worklists {
        // This call to .clone() only clones the Arc and bumps the
        // reference count. It does not clone the GigabyteMap.
        let glossary_for_child = glossary.clone();
        thread_handles.push(
            spawn(move || process_files(worklist, &glossary_for_child))
        );
    }
    ...
}
```

> We have changed the type of glossary: to run the analysis in parallel, the caller must pass in an Arc<GigabyteMap>, a smart pointer to a GigabyteMap that’s been moved into the heap, by doing Arc::new(giga_map). When we call glossary.clone(), we are making a copy of the Arc smart pointer, not the whole GigabyteMap. This amounts to incrementing a reference count. With this change, the program compiles and runs, because it no longer depends on reference lifetimes. As long as any thread owns an Arc<GigabyteMap>, it will keep the map alive, even if the parent thread bails out early. There won’t be any data races, because data in an Arc is immutable.
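A self-contained version of the `Arc` approach may make the book's point concrete. This is a minimal sketch, not the book's code: `GigabyteMap` is assumed to be a `HashMap<String, String>`, and `process_files` / `process_in_parallel` are hypothetical stand-ins (the worker just counts filenames that have a glossary entry). The detail that matters is that `std::thread::spawn` requires a `'static` closure, so each thread must own its handle to the map, and cloning the `Arc` provides that cheaply.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-in for the book's GigabyteMap.
type GigabyteMap = HashMap<String, String>;

// Hypothetical worker: counts how many filenames have a glossary entry.
fn process_files(worklist: Vec<String>, glossary: &GigabyteMap) -> usize {
    worklist.iter().filter(|name| glossary.contains_key(*name)).count()
}

// Hypothetical driver mirroring the book's loop over worklists.
fn process_in_parallel(worklists: Vec<Vec<String>>, glossary: Arc<GigabyteMap>) -> usize {
    let mut handles = Vec::new();
    for worklist in worklists {
        // Clones the Arc (bumps the reference count), not the map itself.
        let glossary_for_child = Arc::clone(&glossary);
        // thread::spawn needs a 'static closure; moving an owned Arc in satisfies that.
        handles.push(thread::spawn(move || process_files(worklist, &glossary_for_child)));
    }
    // Wait for every thread and add up their counts.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let mut map = GigabyteMap::new();
    map.insert("a.txt".to_string(), "alpha".to_string());
    map.insert("b.txt".to_string(), "beta".to_string());
    let glossary = Arc::new(map);
    let worklists = vec![
        vec!["a.txt".to_string(), "x.txt".to_string()],
        vec!["b.txt".to_string()],
    ];
    println!("{}", process_in_parallel(worklists, Arc::clone(&glossary))); // 2
}
```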

In the next section they show it rewritten with Rayon:

```rust
extern crate rayon;

use rayon::prelude::*;

fn process_files_in_parallel(filenames: Vec<String>, glossary: &GigabyteMap)
    -> io::Result<()>
{
    filenames.par_iter()
        .map(|filename| process_file(filename, glossary))
        .reduce_with(|r1, r2| {
            if r1.is_err() { r1 } else { r2 }
        })
        .unwrap_or(Ok(()))
}
```
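The `reduce_with` call combines the per-file results pairwise with a closure that keeps `r1` if it is an error and otherwise takes `r2`. The sketch below applies the same closure sequentially with the standard library's `Iterator::reduce` (no Rayon involved), using hypothetical `String` errors in place of `io::Error` so results are easy to compare. Run in order, it yields the first error, or `Ok(())` when there is nothing to process. The closure is also associative, which is what lets a parallel reduction group the pairs differently without changing the answer, as long as operand order is preserved.

```rust
// Same combining closure as the book's Rayon example, applied sequentially.
// `first_error` is a hypothetical helper; String errors stand in for io::Error.
fn first_error(results: Vec<Result<(), String>>) -> Result<(), String> {
    results
        .into_iter()
        // Keep the earlier result if it is an error, otherwise move on.
        .reduce(|r1, r2| if r1.is_err() { r1 } else { r2 })
        // An empty input gives None; treat "no files" as success.
        .unwrap_or(Ok(()))
}

fn main() {
    let results = vec![
        Ok(()),
        Err("b.txt".to_string()),
        Ok(()),
        Err("d.txt".to_string()),
    ];
    println!("{:?}", first_error(results)); // Err("b.txt")
    println!("{:?}", first_error(vec![])); // Ok(())
}
```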

Notice that the Rayon version accepts &GigabyteMap rather than Arc<GigabyteMap>. The authors don't explain how this works, though. Why doesn't Rayon require an Arc<GigabyteMap>? How does it get away with accepting a plain reference?

Asked by Evan Carroll (edited by trent)
  • It looks like your question might be answered by the answers of [How can I pass a reference to a stack variable to a thread?](https://stackoverflow.com/q/32750829/155423). If not, please edit your question to explain the differences. Otherwise, we can mark this question as already answered. – Shepmaster Dec 23 '19 at 20:53

1 Answer


Rayon can guarantee that the work done by the parallel iterator does not outlive the current stack frame, unlike what I assume is `std::thread::spawn` in the first code example. Specifically, `par_iter` under the hood uses something like Rayon's `scope` function, which lets you spawn units of work that are "attached" to the stack and are joined before that stack frame ends.

`thread::spawn` requires its closure to be `'static` because the spawned thread may outlive the function that created it; that is why the first example has to give each thread its own `Arc`. Because Rayon can guarantee (via lifetime bounds, from the user's perspective) that the tasks/threads are joined before the function calling `par_iter` returns, the closures are allowed to borrow from that function's stack. That lets Rayon offer an API that is more ergonomic to use than the standard library's `thread::spawn`.

Rayon expands on this in the documentation for its `scope` function.
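The same guarantee is available in the standard library since Rust 1.63 as `std::thread::scope`, which makes a std-only sketch of the idea possible. This is not how Rayon is implemented (Rayon runs work on a thread pool, whereas this spawns one OS thread per file); it only illustrates the lifetime mechanism. `GigabyteMap` and `process_file` are hypothetical stand-ins. Because `scope` joins every thread spawned on `s` before it returns, the closures may borrow `filenames` and `glossary` from the caller's frame, so a plain `&GigabyteMap` is enough.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical stand-in for the book's GigabyteMap.
type GigabyteMap = HashMap<String, String>;

// Hypothetical per-file worker: fails if the file has no glossary entry.
fn process_file(filename: &str, glossary: &GigabyteMap) -> Result<(), String> {
    if glossary.contains_key(filename) {
        Ok(())
    } else {
        Err(format!("no entry for {}", filename))
    }
}

// Takes a plain reference, like the Rayon version: no Arc needed.
fn process_files_in_parallel(filenames: &[String], glossary: &GigabyteMap) -> Result<(), String> {
    // scope() does not return until every thread spawned on `s` has finished,
    // so these borrows cannot outlive this stack frame.
    thread::scope(|s| {
        // Each scoped thread borrows one filename and the shared glossary.
        let handles: Vec<_> = filenames
            .iter()
            .map(|filename| s.spawn(move || process_file(filename, glossary)))
            .collect();
        // Join in order and keep the first error, like the book's reduce_with closure.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .reduce(|r1, r2| if r1.is_err() { r1 } else { r2 })
            .unwrap_or(Ok(()))
    })
}

fn main() {
    let mut glossary = GigabyteMap::new();
    glossary.insert("a.txt".to_string(), "alpha".to_string());
    glossary.insert("b.txt".to_string(), "beta".to_string());
    let good = vec!["a.txt".to_string(), "b.txt".to_string()];
    let bad = vec!["a.txt".to_string(), "x.txt".to_string()];
    println!("{:?}", process_files_in_parallel(&good, &glossary)); // Ok(())
    println!("{:?}", process_files_in_parallel(&bad, &glossary)); // Err("no entry for x.txt")
}
```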

Answered by Mark Rousskov (edited by Shepmaster)
  • How can Rayon make this guarantee when core Rust cannot? Is Rayon internally using an Arc? – Evan Carroll Dec 26 '19 at 15:00
  • std could make this guarantee, but currently has no such function (leaving it to libraries like rayon and crossbeam). Rayon guarantees that the thread joins before the scope function exits (whether due to unwinding or normal return); because of `mem::forget` this cannot be done with a RAII-based API in the return type. – Mark Rousskov Dec 26 '19 at 18:52
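The `mem::forget` point in the last comment can be shown directly. `mem::forget` is a safe function that consumes a value without running its destructor. In the sketch below, `JoinGuard` and `destructor_ran` are hypothetical: the guard's `Drop` impl stands in for "join the thread that borrows our data". Forgetting the guard means that join never happens, so a guard-based scoped-thread API could let a thread keep using borrowed data after the owning stack frame is gone. Closure-based APIs such as Rayon's `scope` and `std::thread::scope` avoid this by joining inside the call itself before returning, which safe code cannot skip.

```rust
use std::cell::Cell;
use std::mem;

// Hypothetical guard: imagine its Drop impl joined a thread that borrows data.
// Here it only records that its destructor ran.
struct JoinGuard<'a> {
    joined: &'a Cell<bool>,
}

impl Drop for JoinGuard<'_> {
    fn drop(&mut self) {
        // A real RAII scoped-thread API would block here until the thread finished.
        self.joined.set(true);
    }
}

// Hypothetical helper: returns whether the guard's destructor ran.
fn destructor_ran(forget: bool) -> bool {
    let flag = Cell::new(false);
    let guard = JoinGuard { joined: &flag };
    if forget {
        // Safe code may leak the guard; Drop::drop is never called.
        mem::forget(guard);
    } else {
        // Normal path: dropping the guard runs its destructor.
        drop(guard);
    }
    flag.get()
}

fn main() {
    println!("{}", destructor_ran(false)); // true
    println!("{}", destructor_ran(true)); // false
}
```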