I collect the matches of a regex into a HashSet after some filtering. I am trying to parallelize this with Rayon, but I can't figure out how to make Rayon work with an existing iterator without first converting it to a vector. Is this possible?

let re = Regex::new("url=\"(?P<url>.+?)\"").unwrap();
let urls: HashSet<String> = re.captures_iter(&contents)
    .map(|m| Url::parse(m.name("url").unwrap().as_str()))
    .filter(|parsed_url| parsed_url.is_ok())
    .map(|parsed_url| parsed_url.unwrap())
    .filter(|parsed_url| parsed_url.has_host())
    .map(|parsed_url| parsed_url.into_string())
    .collect();
Shepmaster
user964375
  • 2
    `par_iter` as stated in the doc? – Boiethios Feb 22 '18 at 08:41
  • 1
    I tried that but got the following error:

    error[E0599]: no method named `par_iter` found for type `regex::CaptureMatches<'_, '_>` in the current scope
      --> src/main.rs:24:58
       |
    24 | let urls: HashSet = re.captures_iter(&contents).par_iter()
       |                                                 ^^^^^^^^
       |
       = note: the method `par_iter` exists but the following trait bounds were not satisfied:
               `regex::CaptureMatches<'_, '_> : rayon::iter::IntoParallelRefIterator`

    – user964375 Feb 22 '18 at 08:45

2 Answers

This is possible now with ParallelBridge:

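use rayon::iter::ParallelBridge;
use rayon::prelude::ParallelIterator;
use std::sync::mpsc::channel;

fn main() {
    let rx = {
        let (tx, rx) = channel();

        // `send` returns a `Result`; it only fails if the receiver is gone.
        tx.send("one!").unwrap();
        tx.send("two!").unwrap();
        tx.send("three!").unwrap();

        // `tx` is dropped here, so the receiving iterator ends after three items.
        rx
    };

    // `par_bridge` does not preserve order, so sort before comparing.
    let mut output: Vec<&'static str> = rx.into_iter().par_bridge().collect();
    output.sort_unstable();

    assert_eq!(&*output, &["one!", "three!", "two!"]);
}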
Shepmaster
Jesse Grosjean
  • 1
    Note that you can use `rayon::prelude::*` to get traits such as `ParallelBridge` and `ParallelIterator` automatically. (It might be a matter of taste, though, but I find that naming the _traits_ manually doesn't improve readability of the module.) Also, rayon has [its own sort](https://docs.rs/rayon/1.3.1/rayon/slice/trait.ParallelSliceMut.html#method.par_sort_unstable) which you might want to use to sort the output by multiple cores. – user4815162342 Jul 26 '20 at 09:29

This answer is outdated for the latest version of Rayon. See the other answer for a possible solution; it may or may not apply to your use case.


Minimal reproduction:

extern crate rayon;

use rayon::prelude::*;

fn main() {
    let v = vec![1_i32, 2, 3, 4].into_iter();

    // no method named `par_iter` found for type `std::vec::IntoIter<i32>`
    let _ = v.par_iter().sum();
}

You cannot do that. Rayon only implements its parallel-iterator conversions for these types:

  • BinaryHeap
  • BTreeMap
  • BTreeSet
  • HashMap
  • HashSet
  • LinkedList
  • VecDeque
  • Option
  • Range
  • Result
  • Slice/Array

I think the reason you cannot parallelize them is that iterators are lazy. An iterator is basically a next() method that yields one Option<Item> at a time, so you cannot split it into two parts and run them on different threads.
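
To make the contrast concrete, here is a minimal sketch using only the standard library (the function names `sum_slice_in_two_threads` and `sum_iterator` are made up for illustration). A slice knows its length and can be cut in half with `split_at`, so each half can go to its own thread. A bare iterator only offers `next()`, so the only way to reach the second half is to walk through the first.

```rust
use std::thread;

/// A slice can be divided up front, so the halves can be summed on separate threads.
fn sum_slice_in_two_threads(data: &[i32]) -> i32 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let left_sum = s.spawn(|| left.iter().sum::<i32>());
        let right_sum = s.spawn(|| right.iter().sum::<i32>());
        left_sum.join().unwrap() + right_sum.join().unwrap()
    })
}

/// An arbitrary iterator exposes only `next()`: items come out one at a time,
/// and there is no operation that yields "the second half" directly.
fn sum_iterator(mut iter: impl Iterator<Item = i32>) -> i32 {
    let mut total = 0;
    while let Some(item) = iter.next() {
        total += item;
    }
    total
}
```

Both functions return the same sum for the same numbers; the difference is only in how the work can be divided.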

Boiethios