5

I need to find out how many integers are present in both of two given sets, fast. The sets are written to only once but this operation will be performed many times with different pairs of sets. The sets contain 5-30 integers and the largest of these integers is 840000.

I have initially tried to iterate over one Vec and for each element check if its present in the other Vec. I then decided to use BTreeSet instead since it should be significantly faster at checking if an integer is present in the set, but that does not seem to be the case. The Vec implementation takes ~72ms and the BTreeSet ~96ms when called on couple thousands of sets in release mode under stable Rust 1.34 with same performance when using nightly.

This is the Vec implementation:

use std::cmp;

fn main() {
    let mut sets = Vec::with_capacity(1000);
    for i in 1..1000 {
        let mut set = Vec::new();
        for j in 1..i % 30 {
            set.push(i * j % 50000);
        }
        sets.push(set);
    }
    for left_set in sets.iter() {
        for right_set in sets.iter() {
            calculate_waste(left_set, right_set);
        }
    }
}

fn calculate_waste(left_nums: &Vec<usize>, right_nums: &Vec<usize>) -> usize {
    let common_nums = left_nums.iter().fold(0, |intersection_count, num| {
        intersection_count + right_nums.contains(num) as usize
    });
    let left_side = left_nums.len() - common_nums;
    let right_side = right_nums.len() - common_nums;
    let score = cmp::min(common_nums, cmp::min(left_side, right_side));
    left_side - score + right_side - score + common_nums - score
}

And this is the BTreeSet implementation:

use std::cmp;
use std::collections::BTreeSet;

fn main() {
    let mut sets = Vec::with_capacity(1000);
    for i in 1..1000 {
        let mut set = BTreeSet::new();
        for j in 1..i % 30 {
            set.insert(i * j % 50000);
        }
        sets.push(set);
    }
    for left_set in sets.iter() {
        for right_set in sets.iter() {
            calculate_waste(left_set, right_set);
        }
    }
}

fn calculate_waste(left_nums: &BTreeSet<usize>, right_nums: &BTreeSet<usize>) -> usize {
    let common_nums = left_nums.intersection(&right_nums).count();
    let left_side = left_nums.len() - common_nums;
    let right_side = right_nums.len() - common_nums;
    let score = cmp::min(common_nums, cmp::min(left_side, right_side));
    left_side - score + right_side - score + common_nums - score
}

It was ran with the command (-w 50 makes it ignore the first 50 runs):

hyperfine "cargo run --release" -w 50 -m 100

Full code of the program available here.

Is the BTreeSet implementation slower because there are too few integers in the set to allow its O(log n) access time to shine? If so, is there anything else I can do to speed up this function?

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
HelloImRandom
  • 97
  • 2
  • 6
  • 1
    [Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?](https://stackoverflow.com/q/40006219/155423) – Shepmaster May 22 '19 at 16:49
  • 1
    Your question might be answered by the answers of [Python set intersection is faster than Rust HashSet intersection](https://stackoverflow.com/q/35439376/155423) (specifically [this answer](https://stackoverflow.com/a/53841407/155423)). – Shepmaster May 22 '19 at 16:50
  • 1
    For your problem beyond this question, you may want to look into using a bit set which I'd expect to beat the performance of both. – Shepmaster May 22 '19 at 18:47
  • 1
    For the purposes of comparison, do the four lines starting with `let left_side` do anything? Couldn't you *just* calculate the intersection? – Shepmaster May 22 '19 at 19:29
  • 1
    How does the benchmark software ignore the difference in time for constructing the `Vec` and the `BTreeSet`? – Shepmaster May 22 '19 at 19:32
  • 3
    I think your sets are indeed too small to gain anything from a `BTreeSet`. The "B" in the Rust implementation is 6, so up to 11 numbers are stored in a single node. Each node is scanned linearly, so there is hardly a chance to gain anything over a vector. Regardless of the set size, a parallel scan on sorted vectors will always beat the `BTreeSet`, since it is `O(m + n)`, where `m` and `n` are the sizes of the two vectors. – Sven Marnach May 22 '19 at 19:50
  • Unfortunately it does not ignore this difference, but I have tried constructing them in different ways (eg `Vec:with_capacity()`) and the time difference was negligible. And yes, I could just calculate the intersection. I left that code in in case someone was able to come up with a way to improve the performance of this bit as well. – HelloImRandom May 22 '19 at 20:01
  • 2
    [Benchmarking the core algorithm via Criterion.rs](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=790608fec0f57dadbfa92bf90160a88a) yields `vec time: [61.092 ms 61.882 ms 62.686 ms]` and `set time: [91.517 ms 92.385 ms 93.441 ms]`, so apparently the creation overhead isn't solely to blame. – Shepmaster May 22 '19 at 20:03

1 Answers1

5

Since your sets don't change over time, I think your best option is to use sorted vectors. Sorting the vectors will be required only once, at initialization time. The intersection of two sorted vectors can be computed in linear time by iterating over them simultaneously, always advancing the iterator that currently points to the lower number. Here is an attempt at an implementation:

fn intersection_count_sorted_vec(a: &[u32], b: &[u32]) -> usize {
    let mut count = 0;
    let mut b_iter = b.iter();
    if let Some(mut current_b) = b_iter.next() {
        for current_a in a {
            while current_b < current_a {
                current_b = match b_iter.next() {
                    Some(current_b) => current_b,
                    None => return count,
                };
            }
            if current_a == current_b {
                count += 1;
            }
        }
    }
    count
}

This probably isn't particularly well optimised; regardless, benchmarking with Criterion-based code indicates this version is more than three times as fast as your solution using vectors.

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • 1
    I note that another change which was made here is using `u32` instead of `usize` => Memory matters! If it fits in a u32, no point in wasting half the space using a `usize`! – Matthieu M. May 23 '19 at 07:11
  • @MatthieuM. Yeah, the OP mentioned that the maximum is 840000, so I sliently made that change. I expected it to have some performance impact as well due to memory bandwidth, but on my machine the difference was just 1%, so I didn't explicitly mention it in the answer. – Sven Marnach May 23 '19 at 07:13
  • I have tried both BitSet and this method and while BitSet yielded even greater performance benefits, memory overhead was too large in some cases. This method achieves similar performance while using negligible amounts of memory. – HelloImRandom May 27 '19 at 00:04