Why do I obtain different results for the Bron-Kerbosch algorithm when switching between a BTreeSet and a HashSet?

Question

I've been trying to implement the Bron-Kerbosch algorithm in Rust for my master's thesis. So far everything was working, but when I tried changing from a BTreeSet to a HashSet for performance comparison purposes, the behaviour became completely random (the results are at least).

I can't find any information about the order of the nodes having any influence on the result, yet, changing to an unordered collection breaks the results as the algorithm seems to miss some branches during the backtracking.

use std::collections::hash_map::Entry::{Occupied, Vacant};
use std::collections::{BTreeSet, HashMap};

type Nodes = BTreeSet<u32>;
type Graph = HashMap<u32, Nodes>;
type Record = (u32, u32);

fn init_nodes(records: &[Record]) -> Graph {
    let mut nodes: Graph = Graph::with_capacity(records.len());
    for record in records.iter() {
        let n: &mut Nodes = match nodes.entry(record.0) {
            Vacant(entry) => entry.insert(Nodes::new()),
            Occupied(entry) => entry.into_mut(),
        };
        n.insert(record.1);
        nodes.entry(record.1).or_insert_with(Nodes::new);
    }
    nodes.shrink_to_fit();
    nodes
}

fn bron1(graph: &Graph, r: Nodes, mut p: Nodes, mut x: Nodes, cliques: &mut Vec<Nodes>) {
    if p.is_empty() && x.is_empty() {
        cliques.push(r);
    } else if !p.is_empty() {
        let nodes = p.iter().cloned().collect::<Nodes>();
        nodes.iter().for_each(|node| {
            let neighbours: &Nodes = graph.get(node).unwrap();
            let mut to_add: Nodes = Nodes::new();
            to_add.insert(*node);
            bron1(
                graph,
                r.union(&to_add).cloned().collect(),
                p.intersection(&neighbours).cloned().collect(),
                x.intersection(&neighbours).cloned().collect(),
                cliques,
            );
            p.remove(node);
            x.insert(*node);
        });
    }
}

fn display_cliques(cliques: &[Nodes]) {
    let max = (&cliques[0]).len();
    let mut count = 0;
    for (idx, cl) in cliques.iter().enumerate() {
        if cl.len() != max {
            count = idx;
            break;
        }
    }
    println!(
        "Found {} cliques of {} nodes on a total of {} cliques",
        count,
        max,
        cliques.len()
    )
}

fn main() {
    let records: Vec<Record> = vec![
        (1, 88160),
        (1, 118_052),
        (1, 161_555),
        (1, 244_916),
        (1, 346_495),
        (1, 444_232),
        (1, 447_165),
        (1, 500_600),
        (2, 27133),
        (2, 62291),
        (2, 170_507),
        (2, 299_250),
        (2, 326_776),
        (2, 331_042),
        (2, 411_179),
        (2, 451_149),
        (2, 454_888),
        (4, 16050),
        (4, 286_286),
        (4, 310_803),
        (4, 320_519),
        (4, 408_108),
        (4, 448_284),
        (5, 173_362),
    ];

    let nodes = init_nodes(&records);
    let r: Nodes = nodes.keys().copied().collect();
    let mut cliques: Vec<Nodes> = Vec::new();
    bron1(&nodes, Nodes::new(), r, Nodes::new(), &mut cliques);
    cliques.sort_unstable_by(|a, b| a.len().cmp(&b.len()).reverse());
    display_cliques(&cliques);
}

Playground

Running the code using BTreeSet gives the right results.

Found 24 cliques of 2 nodes on a total of 48 cliques

Changing the Nodes type to HashSet produces results that are completely different.

Found 5 cliques of 2 nodes on a total of 29 cliques

"Also, the results are random, leading me to believe that it is somewhat related to the random seed chosen by the HashSet (which shouldn't change the behaviour anyway)." yes please read documentation https://doc.rust-lang.org/std/collections/struct.HashMap.html — Stargateur, Mar 24 '20 at 18:59
Thanks @Stargateur, your small dataset was enough to recreate the issue. As for the error clippy warned about, it does not appear on my machine (**clippy 0.0.212 (4ee12063 2020-02-01)**), but I changed the code to make the warning disappear anyway. — Heychsea, Mar 24 '20 at 20:15
@Heychsea I do think that this question is suitable for Stack Overflow now — there's no need to delete it. If you have solved it yourself, you are encouraged to answer it below to prevent other people from struggling with the same thing. — Shepmaster, Mar 25 '20 at 00:26
Well, I have not resolved it, however I think I made progress and I just hid the question until I updated it — Heychsea, Mar 25 '20 at 09:29
For experimentation purposes, you could copy it out of the set and into a `Vec` then [shuffle it](https://stackoverflow.com/questions/26033976/how-do-i-create-a-vec-from-a-range-and-shuffle-it). That would allow you to see if randomness is truly at fault. However, the `HashSet` should have a different order *every run* as the seed changes, so just running your original code many times should have different results if that's true. — Shepmaster, Mar 25 '20 at 14:09
That is already the case, running the code using `HashSet`s gives different results every time — Heychsea, Mar 25 '20 at 14:11
So then it seems it's not so much a matter of `BTreeSet` vs `HashSet`, or even one pertaining to Rust, but purely an algorithmic one? — Shepmaster, Mar 25 '20 at 16:52
That might be the case indeed, thus the other tags. I might be wrong again but this is my latest hypothesis. — Heychsea, Mar 25 '20 at 16:54
I wonder if the problem is that the algorithm assumes that the order of iteration will be consistent. If so, you could try [using an alternative hashing algorithm](https://stackoverflow.com/a/45896032/155423) and/or share the random seed between each newly created set. That way they'd change each program run, but be consistent within one run. — Shepmaster, Mar 25 '20 at 16:55
As seen in [this playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=278ccf1676ed9dc921e258e9ba7876ec), this does not seem to be the issue. It indeed becomes stable but the results are still different. — Heychsea, Mar 25 '20 at 17:17

score 2 · Accepted Answer · answered Apr 01 '20 at 12:57

The order does not matter, and the program should work no matter if it uses HashSets or BTreeSets.

The init_nodes function is incorrect because the Bron-Kerbosch algorithm works on undirected graphs, yet, the init_nodes function does not register the edges both ways, which makes the graph directed and results in the fact that the order matters.

Here is a correct implementation of the function:

fn init_nodes(records: &[Record]) -> Graph {
    let mut nodes: Graph = Graph::with_capacity(records.len());
    for r in records.iter() {
        let n: &mut Nodes = match nodes.entry(r.0) {
            Vacant(entry) => entry.insert(Nodes::new()),
            Occupied(entry) => entry.into_mut(),
        };
        n.insert(r.1);
        let n: &mut Nodes = match nodes.entry(r.1) {
            Vacant(entry) => entry.insert(Nodes::new()),
            Occupied(entry) => entry.into_mut(),
        };
        n.insert(r.0);
    }
    nodes.shrink_to_fit();
    nodes
}

Why do I obtain different results for the Bron-Kerbosch algorithm when switching between a BTreeSet and a HashSet?

1 Answers1