How to optimize this to output all possible cycles in a tree data structure

Question

I have working code that gives the desired output but it's really slow (even in release mode) and eventually runs out of system memory with just 100 Pair structs.

Below is only a sample of Pair struct used, in production, it is more like 70,000..


fn main() {
    let pair1 = Pair { pair_id: "abc1", p1: "x1", p2: "x2" };
    let pair2 = Pair { pair_id: "abc2", p1: "x1", p2: "x3" };
    let pair3 = Pair { pair_id: "abc3", p1: "x4", p2: "x2" };
    let pair4 = Pair { pair_id: "abc4", p1: "x2", p2: "x1" };
    let pair5 = Pair { pair_id: "abc5", p1: "x4", p2: "x1" };
    let vec = vec![pair1, pair2, pair3, pair4, pair5];
    let input = process(&vec);
    let mut output= vec![vec![]];
    let current_path = vec![];
    traverse("x1", "x1", &input, current_path, &mut output);
    println!("{:#?}", output);
}

#[derive(Debug, PartialEq)]
struct Pair<'a> {
    pair_id: &'a str,
    p1: &'a str,
    p2: &'a str,
}

fn process<'a>(pairs: &'a Vec<Pair<'_>>) -> HashMap<&'a str, Vec<&'a Pair<'a>>> {
    let mut res: HashMap<&'a str, Vec<&'a Pair<'a>>> = HashMap::new();
    for pair in pairs {
        res.entry(pair.p1).or_default().push(pair);
        res.entry(pair.p2).or_default().push(pair);
    }
    res
}

fn traverse<'a>(start: &str, next: &str, input: &HashMap<&str, Vec<&'a Pair<'a>>>, current_path: Vec<&'a Pair<'a>>, output: &mut Vec<Vec<&'a Pair<'a>>>) {
    if current_path.len() > 1 && start == next {
        output.push(current_path);
    } else {
        let iter = input.get(next)
            .unwrap()
            .iter()
            .filter(|e | !current_path.contains(e));
        for i in iter {
            let mut new_path = current_path.clone();
            let next_pair = if i.p1 == next { i.p2 } else { i.p1 };
            new_path.push(i);
            traverse(start, next_pair, input, new_path, output);
        }
    }
}

Playground

One way to improve speed is to limit the depth of which the function traverses, but I am sure there are other ways I am missing.

Expected Output

A vector of pair connections starting from a p1 or p2 ID.

Example:

Below the starting ID is x1. The function finds all of the Pairs with ID x1 for either p1 or p2.

Then using the opposite p1/p2 ID of that pair it searches all of the other pairs where that ID exists in either p1 or p2.

The process repeats until a Pair is found with the starting ID (x1). Also the Pair structs must be unique.

[[
  Pair {
    pair_id: "abc1",
    p1: "x1",
    p2: "x2",
  },
  Pair {
    pair_id: "abc3",
    p1: "x4",
    p2: "x2",
  },
  Pair {
    pair_id: "abc5",
    p1: "x4",
    p2: "x1",
  },
]...]

@Aiden4 To output an ordered vector of connecting Pairs based on a starting p1 or p2 ID — James, Apr 08 '22 at 04:53
You do realize that the number of solutions is _huge_. I'm not surprised that you run out of memory, if anything I'm surprised that you were able to get to 100 pairs before running out… — Jmb, Apr 08 '22 at 06:28
Does this answer your question? [Find all paths between two graph nodes](https://stackoverflow.com/questions/9535819/find-all-paths-between-two-graph-nodes) — hkBst, Apr 08 '22 at 07:33
@Jmb would you say putting a limit on the depth would be the way to go? Thanks for your input. — James, Apr 08 '22 at 09:56
@hkBst I was playing around with petgraph today which has that algorithm. I will check it out some more. Thanks — James, Apr 08 '22 at 09:56
Wouldn't it be better to just collect all edges that can lead from source to destination? The amount of memory is then guaranteed to be finite and I can imagine how the algorithm would work. — Jakub Dóka, Apr 08 '22 at 10:55
@JakubDóka what do you edges look like? A vec of pair_ids? Thanks — James, Apr 08 '22 at 11:20
Putting a limit on the depth means you will not get all the possible paths. So what do you really need? Any single path? The shortest path? A reasonably short path even if it's not the absolute shortest? A few alternative paths? — Jmb, Apr 08 '22 at 12:26
@Jmb ideally all paths, but if that is going to not work due to memory constraints then I guess a depth limit would be needed, something like all paths up to 5 nodes with start and end nodes having the starting ID in both. I really feel the memory constraints are due to the recursive function calls rather than the output vector itself. — James, Apr 08 '22 at 12:45
When in doubt, measure. Can you print `output.iter().map (|v| v.len()).sum::()` and `output.len()` for the largest `Pair` count that works? — Jmb, Apr 08 '22 at 14:42
@James please can you share sample of 1000 pairs of graph edges — Chandan, Apr 11 '22 at 14:57

How to optimize this to output all possible cycles in a tree data structure

0 Answers0