I have working code that gives the desired output but it's really slow (even in release mode) and eventually runs out of system memory with just 100 Pair
structs.
Below is only a sample of Pair
struct used, in production, it is more like 70,000..
fn main() {
let pair1 = Pair { pair_id: "abc1", p1: "x1", p2: "x2" };
let pair2 = Pair { pair_id: "abc2", p1: "x1", p2: "x3" };
let pair3 = Pair { pair_id: "abc3", p1: "x4", p2: "x2" };
let pair4 = Pair { pair_id: "abc4", p1: "x2", p2: "x1" };
let pair5 = Pair { pair_id: "abc5", p1: "x4", p2: "x1" };
let vec = vec![pair1, pair2, pair3, pair4, pair5];
let input = process(&vec);
let mut output= vec![vec![]];
let current_path = vec![];
traverse("x1", "x1", &input, current_path, &mut output);
println!("{:#?}", output);
}
#[derive(Debug, PartialEq)]
struct Pair<'a> {
pair_id: &'a str,
p1: &'a str,
p2: &'a str,
}
fn process<'a>(pairs: &'a Vec<Pair<'_>>) -> HashMap<&'a str, Vec<&'a Pair<'a>>> {
let mut res: HashMap<&'a str, Vec<&'a Pair<'a>>> = HashMap::new();
for pair in pairs {
res.entry(pair.p1).or_default().push(pair);
res.entry(pair.p2).or_default().push(pair);
}
res
}
fn traverse<'a>(start: &str, next: &str, input: &HashMap<&str, Vec<&'a Pair<'a>>>, current_path: Vec<&'a Pair<'a>>, output: &mut Vec<Vec<&'a Pair<'a>>>) {
if current_path.len() > 1 && start == next {
output.push(current_path);
} else {
let iter = input.get(next)
.unwrap()
.iter()
.filter(|e | !current_path.contains(e));
for i in iter {
let mut new_path = current_path.clone();
let next_pair = if i.p1 == next { i.p2 } else { i.p1 };
new_path.push(i);
traverse(start, next_pair, input, new_path, output);
}
}
}
One way to improve speed is to limit the depth of which the function traverses, but I am sure there are other ways I am missing.
Expected Output
A vector of pair connections starting from a p1 or p2 ID.
Example:
Below the starting ID is x1
. The function finds all of the Pair
s with ID x1
for either p1
or p2
.
Then using the opposite p1/p2 ID of that pair it searches all of the other pairs where that ID exists in either p1
or p2
.
The process repeats until a Pair
is found with the starting ID (x1
). Also the Pair
structs must be unique.
[[
Pair {
pair_id: "abc1",
p1: "x1",
p2: "x2",
},
Pair {
pair_id: "abc3",
p1: "x4",
p2: "x2",
},
Pair {
pair_id: "abc5",
p1: "x4",
p2: "x1",
},
]...]