What you're doing here is commonly known in databases as a join. As an SQL query, your code would look like this:
select * from jacket, shirt, tie where jacket.size = shirt.size and jacket.color = tie.color;
Algorithmic Ideas
Nested Loop Join
Now, what you've implemented is known as a nested loop join. In its plain form, it has complexity O(jackets * shirts * ties); however, the early exit inside your shirt loop saves some work and reduces this to O(jackets * shirts + jackets * matching_shirts * ties).
For small data sets, such as the one you provided, a nested loop join can be very effective and is usually the method of choice. As the data grows, however, it quickly becomes slow. Depending on how much additional memory you can spare, and on whether sorting the input sequences is acceptable, you might want to use approaches that utilize sorting, as @Deduplicator initially pointed out.
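For reference, the nested loop join described above can be sketched like this. The struct definitions are simplified stand-ins I'm assuming for illustration, not your actual types:

```cpp
#include <string>
#include <vector>

// Assumed, simplified stand-ins for the question's types.
struct Jacket { int size; std::string color; };
struct Shirt  { int size; };
struct Tie    { std::string color; };
struct Outfit { Jacket jacket; Shirt shirt; Tie tie; };

// Plain nested loop join: every jacket is compared against every shirt,
// and each matching jacket-shirt pair against every tie.
std::vector<Outfit> nestedLoopJoin(const std::vector<Jacket>& jackets,
                                   const std::vector<Shirt>& shirts,
                                   const std::vector<Tie>& ties) {
    std::vector<Outfit> outfits;
    for (const auto& jacket : jackets) {
        for (const auto& shirt : shirts) {
            if (jacket.size != shirt.size) continue; // skip non-matching shirts early
            for (const auto& tie : ties) {
                if (jacket.color == tie.color) {
                    outfits.push_back(Outfit{jacket, shirt, tie});
                }
            }
        }
    }
    return outfits;
}
```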
Sort Merge Join
The sort-merge join is usually applied to two input sequences: after both have been sorted, each one only needs to be traversed once, giving complexity O(N log N) for the sorting and O(N) for the actual join phase. Check out the Wikipedia article for a more in-depth explanation. With more than two input sequences, however, the logic becomes hard to maintain, and one of the input sequences has to be traversed more than once. Effectively, we get O(N log N) for sorting the jackets, shirts, and ties, and O(jackets + shirts + colors * ties) for the actual joining part.
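To make the two-sequence case concrete, here is a sketch of a sort-merge join on just jackets and shirts, matched on size. Again, the types are assumed, simplified stand-ins; note that runs of equal keys on both sides still produce a cross product:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed, simplified stand-ins: only the join key matters here.
struct Jacket { int size; };
struct Shirt  { int size; };

std::vector<std::pair<Jacket, Shirt>> sortMergeJoin(std::vector<Jacket> jackets,
                                                    std::vector<Shirt> shirts) {
    std::sort(jackets.begin(), jackets.end(),
              [](const Jacket& a, const Jacket& b) { return a.size < b.size; });
    std::sort(shirts.begin(), shirts.end(),
              [](const Shirt& a, const Shirt& b) { return a.size < b.size; });

    std::vector<std::pair<Jacket, Shirt>> result;
    std::size_t i = 0, j = 0;
    while (i < jackets.size() && j < shirts.size()) {
        if (jackets[i].size < shirts[j].size) {
            ++i;
        } else if (shirts[j].size < jackets[i].size) {
            ++j;
        } else {
            // Equal keys: find the run of this size on both sides
            // and emit their cross product, then skip past both runs.
            std::size_t iEnd = i, jEnd = j;
            while (iEnd < jackets.size() && jackets[iEnd].size == shirts[j].size) ++iEnd;
            while (jEnd < shirts.size() && shirts[jEnd].size == jackets[i].size) ++jEnd;
            for (std::size_t a = i; a < iEnd; ++a)
                for (std::size_t b = j; b < jEnd; ++b)
                    result.emplace_back(jackets[a], shirts[b]);
            i = iEnd;
            j = jEnd;
        }
    }
    return result;
}
```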
Sort + Binary Search Join
In their answer, @Deduplicator utilized sorting in a different way: instead of walking the input sequences sequentially, they used binary search to quickly find the block of matching elements. This gives O(N log N) for the sorting and O(jackets * log shirts + log ties * matching jacket-shirt combinations + output_elements) for the actual join phase.
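This is not @Deduplicator's actual code, but the idea can be sketched with std::equal_range, which binary-searches for the whole block of equal keys at once (types are assumed stand-ins as before):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed, simplified stand-ins for the question's types.
struct Jacket { int size; std::string color; };
struct Shirt  { int size; };
struct Tie    { std::string color; };
struct Outfit { Jacket jacket; Shirt shirt; Tie tie; };

std::vector<Outfit> sortBinarySearchJoin(const std::vector<Jacket>& jackets,
                                         std::vector<Shirt> shirts,
                                         std::vector<Tie> ties) {
    std::sort(shirts.begin(), shirts.end(),
              [](const Shirt& a, const Shirt& b) { return a.size < b.size; });
    std::sort(ties.begin(), ties.end(),
              [](const Tie& a, const Tie& b) { return a.color < b.color; });

    std::vector<Outfit> outfits;
    for (const auto& jacket : jackets) {
        // Binary search for the block of shirts with the jacket's size.
        // The generic lambda lets equal_range compare a Jacket against Shirts.
        auto [sFirst, sLast] = std::equal_range(
            shirts.begin(), shirts.end(), jacket,
            [](const auto& a, const auto& b) { return a.size < b.size; });
        // Binary search for the block of ties with the jacket's color.
        auto [tFirst, tLast] = std::equal_range(
            ties.begin(), ties.end(), jacket,
            [](const auto& a, const auto& b) { return a.color < b.color; });
        for (auto s = sFirst; s != sLast; ++s)
            for (auto t = tFirst; t != tLast; ++t)
                outfits.push_back(Outfit{jacket, *s, *t});
    }
    return outfits;
}
```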
Hash Join
However, all of these approaches can be trumped if your elements can easily be hashed, as hash maps allow us to store and find potential join partners incredibly fast. The approach here is to iterate once over all but one of the input sequences and store their elements in hash maps; then we iterate once over the last input sequence and, using the hash maps, find all matching join partners. Again, Wikipedia has a more in-depth explanation. We can utilize std::unordered_map here. Assuming a good hash function, this gives us a total runtime of O(jackets + shirts + ties + total_output_tuples), which matches the lower bound on how fast you can be: you need to process each input sequence once, and you need to produce the output sequence.
Also, the code for it is rather beautiful (imho) and easy to understand:
void hashJoin(vector<Outfit>& outfits) {
    // Build phase: one hash map per input sequence we probe later.
    using shirtsize_t = decltype(Shirt::size);
    std::unordered_map<shirtsize_t, vector<shared_ptr<Shirt>>> shirts_by_sizes;
    using tiecolor_t = decltype(Tie::color);
    std::unordered_map<tiecolor_t, vector<shared_ptr<Tie>>> ties_by_colors;
    for (const auto& shirt : shirts) {
        shirts_by_sizes[shirt->size].push_back(shirt);
    }
    for (const auto& tie : ties) {
        ties_by_colors[tie->color].push_back(tie);
    }
    // Probe phase: a single pass over the jackets.
    // Note: operator[] inserts an empty vector for keys without a match;
    // use find() instead if those extra entries matter to you.
    for (const auto& jacket : jackets) {
        for (const auto& matching_shirt : shirts_by_sizes[jacket->size]) {
            for (const auto& matching_tie : ties_by_colors[jacket->color]) {
                outfits.push_back(Outfit{jacket, matching_shirt, matching_tie});
            }
        }
    }
}
However, in the worst case, if our hash function does not distribute the keys uniformly, the O(1) average access can degrade (toward O(n) per lookup, when many keys land in the same bucket). If you suspect this, inspect the hash maps' bucket distribution, and replace the hash function if necessary.
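One way to do that inspection is via the bucket interface of std::unordered_map. This is just a diagnostic sketch: if one bucket holds far more entries than the average load factor suggests, the hash function is clustering keys:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>

// Returns the size of the fullest bucket. Compared against
// map.load_factor(), a much larger value indicates key clustering,
// which degrades lookups toward O(n).
template <typename Map>
std::size_t largestBucketSize(const Map& map) {
    std::size_t largest = 0;
    for (std::size_t b = 0; b < map.bucket_count(); ++b) {
        largest = std::max(largest, map.bucket_size(b));
    }
    return largest;
}
```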
Implementations
I've posted a working implementation of all four algorithms discussed here on godbolt. Since they are rather long, I've only included the (superior) hash join algorithm in this answer.
Lower bounds and output elements
As @Dominique pointed out, there is no way to achieve a better runtime complexity than O(output_element_count), since each output element has to be produced at least once. So if you expect your result to be (asymptotically) close to jackets * shirts * ties, the nested loop join is the variant with the lowest overhead and should thus be fastest. However, I don't think that will be the case here.