I have two sets (or maps) and need to efficiently handle their intersection. I know that there are two ways of doing this:

  • iterate over both maps as in std::set_intersection: O(n1+n2)
  • iterate over one map and find each of its elements in the other: O(n1*log(n2))

Depending on the sizes, either of these two solutions is significantly better (I have timed it), and I thus need to either switch between these algorithms based on the sizes (which is a bit messy) - or find a solution outperforming both, e.g. using some variant of map.find() taking the previous iterator as a hint (similarly to map.emplace_hint(...)) - but I could not find such a function.
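
For reference, the second approach is essentially the following (a minimal sketch; intersect_by_find is just an illustrative name):

#include <set>
#include <vector>

// Iterate over the smaller set and look each element up in the larger one:
// O(n1 * log(n2)) where n1 is the size of the smaller set.
template <typename T>
std::vector<T> intersect_by_find(const std::set<T> &smaller, const std::set<T> &larger)
{
    std::vector<T> result;
    for (const T &x : smaller)
        if (larger.find(x) != larger.end())
            result.push_back(x);
    return result;
}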

Question: Is it possible to combine the performance characteristics of the two solutions directly using STL - or some compatible library?

Note that the performance requirement makes this different from earlier questions such as Efficient intersection of sets?

NuPagadi
Hans Olsson
  • What "performance requirements" are making this different from the linked question? You just say you need it done efficiently, while the other question is asking how to do it efficiently... – 463035818_is_not_an_ai May 09 '18 at 11:56
  • The performance requirement varies dynamically from call to call, so I cannot statically select one alternative. That part wasn't resolved in the linked question. – Hans Olsson May 09 '18 at 12:09
  • At this level of optimization (not simply using the standard library) we would really need to see sample data and benchmarks. Once you get to actual data, compilers and hardware, you can always optimize more. Without this information the question really isn't that much different from the linked one, even though it expresses a willingness to switch approaches based on the case at hand (which the standard library may do already). – wally May 09 '18 at 12:09
  • @wally An implementation of the standard set_intersection could possibly switch approaches, but do any implementations do that? If so, how? – Hans Olsson May 09 '18 at 12:12
  • The standard library has some freedom in how it is implemented internally, so it could be checked. Another approach that may be viable is to write your own parallel function to share the task among multiple cores. Or perhaps to offload to a GPU. In those cases you are certainly considering alternatives outside of the standard library's mandate. – wally May 09 '18 at 12:14
  • What about an `if ((n1+n2) < n1*log2(n2))` to pick the better one? (It would of course also need to take into account `n2*log2(n1)`.) Btw, that's not a different requirement, it is just asking how to do it efficiently in the general case. Sorry for nitpicking, just wanted to understand better what you really mean, though I still feel that this is close to being a dupe. – 463035818_is_not_an_ai May 09 '18 at 12:17
  • [Set intersection on a GPU](https://stackoverflow.com/q/29759884/1460794) – wally May 09 '18 at 12:23
  • [You can](https://stackoverflow.com/a/30141130/1460794) make `std::set_intersection` as well as a bunch of other standard library algorithms run in parallel by defining `_GLIBCXX_PARALLEL` during compilation. – wally May 09 '18 at 12:24
  • @user463035818 Yes, that is the current approach (with three different variants), and it works - but I was hoping for something better. – Hans Olsson May 09 '18 at 12:29
  • Better with respect to what? Avoiding the condition and resulting jumps? Make it a template with the sizes of the maps as parameters; however, that's not dynamic anymore – 463035818_is_not_an_ai May 09 '18 at 12:32
  • @wally - Running in parallel is interesting, but some of the code that is used cannot run in parallel yet; just having a simple lock would destroy any benefit of parallelization. Similarly the data is currently not on a GPU so copying the data to the GPU would destroy any performance improvement of the code. – Hans Olsson May 09 '18 at 12:42
  • @user463035818 Ideally combining the best performance of them - since that avoids having to tune the switch between approaches; but at least making it clear that it is the same code. – Hans Olsson May 09 '18 at 12:51
  • If you are using std::map or std::set one problem could be bad cache locality. You could try to use a pool allocator or a sorted vector instead of a tree, depending on your requirements. There are solutions in Boost for both approaches. – sv90 May 09 '18 at 15:33

4 Answers

In almost every case std::set_intersection will be the best choice. The other solution may be better only if the sets contain a very small number of elements, due to the nature of the base-two logarithm, which scales as:

n = 2,    log2(n) = 1
n = 4,    log2(n) = 2
n = 8,    log2(n) = 3
...
n = 1024, log2(n) = 10

O(n1*log(n2)) is significantly more expensive than O(n1 + n2) if the sets contain more than 5-10 elements.

There is a reason such a function was added to the STL and is implemented the way it is. Using it will also make the code more readable.

Selection sort is faster than merge sort or quicksort for collections with fewer than about 20 elements, but it is rarely used.
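
For reference, a minimal usage sketch of std::set_intersection on two std::sets, writing the result into a vector:

#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

std::vector<int> intersect(const std::set<int> &a, const std::set<int> &b)
{
    std::vector<int> result;
    // Both inputs are already sorted, so this is a single linear merge
    // over the two ranges: O(n1 + n2) comparisons.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(result));
    return result;
}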

Petar Velev
  • First and foremost there are cases with 5 elements and >1000 elements, so it happens. However, I don't agree with the conclusion. For selection sort vs. quick sort it depends on the complexity of the algorithms. I haven't tested the relative performance of ++ vs find in a map - but they seem more similar. And if the constants are the same for (n1+n2) and n2*log2(n1), then n1=1000 gives n2=111, and n1=10 000 gives n2=813. – Hans Olsson May 09 '18 at 13:03
  • I mentioned selection sort because its complexity is O(n^2) but it performs better than quick or merge sort for a small number of elements. However if the sizes of the sets are equal then for sets with 10 elements you'll have 10 + 10 operations or 10*3.32 (log2(10)) operations - over 30. The only possible way for `find` to be faster would be when you have collections with less than 4 elements where log(n) will be less than 2. – Petar Velev May 09 '18 at 13:16
  • @PetarVelev The sets can be of very different lengths. If we have one set with 1024 and another with 100 elements, it is 1024+100=1124 vs. 100*log2(1024)=1000 - and I found cases with >1000 elements intersecting with not only <100 elements, but even <10 elements. – Hans Olsson May 09 '18 at 13:21
  • @HansOlsson you are comparing apples to oranges. `std::set_intersection` is *specified as* "At most 2 * (N1+N2-1) comparisons" whereas `std::set::find` is *specified as* O(log(`size()`)). For one you have a *strict count* of operations, the other is an asymptotic bound – Caleth May 09 '18 at 13:38
  • @Caleth Theoretically the guarantees differ, but in practice set_intersection actually uses that many comparisons, and set::find has a small constant and no major overhead for small sets, i.e. it is <=c1+c2*log2(size()) for "small" c1 and c2 - which gives similar conclusions. I can also see that changing algorithm to not use set_intersection improves performance of that part of the program from about 90s to 30s in some cases - and that part involves lots of other code as well. – Hans Olsson May 09 '18 at 14:44

For sets that are implemented as binary trees, there actually is an algorithm that combines the benefits of both of the procedures you mention. Essentially, you do a merge as in std::set_intersection, but while iterating in one tree, you skip any branches whose elements are all less than the current value in the other.

The resulting intersection takes O(min(n1 log n2, n2 log n1, n1 + n2)) time, which is just what you want.

Unfortunately, I'm pretty sure std::set doesn't provide interfaces that could support this operation.

I've done it a few times in the past though, when working on joining inverted indexes and similar things. Usually I make iterators with a skipTo(x) operation that will advance to the next element >= x. To meet my promised complexity it has to be able to skip N elements in log(N) amortized time. Then an intersection looks like this:

// Note: it1/it2 are assumed to be the custom iterators described above,
// which provide skipTo(x); std::set's own iterators do not.
template <typename T>
void get_intersection(vector<T> *dest, const set<T> &set1, const set<T> &set2)
{
    auto end1 = set1.end();
    auto end2 = set2.end();
    auto it1 = set1.begin();
    if (it1 == end1)
        return;
    auto it2 = set2.begin();
    if (it2 == end2)
        return;
    for (;;)
    {
        // advance it1 to the first element >= *it2
        it1.skipTo(*it2);
        if (it1 == end1)
            break;
        if (*it1 == *it2)
        {
            dest->push_back(*it1);
            ++it1;
            if (it1 == end1)
                break;
        }
        // now *it1 > *it2; advance it2 symmetrically
        it2.skipTo(*it1);
        if (it2 == end2)
            break;
        if (*it2 == *it1)
        {
            dest->push_back(*it2);
            ++it2;
            if (it2 == end2)
                break;
        }
    }
}

It easily extends to an arbitrary number of sets using a vector of iterators, and pretty much any ordered collection can be extended to provide the iterators required -- sorted arrays, binary trees, b-trees, skip lists, etc.
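
As an illustration of the skipTo idea on the simplest such collection, a sorted vector, here is a rough sketch (my own, not from the answer) using exponential ("galloping") search plus std::lower_bound; skip_to and intersect_sorted are made-up names:

#include <algorithm>
#include <iterator>
#include <vector>

// First position in [first, last) whose element is >= x. Exponential
// ("galloping") search brackets the answer, then a binary search finds it,
// so skipping past k elements costs O(log k) comparisons.
template <typename It, typename T>
It skip_to(It first, It last, const T &x)
{
    auto n = std::distance(first, last);
    decltype(n) step = 1, lo = 0;
    while (step < n && *(first + step) < x)
    {
        lo = step + 1;    // the element at index 'step' is still < x
        step *= 2;
    }
    return std::lower_bound(first + lo, first + std::min(step + 1, n), x);
}

// Skip-based merge of two sorted vectors, in the spirit of the answer above.
template <typename T>
std::vector<T> intersect_sorted(const std::vector<T> &a, const std::vector<T> &b)
{
    std::vector<T> out;
    auto it1 = a.begin();
    auto it2 = b.begin();
    while (it1 != a.end() && it2 != b.end())
    {
        if (*it1 < *it2)
            it1 = skip_to(it1, a.end(), *it2);
        else if (*it2 < *it1)
            it2 = skip_to(it2, b.end(), *it1);
        else
        {
            out.push_back(*it1);   // common element
            ++it1;
            ++it2;
        }
    }
    return out;
}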

Matt Timmermans
  • Ok, that seems like what I was looking for. Do you have any references for it? Is there an implementation in boost or similarly? – Hans Olsson May 09 '18 at 13:15
  • Unfortunately, I think you'd have to roll your own. The key is usually a set with an efficient skip-until operation. I'll update the answer to describe how it's implemented in terms of that in case you see one. – Matt Timmermans May 09 '18 at 23:45

With regard to the performance requirement, O(n1 + n2) is in most circumstances a very good complexity, so alternatives are only worth considering if you're doing this calculation in a tight loop.

If you really do need it, the combination approach isn't too bad; perhaps something like this?

Pseudocode:

x' = set_with_min_length([x, y])
y' = set_with_max_length([x, y])
if (x'.length * log(y'.length)) <= (x'.length + y'.length):
     return iterate_over_map_find_elements_in_other(y', x')

return std::set_intersection(x, y)
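
A rough C++ sketch of that dispatch (the intersect name and the exact crossover test are just placeholders; the constant would need tuning against real data):

#include <algorithm>
#include <cmath>
#include <iterator>
#include <set>
#include <vector>

template <typename T>
std::vector<T> intersect(const std::set<T> &x, const std::set<T> &y)
{
    const std::set<T> &smaller = (x.size() <= y.size()) ? x : y;
    const std::set<T> &larger  = (x.size() <= y.size()) ? y : x;

    std::vector<T> result;
    if (smaller.size() * std::log2(double(larger.size()) + 1.0)
        <= double(smaller.size() + larger.size()))
    {
        // iterate over the smaller set, look each element up in the larger one
        for (const T &v : smaller)
            if (larger.find(v) != larger.end())
                result.push_back(v);
    }
    else
    {
        // linear merge of the two sorted ranges
        std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                              std::back_inserter(result));
    }
    return result;
}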

I don't think you'll find an algorithm that will beat either of these complexities, but I'm happy to be proven wrong.

Caleth
silleknarf

I don't know how to do this using the standard library, but if you wrote your own balanced binary search tree, here is how to implement a limited "find with hint". (Depending on your other requirements, a BST reimplementation could also leave out the parent pointers, which could be a performance win over the STL.)

Assume that the hint value is less than the value to be found and that we know the stack of ancestors of the hint node to whose left sub-tree the hint node belongs. First search normally in the right sub-tree of the hint node, pushing nodes onto the stack as warranted (to prepare the hint for next time). If this doesn't work, then while the stack's top node has a value that is less than the query value, pop the stack. Search from the last node popped (if any), pushing as warranted.

I claim that, when using this mechanism to search successively for values in ascending order, (1) each tree edge is traversed at most once, and (2) each find traverses the edges of at most two descending paths. Given 2*n1 descending paths in a binary tree with n2 nodes, the cost of the edges is O(n1 log n2). It's also O(n2), because each edge is traversed at most once.
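
As a rough illustration of successive ascending searches with an explicit ancestor stack (my own sketch of the same idea over a plain, unbalanced BST; Node, descend_ge and seek_ge are made-up names, and the "hint" here is the stack of pending ancestors rather than a single node):

#include <vector>

struct Node
{
    int value;
    Node *left = nullptr;
    Node *right = nullptr;
};

// The stack holds the not-yet-visited part of the in-order traversal:
// the top is the smallest remaining element, and each entry's right
// subtree comes before the entry below it.
using Path = std::vector<Node *>;

// Descend from n, pushing nodes whose value is >= x and skipping
// subtrees that are entirely < x.
void descend_ge(Path &stack, Node *n, int x)
{
    while (n != nullptr)
    {
        if (n->value < x)
            n = n->right;         // n and its whole left subtree are < x
        else
        {
            stack.push_back(n);   // n is a candidate
            n = n->left;          // smaller candidates are on the left
        }
    }
}

// Advance to the smallest remaining element >= x; x must not decrease
// between calls. Returns nullptr when no such element remains.
Node *seek_ge(Path &stack, int x)
{
    while (!stack.empty() && stack.back()->value < x)
    {
        Node *n = stack.back();   // too small: discard it and continue
        stack.pop_back();         // in its right subtree
        descend_ge(stack, n->right, x);
    }
    return stack.empty() ? nullptr : stack.back();
}

// Usage: descend_ge(stack, root, first_query) once, then call seek_ge with
// ascending query values; each tree edge is walked at most a constant
// number of times over the whole sequence of searches.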

David Eisenstat