12

I know the STL has set_difference, but I need to just know if 2 sets are disjoint. I've profiled my code and this is slowing my app down quite a bit. Is there an easy way to see if 2 sets are disjoint, or do I need to just roll my own code?

EDIT: I also tried set_intersection but it took the same time...

polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
rlbond
  • 65,341
  • 56
  • 178
  • 228

5 Answers5

17

Modified hjhill's code to reduce complexity by a factor of O(log n) by getting rid of the count() call.

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    if(set1.empty() || set2.empty()) return true;

    typename Set1::const_iterator 
        it1 = set1.begin(), 
        it1End = set1.end();
    typename Set2::const_iterator 
        it2 = set2.begin(), 
        it2End = set2.end();

    if(*it1 > *set2.rbegin() || *it2 > *set1.rbegin()) return true;

    while(it1 != it1End && it2 != it2End)
    {
        if(*it1 == *it2) return false;
        if(*it1 < *it2) { it1++; }
        else { it2++; }
    }

    return true;
}

I've complied and tested this code now so it should be good.

Graphics Noob
  • 9,790
  • 11
  • 46
  • 44
  • Yes, much better. But you also should optimize for when the sets are obviously disjoint (the .begin of one is greater than the .end of the other) to yield O(1) performance in this case. – Terry Mahaffey Dec 26 '09 at 20:20
  • You forgot to initialize the iterators to setN.begin() – hjhill Dec 26 '09 at 20:34
  • 1
    *it1End and *it2End are undefined - the iterators returned by end() are past-the-end iterators and therefore not dereferenceable. – hjhill Dec 26 '09 at 20:53
  • Your "overlap" test contains two more errors: 1. What if one of the sets is empty (*rbegin() is undefined in this case), 2. It should return `true` for the non-overlapping case (because then the two sets **are** disjoint). – hjhill Dec 26 '09 at 21:28
  • Line 13, if(*it > *set2.rbegin() || *it2 > *set1.rbegin()) return true; should be if (*it1 > *set2...) ... ? – Yawar Dec 26 '09 at 22:49
  • Well, this answers my question -- I guess there's no "easy" way to do this. – rlbond Apr 15 '10 at 20:07
  • I think this have complexity of O(n). You have to walk through missing part. – ony Mar 18 '15 at 13:30
5

Since std::set is a sorted container, you can compare the set boundaries to see if they possibly intersect, and if so, do the expensive STL operation.

Edit:

Here's where clam fishing gets serious ...

All the code posted so far tries to re-implement what's already in <algorithm>. Here's the gist of set_intersection:


  template<typename _InputIterator1, typename _InputIterator2,
           typename _OutputIterator>
    _OutputIterator
    set_intersection(_InputIterator1 __first1, _InputIterator1 __last1,
                     _InputIterator2 __first2, _InputIterator2 __last2,
                     _OutputIterator __result)
    {
      while (__first1 != __last1 && __first2 != __last2)
        if (*__first1 < *__first2)
          ++__first1;
        else if (*__first2 < *__first1)
          ++__first2;
        else
          {
            *__result = *__first1;
            ++__first1;
            ++__first2;
            ++__result;
          }
      return __result;
    }

Pretty optimal already except for the assignment to the output iterator, which is unnecessary for finding just the fact whether two sets are joint. This can be used though to flip a flag as follows:


  /// fake insert container
  template <typename T> struct intersect_flag
  {       
    typedef int iterator;
    typedef typename T::const_reference const_reference;

    bool flag_; // tells whether given sets intersect

    intersect_flag() : flag_( false ) {}

    iterator insert( iterator, const_reference )
    {       
      flag_ = true; return 0;
    }
  };

  // ...
  typedef std::set<std::string> my_set;

  my_set s0, s1;
  intersect_flag<my_set> intf;

  // ...        
  std::set_intersection( s0.begin(), s0.end(),
    s1.begin(), s1.end(), std::inserter( intf, 0 ));

  if ( intf.flag_ ) // sets intersect
  {
    // ...

This avoids copying the objects from the original sets and lets you reuse STL algorithms.

Nikolai Fetissov
  • 82,306
  • 11
  • 110
  • 171
  • 2
    Except, the sets aren't continuous, or in one dimension. – rlbond Dec 26 '09 at 19:48
  • 4
    Nope - what about {1, 3, 5} and {2, 4, 6}? Clearly disjoint, but with intersecting boundaries... – hjhill Dec 26 '09 at 19:57
  • Oh, forgot the 'possibly' qualifier :) – Nikolai Fetissov Dec 26 '09 at 22:26
  • Hmm, @rlbond, what do you mean by not in one dimension? Set elements have to be in a relative order imposed by the less-then operation. – Nikolai Fetissov Dec 26 '09 at 22:38
  • The sets are of 2d points and are sorted lexicographically. So VERY often, you'll have something like set1 = {(1,1),(1,2),(2,1),(2,2)}, set2 = {(1,20),(1,21),(1,22)}, which of course don't "overlap" in 2d but when sorted lexicographically, which is a strict ordering, set2 is completely in set1's boundaries. – rlbond Dec 27 '09 at 00:55
3

You could use set_intersection and test if the resulting set is empty, but I don't know if that's much faster.

The optimal implementation would stop the testing and return false as soon as the first equal element is found. I don't know of any ready-made solution for that, though

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    Set1::const_iterator it, itEnd = set1.end();
    for (it = set1.begin(); it != itEnd; ++it)
        if (set2.count(*it))
            return false;

    return true;
}

isn't too complex and should do the job nicely.

EDIT: If you want O(n) performance, use the slightly less compact

template<class Set1, class Set2> 
bool is_disjoint(const Set1 &set1, const Set2 &set2)
{
    Set1::const_iterator it1 = set1.begin(), it1End = set1.end();
    if (it1 == it1End)
        return true; // first set empty => sets are disjoint

    Set2::const_iterator it2 = set2.begin(), it2End = set2.end();
    if (it2 == it2End)
        return true; // second set empty => sets are disjoint

    // first optimization: check if sets overlap (with O(1) complexity)
    Set1::const_iterator it1Last = it1End;
    if (*--it1Last < *it2)
        return true; // all elements in set1 < all elements in set2
    Set2::const_iterator it2Last = it2End;
    if (*--it2Last < *it1)
        return true; // all elements in set2 < all elements in set1

    // second optimization: begin scanning at the intersection point of the sets    
    it1 = set1.lower_bound(*it2);
    if (it1 == it1End)
        return true;
    it2 = set2.lower_bound(*it1);
    if (it2 == it2End)
        return true;

    // scan the (remaining part of the) sets (with O(n) complexity) 
    for(;;)
    {
        if (*it1 < *it2)
        {
            if (++it1 == it1End)
                return true;
        } 
        else if (*it2 < *it1)
        {
            if (++it2 == it2End)
                return true;
        }
        else
            return false;
    }
}

(modified Graphics Noob's modification further, using only operator <)

hjhill
  • 2,399
  • 1
  • 16
  • 17
  • Actually, your implementation of is_disjoint is too complex (in the O(n) sense) - each iteration over set1 does an O(log(n)) lookup for the element in set2, giving a total runtime of O(nlog(n)), or O(nlog(n))/2 on average. This is possible in O(n) time. – Terry Mahaffey Dec 26 '09 at 20:17
  • How understandable something is is subjective. How it performs is not. – Terry Mahaffey Dec 26 '09 at 20:26
  • @Terry Mahaffey - yes, provided the iteration of both sets is possible in O(1) per step (I have to think about that, sets are usually implemented as some kind of tree structure). The code would be less easy to understand though... – hjhill Dec 26 '09 at 20:27
  • @Terry Mahaffey: It depends. Scanning both sets is O(n+m). Scanning only the first list, and probing the second, is O(n log m), which will be faster if n is small and m is large. – Jason Orendorff Dec 28 '09 at 19:53
2

It is possible to get O(log(n)) by using the fact that both sets are sorted. Just use std::lower_bound instead of moving iterator by one.

enum Ordering { EQ = 0, LT = -1, GT = 1 };

template <typename A, typename B>
Ordering compare(const A &a, const B &b)
{
    return
        a == b ? EQ :
        a < b ? LT : GT;
}

template <typename SetA, typename SetB>
bool is_disjoint(const SetA &a, const SetB &b)
{
    auto it_a = a.begin();
    auto it_b = b.begin();
    while (it_a != a.end() && it_b != b.end())
    {
        switch (compare(*it_a, *it_b))
        {
        case EQ:
            return false;
        case LT:
            it_a = std::lower_bound(++it_a, a.end(), *it_b);
            break;
        case GT:
            it_b = std::lower_bound(++it_b, b.end(), *it_a);
            break;
        }
    }
    return true;
}
ony
  • 12,457
  • 1
  • 33
  • 41
  • Great solution. I do not understand why noone is upvoting it. Although its not O(log(n)) in the worst-case - consider a "staggered" case like {1,3,5,7} and {2,4,6,8} for example. But, on average it should be log-ish indeed. – Anton Belov Jan 10 '16 at 19:00
  • Actually idea of O-notation is to show complexity scaling with data scaling. Worst cases usually do not scale well ;) . For most of other solutions mentioned here - worst case would be either the one where they have intersection in a "last" value (with highest order) or when they are disjoint in fact. And that's regardless of values distribution. For accepted solution `is_disjoint()` always have O(n) for disjoint sets except of special optimization to detect non-overlapped boundaries. SO often so unfair... :) – ony Jan 11 '16 at 20:33
1

Use std::set_intersection, and see if the output is empty. You could check to see if the range of the two sets (area covered by the begin and end iterators) overlap first, but I suspect that set_intersection might already do this.

These are all O(n) operations, as is is_disjoint. If O(n) is unacceptable, you'll have to build some side storage to "keep track" of the disjoint-ness of the sets as you add/remove elements. Here, you would pay the overhead at insertion time (by updating your "isdisjoint" structure for each set modification), but is_disjoint could be implemented cheaply. This may or may not be a good tradeoff to make.

Terry Mahaffey
  • 11,775
  • 1
  • 35
  • 44
  • @Xavier Nodet - the intersection of two sets contains only the elements contained in both sets, so it can't be larger than the smallest of the two sets. – hjhill Dec 26 '09 at 20:15
  • No, I do not mean that. That would be the correct test if you're using std::set_union – Terry Mahaffey Dec 26 '09 at 20:21
  • I think `set_intersect` can't perform the 'overlap' test because the inputs to `set_intersect` are input iterators and not bidirectional iterators. – hjhill Dec 26 '09 at 21:17