6

I'm brand new to C++ and have been asked to convert a Java program to C++. I'm trying to write a method to check that all elements in an unordered_set exist in another unordered_set. I found the example below using hash_set but hash_set is deprecated and it is recommended to use unordered_set now.

// returns true if one contains all elements in two
bool SpecSet::containsAll(hash_set<Species*> one, hash_set<Species*> two) {
   sort(one.begin(), one.end());
   sort(two.begin(), two.end());
   return includes(one.begin(), one.end(), two.begin(), two.end());
}

So I need a way to do this using unordered_set. Sort does not work on unordered sets and lookup speed is important, so I don't want to use an ordered set.

bool SpecSet::containsAll(unordered_set<Species*> one, unordered_set<Species*> two) {

   return ?;
}

I'd really appreciate some help with an approach to doing this efficiently.

EDIT: I guess this will work. It seems there is no more efficient way but to loop over all in two.

bool SpecSet::containsAll(unordered_set<Species*> one, unordered_set<Species*> two) {
   if(two.size() > one.size())
   {
      return false;
   }

   for(Species *species : two)
   {
      if(one.find(species) == one.end())
      {
         return false;
      }
   }
   return true;
}
Steve W
  • 1,108
  • 3
  • 13
  • 35
  • [related](https://stackoverflow.com/questions/47901339/unordered-set-intersection-in-c) – Caleth Jan 17 '18 at 11:03
  • 3
    Check each element one by one? (possibly after comparing the sizes to rule out obvious 'no' cases) – Marc Glisse Jan 17 '18 at 11:13
  • Thanks Caleth/Mark. Your tips led me to write the above method. I think this is the best I can do? – Steve W Jan 17 '18 at 11:19
  • I am not 100% sure but it seems the best way. The complexity is linear since find is const time. – ArmanHunanyan Jan 17 '18 at 11:29
  • You probably don't want to be passing your sets by value like that, if they are too large for you to transform to ordered sets. – Toby Speight Jan 17 '18 at 11:52
  • 1
    There are no magical buttons one can push and get an instant answer whether one unordered_set contains all values from another one. The manual, one by one, search is the only way to do it; barring special use cases on the order of sets of small values, in which case some tomfoolery could be done with bitmasks, etc... Besides this seems to be faster than sort-based, which appears to be `O(n log n)`, while the manual lookup seems to be `O(n)`. Just because something is done via smaller amounts of code doesn't necessarily mean it's "faster". – Sam Varshavchik Jan 17 '18 at 11:54
  • Different inputs may cause different time cost on such task. It's ok to iterate all elements and recommended to use the trivial algorithm you described yourself. – AntiMoron Jan 17 '18 at 12:15
  • This is not related to your question, but I would like to mention it :First, avoid using raw pointers (mytype*) and use smart pointers instead https://en.cppreference.com/book/intro/smart_pointers. They cost more (well, nearly nothing more) to manipulate than raw pointers but will prevent a lot of memory leaks. Second, you can use `auto` keyword for deducing type of an assign expression, which is especially interesting when doing a loop : `for (auto& specie : species ){...})` and ease the read of your code. – Felix Bertoni Nov 26 '19 at 16:28
  • You can shorten this code considerably using something like "return ((one.size() >= two.size()) && std::all_of(two.begin(), two.end(), [&one](const Species* s){ return one.count(s) != 0; }));" – Some Guy Oct 05 '22 at 23:05

2 Answers2

1

Disclaimer: This is not the most efficient approach. It is an attempt at solution that would be as generic and flexible as std::includes while supporting unordered iterator ranges. It is not limited to std::unordered_set and should work for any other container, e.g., std::vector or std::list.


As it was pointed out std::includes requires the input ranges to be sorted. At this moment unordered ranges are not supported in the standard library.

Looking at possible implementations of std::includes a version for unordered ranges can be implemented. For example like so:

template<class InputIt1, class InputIt2>
bool includes_unordered(
    InputIt1 first1, InputIt1 last1,
    InputIt2 first2, InputIt2 last2)
{
    for (; first2 != last2; ++first2)
    {
        InputIt1 it1;
        for (it1 = first1; it1 != last1; ++it1)
        {
            if(*first2 == *it1)
                break;
        }
        if (it1 == last1)
            return false;
    }
    return true;
}

Note: containers' size-comparison optimization is not performed to support containers of non-unique objects. But if needed it can be done using std::distance.

And here's a version taking an equivalence operator:

template<class InputIt1, class InputIt2, class Equivalence>
bool includes_unordered(
    InputIt1 first1, InputIt1 last1,
    InputIt2 first2, InputIt2 last2,
    Equivalence equiv)
{
    for (; first2 != last2; ++first2)
    {
        InputIt1 it1;
        for (it1 = first1; it1 != last1; ++it1)
        {
            if(equiv(*first2, *it1))
                break;
        }
        if (it1 == last1)
            return false;
    }
    return true;
}

Small live-example

Then includes_unordered can be used in the same way as std::includes would.

AMA
  • 4,114
  • 18
  • 32
  • 1
    Both ranges need to be sorted to be used with std::includes – ArmanHunanyan Jan 17 '18 at 11:31
  • 1
    This will not work, 'cause ranges must be sorted. `unordered_set` is not sorted. – user2807083 Jan 17 '18 at 11:32
  • Of course you are right guys, I was fast-drawing. I've updated the answer. – AMA Jan 17 '18 at 13:15
  • 1
    That seems to be an O(_m_×_n_) algorithm, where _m_ and _n_ are the sizes of the two sets. Whilst it will work, it's far from efficient. – Toby Speight Jan 17 '18 at 13:50
  • @TobySpeight correct, your answer is more efficient. It's just my take on something that would be as generic and flexible as `std::includes`. I've added a short disclaimer. – AMA Jan 17 '18 at 14:02
1

With unsorted collections, there's no faster algorithm than to iterate over the smaller collection while testing that each element is a member of the larger. This will naturally scale as O(n), where n is the size of the putative subset, since we perform an O(1) find operation n times.


Here's some demonstration code, with tests:

#include <unordered_set>

template <typename T>
bool is_subset_of(const std::unordered_set<T>& a, const std::unordered_set<T>& b)
{
    // return true if all members of a are also in b
    if (a.size() > b.size())
        return false;

    auto const not_found = b.end();
    for (auto const& element: a)
        if (b.find(element) == not_found)
            return false;

    return true;
}
int main()
{
    const std::unordered_set<int> empty{ };
    const std::unordered_set<int> small{ 1, 2, 3 };
    const std::unordered_set<int> large{ 0, 1, 2, 3, 4 };
    const std::unordered_set<int> other{ 0, 1, 2, 3, 9 };

    return 0
        +  is_subset_of(small, empty) // small ⊄ ∅
        + !is_subset_of(empty, small) // ∅ ⊂ small
        +  is_subset_of(large, small) // large ⊄ small
        + !is_subset_of(small, large) // small ⊂ large
        +  is_subset_of(large, other) // large ⊄ other
        +  is_subset_of(other, large) // other ⊄ large
        + !is_subset_of(empty, empty) // ∅ ⊂ ∅
        + !is_subset_of(large, large) // x ⊂ x, ∀x
        ;
}

An equivalent, using standard algorithm instead of writing an explicit loop:

#include <algorithm>
#include <unordered_set>

template <typename T>
bool is_subset_of(const std::unordered_set<T>& a, const std::unordered_set<T>& b)
{
    // return true if all members of a are also in b
    auto const is_in_b = [&b](auto const& x){ return b.find(x) != b.end(); };

    return a.size() <= b.size() && std::all_of(a.begin(), a.end(), is_in_b);
}

(obviously using the same main() for tests)


Note that we pass the sets by reference, not by value, as you've indicated that the sets are too large for you to copy and sort them.

Toby Speight
  • 27,591
  • 48
  • 66
  • 103