How to detect duplicates in a vector of unordered_map?

Question

Given a vector of unordered_map<u_int,int>, I would like to check if the vector contains any duplicated values. Two unordered_maps are considered duplicated if all of their keys and their corresponding values are equal. I know the comparison operator exists for unordered_maps, but I would like to avoid the pairwise comparison of each element with each other. One classical solution is to insert the values of the vector into a set, then to compare the number of elements in the set and the vector. However, the problem here is that the object to be inserted into the set must have the comparison operators overloaded. In case of the unordered_set, the hash function to be used must be overloaded for the complex object. In order to overload, I need to derive a class from the std::unordered_map. Then I need to overload either the comparison operator or the hash function. Another solution that I could think of is to concatenate all of the key value pairs into a string, then sort the string by the keys and detect the duplicates on those strings. I wonder what would be the best solution for this problem.
Example data:

using namespace std;
typedef unordered_map<u_int,int> int_map;
int_map a = { {1,1}, {2,4}, {3,5} };
int_map b = { {1,1}, {2,-1}, {4,-2} };
int_map c = { {1,1}, {3,5} };

vector<unordered_map<u_int,int>> my_vec;

my_vec.push_back(a);
my_vec.push_back(b);
my_vec.push_back(c);

The contents of my_vec is:

 { { 1 => 1, 2 => 4, 3 => 5 }, 
 { 1 => 1, 2 => -1, 4 => -2 }, 
 { 1 => 1, 3 => 5 } }

Please feel free to ask/commend/edit if the question is not clear enough. Any help would be appreciated. Thank you in advance!

The answer here is `false` since all of the keys and their corresponding values are not identical in any of the two elements. — anilbey, Aug 22 '18 at 10:17
create fast simple hash for each set. And if hashes are equal -- do pairwise comparation. You can't avoid checking each value of each set (because it may violate the answer), so you will do it: by hash creation. But you will avoid pairwise comparation for most of the cases, doing it just when hash is equal.\ — Arkady, Aug 22 '18 at 10:23
Note that you can specify a custom hasher and/or equality comparator as template arguments to both `std::set` and `std::unordered_map`, so inheritance is unnecessary. — Angew is no longer proud of SO, Aug 22 '18 at 10:24
(publicly) inheriting from `std` containers is almost never a good idea. Its lots of text and I find it hard to follow, but I think all you need is a custom comparator. "the problem here is that the object to be inserted into the set must have the comparison operators overloaded" - thats not true, you can use any comparator you like, its the second template parameter of `std::set` — 463035818_is_not_an_ai, Aug 22 '18 at 10:36
How can I specify a custom hasher and/or equality comparator as template arguments? — anilbey, Aug 22 '18 at 11:23
@anilbey hash functor is the second template argument of unordered set. Comparison functor is the second template argument of the ordered set. — eerorika, Aug 22 '18 at 11:34

score 1 · Answer 1 · answered Aug 22 '18 at 14:32

1

you can something similar to the following :

typedef unordered_map<u_int,int> int_map;

struct my_map_comparator
{
    bool operator()(const int_map& a, const int_map& b) 
    { 
      a_hash = compute_hash_for_a(all keys of a)
      b_hash = compute_hash_for_b(all keys of b)

      return a_hash == b_hash; 
    }
};

std::unordered_set<int_map,std::hash<int_map>, my_map_comparator> map_list();

answered Aug 22 '18 at 14:32

AdityaG

428
1
3
17

Could you give me some ideas about how to compute the hash for multiple keys? – anilbey Aug 22 '18 at 14:57
if u are using boost .. boost hash combine exist .. otherwise look here https://stackoverflow.com/questions/17016175/c-unordered-map-using-a-custom-class-type-as-the-key – AdityaG Aug 22 '18 at 15:04
One detail: since the order is not preserved in unordered_maps, the maps a and d below will result in different hash values unless I sort them. Maybe I should use std::map instead. `unordered_map a = { {1,1}, {2,4}, {3,5} }; unordered_map d = { {2,4}, {1,1}, {3,5} };` – anilbey Aug 23 '18 at 09:31
Actually, if I hash an unordered map twice. Then I am sure I will get different hash values because the ordering of the keys will be different each time I iterate over the keys. – anilbey Aug 23 '18 at 09:33
good point ... yes ... you will get a different hash .. then the way as you said is use a set .. ! – AdityaG Aug 23 '18 at 12:40
1

@anilbey Just to be extremely pedantic: It's not at all guaranteed that "the ordering of the keys will be different each time". The `unordered` does not mean '_elements are guaranteed **never** to be in the same order_'... It means '_elements are **not** guaranteed to be in any specific/repeatable order_'. – underscore_d Jan 02 '19 at 23:10
@underscore_d you're right. The order does not necessarily have to be different each time. – anilbey Jan 03 '19 at 10:40

score 1 · Answer 2 · edited Jul 04 '22 at 15:02

1

If you can get a good hash function for std::unordered_map then you should do it like this probably:

bool has_distinct_values(const std::vector<std::unordered_map<u_int, int>> v)
{
  std::unordered_map<int, std::list<int>> hash_to_indexes_map; 
  for(auto i = 0u; i < v.size(); ++i)
  {
    auto empl_result = hash_to_index_map.emplace(custom_hash(v[i]), {i});
    if (!empl_result.second)
    {  
       for (auto index : empl_result.first->second)
       {
         if (v[index] == v[i]) return false;
       }
       empl_result.first->second.push_back(i);
    }
  }
  return true;
}

The algorithm is straightforward: map hashes to list indexes, doing pairwise map comparison whenever hashes are equal. This way you avoid copying the entire maps, get O(N) (depending mostly on the quality of the hash function you provide) time complexity and generally are good to go.

edited Jul 04 '22 at 15:02

Brandlingo

2,817
1
22
34

answered Aug 22 '18 at 19:53

paler123

976
6
18

But even if I have a good hash function, each time I hash the `unordered_map`, I am sure I will get a different hash value because the keys are not ordered. e.g. each time I iterate over the keys, I see a different order. – anilbey Aug 23 '18 at 09:36
Well, since the map is unordered then ordering should not really affect the hashes you get. In other words if two maps are considered equal (by explicit check with == op) then they must have equal hashes. A valid hash function needs to produce equal hashes for equal elements, a good hash function should produce different hashes for different elements. – paler123 Aug 23 '18 at 09:44
I cannot think of an efficient way of implementing a hash function achieving this without having to sort the keys. If the sorting is inevitable, then I prefer storing the data in a `map` in the first place (instead of `unordered_map`). What do you think? – anilbey Aug 23 '18 at 15:39
You would then need to order the maps (e.g. implement comparison operator of some sort). Equality is quite straightforward to define for unordered_maps, I don't see however a universally good approach to ordering them. As for the hash function... You could use any hash on key / value pairs from the map and then accumulate them in a way that does not depend on the ordereing (e.g. product or sum have that property). – paler123 Aug 23 '18 at 18:35

How to detect duplicates in a vector of unordered_map?

2 Answers2

Linked