2

I have a structure called Col:

struct Col
{
   Object* o1;
   Object* o2;
   Object* o3;

   bool operator==(const Col &other) const
   { 
       return o1 == o1 && o2 == o2 && o3 == o3;
   }
};

I define a hash on this:

template <>
struct std::hash<Col>
{
  std::size_t operator()(const Col& k) const
  {
    using std::size_t;

    std::hash<void> void_hash;

    std::size_t res = void_hash(k.o1) + void_hash(k.o2) + void_hash(k.o3;

    return res;
  }
};

I then have a long std::vector of Col objects which contains duplicates (per my definition of equality), and I wish to "process" each unique element only once.

This is what I do:

std::unordered_map<Col, bool> done_list;

for (Col c : col_list)
{
    if (done_list.find(c) == done_list.end())
    {
        process(c);

        done_list[c] = true;
    }
    else
    {
        continue;
    }
}

Is this a reasonable way to check for whether the same collection of objects has been processed already? Would a sparse matrix be better?

wohlstad
  • 12,661
  • 10
  • 26
  • 39
Omroth
  • 883
  • 7
  • 24
  • 9
    You don't even need a map if the value is just going to be a bool. Just use a (unordered_set where the presence alone indicates if you've processed. – Cory Kramer Jul 17 '23 at 15:43
  • This makes no sense if you are going to process every object once. I assume you have some kind of condition so that some objects are not processed at all? Otherwise, why use anything on top of the loop? – Nelfeal Jul 17 '23 at 15:49
  • @Nelfeal OP probably have duplicates (according to its equality function) in its vector. OP Can you say in the description your vector have duplicates? – Dorian Jul 17 '23 at 15:51
  • done @Nelfeal, thanks – Omroth Jul 17 '23 at 15:53
  • Does the hash function look reasonable btw - I haven't really used std::hash before. – Omroth Jul 17 '23 at 15:53
  • 1
    It depends on how you allocate your objects. If you create/delete/create often, chances are that your objects end up at the same address, thus creating potential conflicts. You need to monitor for your particular use. – Dorian Jul 17 '23 at 15:59
  • 1
    You're hashing *pointers*, but I would guess you actually want to hash the *data*. – Mooing Duck Jul 17 '23 at 16:00
  • 1
    If a Col object has the same pointers as another but o1 and o2 are swapped, are they the "same" object? (Your hash says yes). Also, why `using std::size_t` and then qualify size_t with its namespace anyway? :) – Chris Uzdavinis Jul 18 '23 at 15:21
  • @ChrisUzdavinis if they are not the same object, what would you advise for the hash function? And it's inexperience on the std::size_t thing, I don't actually know what using std::size_t does. – Omroth Jul 19 '23 at 06:50
  • @Omroth Then what you should be doing is to first make a `std::hash` specialization and in `std::hash` you would use `std::hash` on the dereferenced pointers in such a way that the ordering of the objects matters. Many people use [`boost::hash_combine`](https://stackoverflow.com/q/35985960/7582247) for this. – Ted Lyngmo Aug 01 '23 at 08:17

2 Answers2

6

You approach is OK, but you can simplify it a bit by using std::unordered_set instead of std::unordered_map.

The set will contain the values you already processed.

#include <unordered_set>

std::unordered_set<Col> done_list;

for (Col const & c : col_list)
{
    if (done_list.find(c) == done_list.end())
    {
        process(c);
        done_list.insert(c);
    }
    else
    {
        continue;
    }
}

Note that I used const & in the ranged for-loop to avoid copy and mutation of c. You can drop the const if need to mutate it.

If you are using c++20 you can use std::unordered_set::contains to make the check even simpler:

    if (done_list.contains(c) == false)
    // ...

Another variation takes advantage of the overload of std::unordered_set::insert that returns a pair where the bool second indicates whether the item was inserted (it will not be inserted if it already exists in the set):

    auto[it, inserted] = done_list.insert(c);
    if (inserted)
    {
        process(c);
    }
    else
    {
        continue;
    }
wohlstad
  • 12,661
  • 10
  • 26
  • 39
  • If you are looking for performance, a `vector` might be faster than an `unordered_set` when there are only a few hundreds element because of caching. – Dorian Jul 17 '23 at 15:57
  • 2
    Note that C++20 added [`std::unordered_set::contains`](https://en.cppreference.com/w/cpp/container/unordered_set/contains) so you don't need `find` + comparison to `end` – Cory Kramer Jul 17 '23 at 15:57
  • @CoryKramer good point - updated my answer. – wohlstad Jul 17 '23 at 16:01
  • 1
    @Dorian that's true, but since we don't know the number of elements I prefered the to use a set as a more general solution. I would use a `std::vector` as an optimizd version after profiling. – wohlstad Jul 17 '23 at 16:06
  • @wohlstad Oups, I didn't think you noticed so I added an answer too :-) Oh well, it still has a simplification in it so I'll leave it. – Ted Lyngmo Jul 17 '23 at 16:36
  • 1
    @TedLyngmo that's OK, I liked your idea of storing pointers in the set :-) – wohlstad Jul 17 '23 at 16:37
  • 2
    Hmm, wait a second. I just realized that my pointer idea will not do what OP wants with those pointer member variables. :-/ I'll delete my answer to not lead him/her astray. – Ted Lyngmo Jul 17 '23 at 16:43
  • @TedLyngmo you might be right, it's hard to know for sure without additional context. – wohlstad Jul 17 '23 at 16:47
  • 1
    @wohlstad True. I now returned to make it work with the pointers. Hopefully I got it right this time :-) – Ted Lyngmo Jul 17 '23 at 22:30
3
  • What you need is rather some kind of set which stores only one of each element.
  • You also do not need to store the Cols in the set. It's enough with pointers to the elements.

I suggest using an unordered_set so first we need a hasher and a comparator for the pointers:

struct colptr_hash {
    const bool operator()(const Col* col) const {
        return std::hash<Col>{}(*col);
    }
};

struct colptr_equal {
    const bool operator()(const Col* lhs, const Col* rhs) const {
        return *lhs == *rhs;
    }
};

Then the actual usage:

std::unordered_set<Col*, colptr_hash, colptr_equal> done_list;

for (Col& c : col_list) {           // take by reference
    // try to insert this Col*
    auto [it, inserted] = done_list.emplace(&c);

    if (!inserted) continue;        // already processed, try next

    // never processed, go ahead:
    process(c);
}
Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108