
I get the sense that I am using an unordered_map incorrectly: I am not getting the average O(1) lookup complexity, but rather O(n).

Right now I define the map as:

unordered_map<my_class, double> objects_to_values;

The key thing is how I define my_class.

my_class at the moment is identified by three integers, call them a, b and c.

A skeleton looks like:

class my_class  {
private:
        int _a, _b, _c;

public:
        my_class(int a, int b, int c) : _a(a), _b(b), _c(c) {}

        int a() const {
                return _a;
        }

        int b() const {
                return _b;
        }

        int c() const {
                return _c;
        }

        bool operator==(const my_class& obj) const {
                return ((a() == obj.a()) && (b() == obj.b()) && (c() == obj.c()));
        }
};

In addition, I have specialized `std::hash`:

template <>
struct std::hash<my_class> {
public:
        size_t operator()(const my_class& k) const {
                return ( (hash<int>()(k.c())) ^ (( ((hash<int>()(k.a())) ^ (hash<int>()(k.b())
                                << 1) >> 1) << 1) >> 1) );
        }
};

What I observe is that my code becomes much slower as the size of the unordered_map increases, probably because of my calls to find, which normally I would expect to be O(1) on average.

Am I doing it all right, or should I improve the hash function (which, admittedly, is not great), or am I doing something else wrong? Is there a better way to hash three values together, each in the range 0-100000 or so?

kloop

  • there is no call to `find` in the code you posted. Only you know what methods you call in code you do not show. Please read about [mcve] – 463035818_is_not_an_ai Mar 01 '23 at 16:04
  • `unordered_map` basically has to be implemented as a `std::vector` of node-based buckets, so it is a very non-cache-friendly container. – NathanOliver Mar 01 '23 at 16:04
  • I suspect your `n` affects more than just the number of elements in the map. Just as a silly example, `insert` is constant, but calling it n times is of course `O(n)`. You'd need some rather specific time-measurement code to see only the increase in calls to `find` but not the increase due to other parts of the code dealing with larger `n`. Please show it – 463035818_is_not_an_ai Mar 01 '23 at 16:10
  • what's with the <<1 >> 1? – user253751 Mar 01 '23 at 16:10
  • @463035818_is_not_a_number I call find with an object of my_class, but I will add a full call. The reason I suspect this is probably the issue is after some profiling with callgrind... Brian yes, it is an int, I fixed this. NathanOliver Oh, I see, I didn't realize it is a vector, I will take a look at the spec of unordered_map. – kloop Mar 01 '23 at 16:13
  • [hash may be a pointless ritual](https://stackoverflow.com/questions/38304877/why-stdhashint-seems-to-be-identity-function). But you don't have to guess, you can examine things like `load_factor` or individual bucket sizes. Or convert the code to `std::map` with an appropriate `< operator`, if that is significantly faster, your hash function may be at fault. – teapot418 Mar 01 '23 at 16:13
  • Your hashing algorithm has **an extreme** amount of collisions. – Ted Lyngmo Mar 01 '23 at 16:21
  • @user253751 `>>` and `<<` shift bits right and left in an int. Ted: I needed someone to confirm that for me, I had a feeling that could be the issue... So I invested some time in writing up a Cantor hash function based on another Stack Overflow question, and the speed-up is magnificent. Thank you! – kloop Mar 01 '23 at 19:23
  • @kloop You're welcome! I'm not an expert in creating hash functions myself. I've often used `boost::hash_combine` but that one isn't so good when it comes to collisions either. I found one here at SO that is a bit slower, but it doesn't create many collisions. It's a trade-off. [example](https://godbolt.org/z/GfqEe4ev6) – Ted Lyngmo Mar 01 '23 at 21:14
  • @kloop but what is the point of doing <<1>>1 all over the hash? – user253751 Mar 02 '23 at 11:19

1 Answer


I'm sorry to inform you that your hash function is BAD™: it leads to countless collisions and plummeting performance.

Sampling with `a`, `b`, `c` between -5 and 5:

#include <iostream>
#include <iomanip>
#include <functional>
#include <set>

int main() {
    std::set<std::size_t> values;
    for (int a = -5 ; a <= 5 ; ++a) {
        for (int b = -5 ; b <= 5 ; ++b) {
            for (int c = -5 ; c <= 5 ; ++c) {
                auto const ha = std::hash<int>()(a);
                auto const hb = std::hash<int>()(b);
                auto const hc = std::hash<int>()(c);
                auto const hash = ( hc ^ ( ( (ha ^ (hb << 1) >> 1) << 1) >> 1) );
                values.insert(hash);
                std::cout << "hash(" << a << ", " << b << ", " << c << ") = " << std::hex << hash << std::dec << '\n';
            }
        }
    }
    std::cout << "Collision rate: " << 1.0 - (values.size() / (11.0*11*11)) << "\n";
}

This gives a collision rate of ~98%. Demo.

YSC