-1

I created an unordered_set on my own class A

struct A {
    int v;
};

struct A_eq
{
    bool operator()(A* lhs, A* rhs) const {
        return lhs->v == rhs->v;
    }
};

struct A_hash
{
    std::size_t operator()(A* obj) const {
        return std::hash<int>{}(obj->v);
    }
};

int main(int , char* []) {
    A* a7 = new A(); A* a5 = new A(); A* a7_ = new A(); A* a5_ = new A();
    a7->v = 7; a5->v = 5; a7_->v = 7; a5_->v = 5;

    std::unordered_set<A*, A_hash, A_eq> s;
    s.insert(a7); s.insert(a5); s.insert(a7_);

    printf("dump s:");
    for (auto& obj : s) {
        printf("%d,", obj->v);
    }
}

the program prints

dump s:5,7,

as expected.

Although a7 and a7_ are different pointers, they cannot both insert into set s, due to the fact that A_eq and A_hash are working on dereferenced pointer instead of pointer itself.

Then I picked the pointer stored in s through find, changed its dereferenced value using found->v = 7, so that the set contained two "same" element. But the set insists it's containing two different element.

    auto iter = s.find(a5_);
    A* found = *iter;
    printf("%d\n", found->v);
    found->v = 7;

    printf("dump s:");
    for (auto& obj : s) {
        printf("%d,", obj->v);
    }
    printf("\n");

    s.erase(a7_);
    s.erase(a7);
    s.rehash(0);

    printf("dump s:");
    for (auto& obj : s) {
        printf("%d,", obj->v);
    }
    printf("\n");

    iter = s.find(a7_);
    printf(iter==s.end()?"no 7":"has 7");

these codes outputs

5
dump s:7,7,
dump s:7,
no 7

Can someone tell me what's happening? Why the remaining element is not considered as A({.v=7}) by the set?

Catau
  • 55
  • 8
  • 6
    Congratulations, you intentionally broke the `unordered_set` invariant and things broke. It buckets stuff by the hash on insertion, so it bucketed based on the original value, and lookups (using the new value) are looking in the wrong place. – ShadowRanger Dec 24 '20 at 14:57
  • `unordered_set` uses the hash code to identify the place in its internal data structure where it places the element or looks for the element. If you change the element in a way that changes the place where the set expects the element to be stored, you get into trouble. This is why you should not do modifications to an element that change its hash code/equality with other elements. – fabian Dec 24 '20 at 15:13

3 Answers3

3

[unord.req]/5 ... For any two keys k1 and k2 in the same container, calling pred(k1, k2) shall always return the same value. For any key k in a container, calling hash(k) shall always return the same value.

Your program exhibits undefined behavior by way of violating this requirement. It modifies a key in a way that changes its hash, and that makes two keys compare equal where they compared unequal before.

Igor Tandetnik
  • 50,461
  • 4
  • 56
  • 85
1

Documentation of unordered_set from cppreference.com:

Container elements may not be modified (even by non const iterators) since modification could change an element's hash and corrupt the container.

To be more precise, it's not just the data directly in the container that is not allowed to be modified, but any data that affects the equality comparison of those elements. While you did not modify the bits inside the container, you did logically modify the elements of the container by changing them from being not equal to being equal. Thus, you changed an element's hash and corrupted the container. Garbage ensued.

JaMiT
  • 14,422
  • 4
  • 15
  • 31
  • Nice detailled answer! This is an often forgotten detail. const iterators are a first warning sign and a safety barrier here, but the language cannot and should not take care about all possible hacky pathways. – Secundi Dec 24 '20 at 20:00
0

Ok, several users here already critized your hacky scheme, totally correctly.

But to leave you with a further hint here for the future:

If you want to use custom classes with standard containers, try to use your classes as the direct types for the container (no pointers/wrappers), if you want to provide custom comparison/hash functions that refer to your actual class content or ensure at least, the underlying stored data is const!

Maybe there are some theoretical few exceptional cases here, but in general, custom comparators/hash methods should always refer to their containing 'decayed' class type, at least for their input argument passing. Otherwise, you break the single responsibility principle at least (the std-container cares about the possible raw pointer element usage already (or an own pointer type for your class should), your class should not interfer here again).

Secundi
  • 1,150
  • 1
  • 4
  • 12
  • 1
    *"Maybe there are some theoretical few exceptional cases here"* -- indeed there are, such as avoiding [slicing](https://stackoverflow.com/questions/274626/what-is-object-slicing). – JaMiT Dec 24 '20 at 23:17