14

I have a set of tuples of 3 integers and I don't want any duplicates. That is, I don't want 2 entries with the same 3 values.

And here is my code.

struct Key{
    unsigned a;
    unsigned b;
    unsigned c;
  public:
    Key(unsigned _a, unsigned _b, unsigned _c) :
        a(_a),
        b(_b),
        c(_c) {}
    bool operator<(const Key& rhs) const
    {
        if (a < rhs.a) {
            return true;
        }
        if (b < rhs.b) {
            return true;
        }
        if (c < rhs.c) {
            return true;
        }
        return false;
    };
};
std::set<Key> myset;

But I see duplicates in myset sometimes. I can't catch exactly what sequence causes the duplicate entry to be added. It doesn't always happen. My question is this, is there something intrinsically wrong with my operator< function?

BartoszKP
  • 34,786
  • 15
  • 102
  • 130
Sindhura Bandi
  • 347
  • 3
  • 11
  • Could you post some values that are duplicate? – StenSoft Mar 18 '15 at 10:31
  • 1
    Well done for asking the right initial question for this problem — often a broken strict weak ordering _does_ cause this. You may also have UB somewhere else if you're using the set in strange ways. – Lightness Races in Orbit Mar 18 '15 at 10:39
  • I was trying to simplify my question. The set I tested with actually has the first element of the tuple as type 'boost::asio::ip::address_v4'. Given that I saw (172.20.20.10, 4077, 17) entry twice in the set. – Sindhura Bandi Mar 18 '15 at 10:41

3 Answers3

35

It's nearly right! But you are cascading too soon.

bool operator<(const Key& rhs) const
{
    if (a < rhs.a)
        return true;
    if (a > rhs.a)
        return false;

    if (b < rhs.b)
        return true;
    if (b > rhs.b)
        return false;

    return (c < rhs.c);
};

Otherwise the following, for example, gives the wrong result:

Key lhs{3,1,0};
Key rhs{2,2,0};

assert(lhs < rhs); // passes, wrongly, because !(3 < 2) but then (1 < 2).
                   // you need an immediate `return false` when !(3 < 2)

It is safer to do something like this:

bool operator<(const Key& rhs) const
{
    return std::tie(a, b, c) < std::tie(rhs.a, rhs.b, rhs.c);
}

C++'s standard library already knows what to do with that, so you don't have to.


Now, how can your bug lead to duplicate keys in a set?

Set's internal algorithm relies on the ordering being a strict weak ordering — when you break that precondition, you break the algorithms managing the internal tree, which is constructed and arranged using this ordering as its bible.

All hell breaks loose, basically. You could get a crash from this. In your case the symptoms were somewhat more benign (at least for now), with a deformed/mangled data tree resulting in the appearance of duplicated data.

It's folly to try to reason about the specific chain of events that led to a specific outcome, if you started off by breaking the preconditions and causing UB.


Community
  • 1
  • 1
Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • Thank you. I see ordering is a problem, but I don't yet understand the presence of duplicates. – Sindhura Bandi Mar 18 '15 at 12:09
  • 2
    @SindhuraBandi: I thought you would understand that already, since you already made the logical deduction to check the comparator. I have added the reason to the answer. – Lightness Races in Orbit Mar 18 '15 at 12:13
  • Yes, I was just trying to form an example sequence. I got it now, thanks a lot for a quick response. – Sindhura Bandi Mar 18 '15 at 12:36
  • @SindhuraBandi: Yeah unfortunately I can't reproduce it. It would be possible to derive, but only with a great deal of effort. :) – Lightness Races in Orbit Mar 18 '15 at 12:41
  • My favorite is `bool operator<(const Key& rhs) const {if (a!=rhs.a)return a – Mooing Duck Mar 18 '15 at 20:29
  • @MooingDuck That relies on `!=`. – T.C. Mar 18 '15 at 20:49
  • 1
    @LightnessRacesinOrbit The point is that the standard usually relies only on `<` on the elements for a lexicographical comparison and not other comparison operators (e.g., `operator<` for tuples, `std::lexicographical_compare`, etc.). Of course, since these are integers, it makes no difference here. – T.C. Mar 19 '15 at 05:25
9

Your operator<() is not consistent, as key1<key2 and key2<key1 could both be true (example: key1={1,3,0}, key2={3,1,0}). You should give the member variables a precedence in comparison:

    if (a < rhs.a) {
        return true;
    } else if (a == rhs.a) {
        if (b < rhs.b) {
            return true;
        } else if (b == rhs.b) {
            if (c < rhs.c) {
                return true;
            }
        }
    }
    return false;
jhnnslschnr
  • 404
  • 2
  • 9
  • Thanks for clarifying. I see that my compare function has some ordering problems. But is it possible for it to accept duplicates? – Sindhura Bandi Mar 18 '15 at 11:02
  • [cpp reference](http://www.cplusplus.com/reference/set/set/) states that equality is checked by `!comp(a,b) && !comp(b,a)`, which should avoid duplicates with your `operator<()`, too. No idea why you have seen duplicates. – jhnnslschnr Mar 18 '15 at 11:13
  • 1
    @jhnnslschnr: It's UB unless the ordering meets the preconditions. Your assertion regarding `!comp(a,b) && !comp(b,a)` is _false_, because with the broken ordering it is possible for `comp(a,b) == comp(b,a)`, which is clearly nonsensical. See my answer. – Lightness Races in Orbit Mar 18 '15 at 12:20
  • @LightnessRacesinOrbit Ok, UB is a fair point, anything could happen. – jhnnslschnr Mar 18 '15 at 12:25
  • um nonsensical for non-equal `a` and `b`, I mean :D – Lightness Races in Orbit Mar 18 '15 at 12:27
  • Well UB isn't a magic unicorn - every resulting symptom you see will have a practical reason, it's just that in order to explain it you'd have to dive deep into the internal algorithms and piece together every piece of memory used, the value of memory at that time under every out-of-bounds array access, the precise order of every single internal operation, .... and it's just so complex that it's never worth doing. So we abstract it away and say "anything can happen". :) – Lightness Races in Orbit Mar 18 '15 at 12:29
  • @jhnnslschnr cpp reference is cppreference.com, you're linking to cplusplus.com. – Cubbi Mar 18 '15 at 14:55
2

You could indeed use standard class std::tuple as the key.

Nevertheless the operator can be defined the following way

bool operator <( const Key &rhs ) const
{
    return
    ( a < rhs.a ) ||
    ( !( rhs.a < a ) && ( b < rhs.b ) ) ||
    ( !( rhs.a < a ) && !( rhs.b < b ) && ( c < rhs.c ) );
};

That this operator would work all you need is that for the type of objects a, b, and c there would be defined operator < Of course for arithmetic types it is already defined.

In fact it is the same as

#include <tuple>

//...

bool operator <( const Key &rhs ) const
{
    return std::tie( a, b, c ) < std::tie( rhs.a, rhs.b, rhs.c );
}
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335