1

I was trying to run the below code. What I am finding is that there is difference in the output. I understand that there is an issue with ordering mechanism used in the Comparator functionality. What I am basically looking for is: 1) How does Set internally stores data. 2) How can I resolve this issue or the best way to copy the data to a different Set. 3) How exactly the ordering is creating this issue.

#include <iostream>
#include <set>
using namespace std;
struct Comparator {
  bool operator()( const int& a, const int& b ) {
    if( a <= b ) 
      return true;
    else
      return false;
  }
};
int main()
{
  set< int, Comparator > customSet;
  for( unsigned k = 0, index = 2; k < 10; ++k ) {
    customSet.insert( index );
  }

  set< int, Comparator >::iterator iter = customSet.begin();
  for(; iter != customSet.end(); ++iter ) {
    cout<<*iter<<endl;
  }

  cout<<"---------------------------------"<<endl;
  set< int, Comparator > tempCustomSet ;//= customSet;
  tempCustomSet.insert( customSet.begin(), customSet.end() );

  iter = tempCustomSet.begin();
  for(; iter != tempCustomSet.end(); ++iter ) {
    cout<<*iter<<endl;
  }

  return 0;
}
TemplateRex
  • 69,038
  • 19
  • 164
  • 304
PhiberOptixz
  • 512
  • 6
  • 13
  • 2
    Your condition is unsafe. It should say, `if ((a <= b) == true ? true : false) { return a <= b ? true : false } else return {!true && false; }`. A common mistake. – Kerrek SB Jan 29 '13 at 12:43

4 Answers4

2

See this reference for more details on std::set. The implementation should not concern you (they can differ from platform to platform), since the interface and complexity guarantees is all that matters to the Standard. Typical implementations are using red-black-trees.

You need to make your Comparator use operator< and not operator<=. The reason is that std::set will consider elements equivalent if !Comparator(a, b) && !Comparator(b, a) evaluates to true (i.e. neither is strictly less than the other).

But with <=, you have a <= a equal to true, so that !(a<=a) && !(a<=a) gives false for equal elements. Whereas with < you have a < a equal to false so !(a<a) && !(a<a) gives true.

The right to thing to do is:

struct Comparator 
{
    bool operator()(int const& lhs, int const& rhs) const 
    {
        return lhs < rhs; 
    }
};

This will guarantee that equal elements are considered equivalent. Note that this is discussed at length in Effective STL, "Item 19. Understand the difference between equality and equivalence."

TemplateRex
  • 69,038
  • 19
  • 164
  • 304
  • But say I have a business logic that needs the data to be inserted in that fashion/order in the Set. – PhiberOptixz Jan 29 '13 at 12:46
  • 1
    @user469258 you can still use your own comparison, but you need to make sure that equal elements are equivalent. – TemplateRex Jan 29 '13 at 12:53
  • @user469258: your business logic needs are (unfortunately) irrelevant, it is *strict weak ordering* or the highway. – Matthieu M. Jan 29 '13 at 12:58
  • @MatthieuM. for any `a <= b` you can always define `a < b` as `!(b <= a)`, so his business logic can always be safely mapped to `std::set`, even if he would not have access to the code defining `operator<=`. – TemplateRex Jan 29 '13 at 13:01
  • @rhalbersma: which is exactly `a > b`, or a reverse-ordering. What I meant was simply that whatever the business logic you *first* had to conform to the `set` expectations (if you used a `set`, obviously). – Matthieu M. Jan 29 '13 at 13:02
  • @MatthieuM. you are right, but my point was that it's simple to convert non-strict to strict order, without knowing anything about business logic. – TemplateRex Jan 29 '13 at 13:05
2

The problem is most likely because your comparison does not implement a strict weak ordering. The internal ordering mechanism on the set relies on this. You can get SWO by changing your comparison to a less-than:

struct Comparator {
  bool operator()( const int& a, const int& b ) const {
    return ( a < b ); 
  }
};

On the other hand, std::set will use this comparison criteria by default so you need not specify it.

There is some related information in my answer to this question ( and zillions of other SO questions).

Community
  • 1
  • 1
juanchopanza
  • 223,364
  • 34
  • 402
  • 480
2

1) How does Set internally stores data

The only requirements are that the elements are:

  • ordered according to the comparator, so that if Comp(a,b), then a appears before b when iterating the set;
  • unique, so there are no distinct elements for which both Comp(a,b) and Comp(b,a) hold.

and that the operations meet certain complexity requirements.

In practice, they are typically stored in a binary search tree; but that doesn't matter to a user.

2) How can I resolve this issue or the best way to copy the data to a different Set

In order to meet the requirements, the comparator must be a strict weak ordering, like <, so that Comp(a,a) is always false, rather than a non-strict ordering like <=. Since < is the default, that means you don't need a custom comparator at all.

3) How exactly the ordering is creating this issue

Note that your first loop is inserting the value 2 ten times; I'm not sure whether that's the intention or not.

Given the required strict ordering, insert(b) might look for an insertion point by finding the first element a such that Comp(a,b) is false; i.e. the first element that b should not come after. It will then check for uniqueness by checking Comp(b,a). If both are false, then that indicates that the two values are equivalent, so b will not be inserted.

Since your comparison is not strict, this uniqueness test might fail; so you might end up with a duplicate entry. Or something else might happen - the behaviour is undefined.

Mike Seymour
  • 249,747
  • 28
  • 448
  • 644
  • But Mike for the 3) answer how come during the insertion for the first set it was able to do that properly. That means if you look at the out out all the ten elements were printed( the first print function ). In that case also 2 was duplicated, that means should not have been inserted. – PhiberOptixz Jan 29 '13 at 13:17
  • @user469258: Because undefined behaviour is undefined. For me it is not "able to do that properly" - running your code inserts the value 2 multiple times, which certainly is not proper behaviour for a set, and is a direct result of the comparator breaking the set's assumptions. But there's no requirement that the set fail to work in any particular way if the comparator is not strict; depending on how it's implemented, some operations might sometimes behave as they would with a correctly strict ordering. – Mike Seymour Jan 29 '13 at 13:22
  • I do agree to the point. But I am still unclear that how the Set is working for the first time. But when I take the temporary set and try inserting the elements taking it from the original Set it fails. If the ordering has to fail it should fail in the first place itself. That means while printing the elements of the first set, it should print "2" only twice. – PhiberOptixz Jan 29 '13 at 14:37
  • @user469258: An invalid ordering doesn't "have to fail". The set implementation makes assumptions that the ordering is valid; if it isn't, then it has undefined behaviour, meaning that it could do anything. It's certainly not surprising that you might get different behaviour when inserting individual items than when inserting a range. And as I said, it's not "working" the first time, at least for me - I get ten copies of 2, when a "working" set should only contain one. – Mike Seymour Jan 29 '13 at 15:09
0

You are getting different outputs in two cases because you are inserting in different ways. In case 1 you are inserting element 2 ten times. In this case, when you insert integer 2 after first time, your Comparator() function will be called to decide where to insert. In the other case, you are inserting a range. Here, called function takes first argument, i,e customSet.begin() and checks it with the other argument i,e customSet.end(), if these two are not equal then only an element is inserted otherwise element will not be inserted.

Vinay
  • 188
  • 9