2

I'm sorting an array of objects. The objects have lots of fields but I only care about one of them. So, I wrote a comparator:

    Collections.sort(details, new Comparator<MyObj>() {
        @Override
        public int compare(MyObj d1, MyObj d2) {
            if (d1.getDate() == null && d2.getDate() == null) {
                return 0;
            } else if (d1.getDate() == null) {
                return -1;
            } else if (d2.getDate() == null) {
                return 1;
            }

            if (d1.getDate().before(d2.getDate())) return 1;
            else if (d1.getDate().after(d2.getDate())) return -1;
            else return 0;
        }
    });

From the perspective of my use case, this Comparator does all it needs to, even if I might consider this sorting non-deterministic. However, I wonder if this is bad code. Through this Comparator, two very distinct objects could be considered "the same" ordering even if they are unequal objects. I decided to use hashCode as a tiebreaker, and it came out something like this:

    Collections.sort(details, new Comparator<MyObj>() {
        @Override
        public int compare(MyObj d1, MyObj d2) {

            if (d1.getDate() == null && d2.getDate() == null) {
                return d1.hashCode();
            } else if (d1.getDate() == null) {
                return -1;
            } else if (d2.getDate() == null) {
                return 1;
            }

            if (d1.getDate().before(d2.getDate())) return 1;
            else if (d1.getDate().after(d2.getDate())) return -1;
            else return d1.hashCode() - d2.hashCode();
        }
    });

(what I return might be backwards, but that's is not important to this question)

Is this necessary?

EDIT: To anyone else looking at this question, consider using Google's ordering API. The logic above was replaced by:

 return Ordering.<Date> natural().reverse().nullsLast().compare(d1.getDate(), d2.getDate());
Jeremy
  • 5,365
  • 14
  • 51
  • 80

2 Answers2

8

Through this comparator, two very distinct objects could be considered "the same" ordering even if they are unequal objects.

That really doesn't matter; it's perfectly fine for two objects to compare as equal even if they are not "equal" in any other sense. Collections.sort is a stable sort, meaning objects that compare as equal come out in the same order they came in; that's equivalent to just using "the index in the input" as a tiebreaker.

(Also, your new Comparator is actually significantly more broken than the original. return d1.hashCode() is particularly nonsensical, and return d1.hashCode() - d2.hashCode() can lead to nontransitive orderings that will break Collections.sort, because of overflow issues. Unless both integers are definitely nonnegative, which hashCodes aren't, always use Integer.compare to compare integers.)

Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
2

This is only mostly important if the objects implement Comparable.

It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.

For example, if one adds two keys a and b such that (!a.equals(b) && a.compareTo(b) == 0) to a sorted set that does not use an explicit comparator, the second add operation returns false (and the size of the sorted set does not increase) because a and b are equivalent from the sorted set's perspective.

However, you're not doing that, you're using a custom Comparator, probably for presentation reasons. Since this sorting metric isn't inherently attached to the object, it doesn't matter that much.


As an aside, why not just return 0 instead of messing with the hashCodes? Then they will preserve the original order if the dates match, because Collections.sort is a stable sort. I agree with @LouisWasserman that using hashCode in this way can have potentially very bizarre consequences, mostly relating to integer overflow. Consider the case where d1.hashCode() is positive and d2.hashCode() is negative, and vice versa.

Community
  • 1
  • 1
durron597
  • 31,968
  • 17
  • 99
  • 158