25

I recently saw a discussion in an SO chat but with no clear conclusions so I ended up asking there.

Is this for historical reasons or consistency with other languages? When looking at the signatures of compareTo of various languages, it returns an int.

Why it doesn't return an enum instead. For example in C# we could do:

enum CompareResult {LessThan, Equals, GreaterThan};

and :

public CompareResult CompareTo(Employee other) {
    if (this.Salary < other.Salary) {
         return CompareResult.LessThan;
    }
    if (this.Salary == other.Salary){
        return CompareResult.Equals;
    }
    return CompareResult.GreaterThan;
}

In Java, enums were introduced after this concept (I don't remember about C#) but it could have been solved by an extra class such as:

public final class CompareResult {
    public static final CompareResult LESS_THAN = new Compare();
    public static final CompareResult EQUALS = new Compare();
    public static final CompareResult GREATER_THAN = new Compare();

    private CompareResult() {}
}  

and

interface Comparable<T> {
    Compare compareTo(T obj);
}

I'm asking this because I don't think an int represents well the semantics of the data.

For example in C#,

l.Sort(delegate(int x, int y)
        {
            return Math.Min(x, y);
        });

and its twin in Java 8,

l.sort(Integer::min);

compiles both because Min/min respect the contracts of the comparator interface (take two ints and return an int).

Obviously the results in both cases are not the ones expected. If the return type was Compare it would have cause a compile error thus forcing you to implement a "correct" behavior (or at least you are aware of what you are doing).

A lot of semantic is lost with this return type (and potentially can cause some difficult bugs to find), so why design it like this?

user2336315
  • 15,697
  • 10
  • 46
  • 64
  • 2
    Probably because simple comparison operators can be used with `-1` `0` and `1`. Like `<`, `>`, `>=`, `<=` and so on – MinecraftShamrock Mar 30 '15 at 09:54
  • @MinecraftShamrock In Java there's `Integer.compare` which can do that for you, so if it's in the standard library, it's just a method call as any other. – user2336315 Mar 30 '15 at 09:56
  • 6
    This is a comment because it refers not to the question which is sufficiently answered by Matthew Watson, but to the mentioned "trick": **DO NOT USE SUBTRACTION TO IMPLEMENT COMPARE WITH NUMERIC VALUES** !!! It works most of the time which makes it so dangerous, but it is inherently flawed. Try to "compare" e.g. 2^31-1 and -2^31 for 32bit signed integers. The result causes a positive/negative overflow and is wrapped around, giving a wrong result ! The assembly cmp instruction checks for the flags, but C derivates almost always do not give any indication that an integer overflow happened. – Thorsten S. Mar 30 '15 at 12:47
  • 6
    There is no fundamental reason. For example in Haskell comparisons return the Haskell equivalent of an enum. Haskell emphasizes correct data modelling. – usr Mar 30 '15 at 13:15
  • 1
    It may be a bit far fetched, but I think this may be related: http://stackoverflow.com/questions/29131376/why-is-there-no-icmp-instruction – Marco13 Mar 30 '15 at 22:44
  • Is this a *keyboard* or a *whiteboard* question? What blocking implementation problem do you have with using `int` return type? Is your question about *what is the best way to code comparison methods*, *why a language was designed in a specific way* or *how to solve a particular problem with your code*? –  Mar 30 '15 at 22:46
  • @vaxquis Was just curious why it was defined like that. Isn't was clear in my question? – user2336315 Apr 01 '15 at 10:50
  • @user2336315 pure curiosity usually constitutes a whiteboard question - in that case, the question should probably be asked on Programmers.SE, not SO, as per meta consensus http://meta.stackoverflow.com/questions/254570/choosing-between-stack-overflow-and-programmers-stack-exchange `If it is related to coding, it should be on SO. If it's related to higher level programming concepts or is conceptual (but still related to programming), it should be on P.SE` - AFAIK that's the main reason for having your question closed here on SO. –  Apr 01 '15 at 19:45
  • @vaxquis Thanks I didn't know it would have been more suited for programmers, next time I'll think about it. It's not a problem that it has been closed as long as I have satisfactory answers. The funny things is that the downvote button highlights _"This question does not show any research effort; it is unclear and not useful"_ I don't think it applies to mine anyway (though I could understand the close vote). – user2336315 Apr 02 '15 at 08:44

5 Answers5

22

[This answer is for C#, but it probably also apples to Java to some extent.]

This is for historical, performance and readability reasons. It potentially increases performance in two places:

  1. Where the comparison is implemented. Often you can just return "(lhs - rhs)" (if the values are numeric types). But this can be dangerous: See below!
  2. The calling code can use <= and >= to naturally represent the corresponding comparison. This will use a single IL (and hence processor) instruction compared to using the enum (although there is a way to avoid the overhead of the enum, as described below).

For example, we can check if a lhs value is less than or equal to a rhs value as follows:

if (lhs.CompareTo(rhs) <= 0)
    ...

Using an enum, that would look like this:

if (lhs.CompareTo(rhs) == CompareResult.LessThan ||
    lhs.CompareTo(rhs) == CompareResult.Equals)
    ...

That is clearly less readable and is also inefficient since it is doing the comparison twice. You might fix the inefficiency by using a temporary result:

var compareResult = lhs.CompareTo(rhs);

if (compareResult == CompareResult.LessThan || compareResult == CompareResult.Equals)
    ...

It's still a lot less readable IMO - and it's still less efficient since it's doing two comparison operations instead of one (although I freely admit that it is likely that such a performance difference will rarely matter).

As raznagul points out below, you can actually do it with just one comparison:

if (lhs.CompareTo(rhs) != CompareResult.GreaterThan)
    ...

So you can make it fairly efficient - but of course, readability still suffers. ... != GreaterThan is not as clear as ... <=

(And if you use the enum, you can't avoid the overhead of turning the result of a comparison into an enum value, of course.)

So this is primarily done for reasons of readability, but also to some extent for reasons of efficiency.

Finally, as others have mentioned, this is also done for historical reasons. Functions like C's strcmp() and memcmp() have always returned ints.

Assembler compare instructions also tend to be used in a similar way.

For example, to compare two integers in x86 assembler, you can do something like this:

CMP AX, BX ; 
JLE lessThanOrEqual ; jump to lessThanOrEqual if AX <= BX

or

CMP AX, BX
JG greaterThan ; jump to greaterThan if AX > BX

or

CMP AX, BX
JE equal      ; jump to equal if AX == BX

You can see the obvious comparisons with the return value from CompareTo().

Addendum:

Here's an example which shows that it's not always safe to use the trick of subtracting the rhs from the lhs to get the comparison result:

int lhs = int.MaxValue - 10;
int rhs = int.MinValue + 10;

// Since lhs > rhs, we expect (lhs-rhs) to be +ve, but:

Console.WriteLine(lhs - rhs); // Prints -21: WRONG!

Obviously this is because the arithmetic has overflowed. If you had checked turned on for the build, the code above would in fact throw an exception.

For this reason, the optimization of suusing subtraction to implement comparison is best avoided. (See comments from Eric Lippert below.)

Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • You could always use `Set Le = EnumSet.of(Diff.Eq, Diff.Lt)` but that would be just as ugly. – OldCurmudgeon Mar 30 '15 at 10:25
  • 3
    You don't need two comparisons: `lhs.CompareTo(rhs) != CompareResult.GreaterThan` – raznagul Mar 30 '15 at 12:14
  • 2
    I note that the "use subtraction" trick is often dangerously broken. For many integral types you can find two numbers such that `x < y` and `x - y > 0`. – Eric Lippert Mar 30 '15 at 15:02
  • @EricLippert I've added an addendum to demonstrate this. – Matthew Watson Mar 30 '15 at 15:41
  • 3
    @MatthewWatson: Nice example! I personally would amend your suggestion though, to simply "using subtraction to implement comparison is best avoided." First off, it is only an *optimization* if it is both *correct* and *better performing* than some alternative, neither of which is typically true of the subtraction trick. No one has yet written a program whose dramatic success in the marketplace is due to its use of subtraction instead of comparison to implement comparison. And second, even when the values are of appropriate size, it's still not a great idea. – Eric Lippert Mar 30 '15 at 15:45
  • @EricLippert: presumably people are so fond of it because it works when implementing `strcmp` in C, where the bytes you're comparing are smaller than `int` and hence there can be no overflow. The fact that far more people are aware of it, than people have ever implemented `strcmp`, speaks to the priorities of C tutorials ;-) – Steve Jessop Mar 30 '15 at 19:24
  • @SteveJessop: Indeed. And just to be pedantic, the chars you are comparing are *almost always* smaller than integers, but they need not be. A conforming implementation of C could for instance decide that chars were Unicode UTF-32 codes the same size as an integer. – Eric Lippert Mar 30 '15 at 20:04
  • `if (lhs.CompareTo(rhs) != CompareResult.GreaterThan)` fails if they later add a `CompareResult.Unordered`. (Of course, with the current system they can't do that...) – user253751 Mar 30 '15 at 21:29
  • `Using an enum, that would look like this: if (lhs.CompareTo(rhs) == CompareResult.LessThan || lhs.CompareTo(rhs) == CompareResult.Equals) ...` - nope. You could just define an interface with explicit `lessThan()`, `greaterThan()` & `equals()`, and implement it to just have `if ( lhs.lessThan(rhs) ) {}` etc. –  Mar 30 '15 at 22:48
  • @EricLippert: yes, I was intentionally ambiguous "it works ... where the bytes are smaller" for that reason ;-) The fact that EOF is equal to a possible byte value read from a binary stream means AIUI that certain standard functions (`getc`) aren't entirely usable on such an implementation. – Steve Jessop Mar 30 '15 at 23:16
  • @SteveJessop: Plainly we are on the same page here. And plainly C is one messed-up language. :-) – Eric Lippert Mar 31 '15 at 00:02
  • @vaxquis Of course one could do that, but here we're comparing returning an enum with returning an int. Returning an interface instead is probably not at all appropriate for something which is supposed to be an efficient operation. – Matthew Watson Mar 31 '15 at 07:44
  • @MatthewWatson in Java, returning an enum and returning an instance of a class implementing an interface casted to interface is, AFAIR, exactly the same in terms of efficiency - you're just returning a reference to an object, since enums (in Java) are just that, classes, with immutable JVM-generated objects, and are referenced like that - they ain't ints, like in C or (AFAIK) C# (though they have an `ordinal` field, which allows them to be used as such). –  Mar 31 '15 at 17:10
  • @vaxquis That is not true for C#, where enums are simple value types based on primitive integral types. My answer here was for C#, not Java. – Matthew Watson Mar 31 '15 at 20:14
  • So if you'd use `enum CompareResult {LessThan = -1, Equal = 0, GreaterThan = 1}` every argument in favor of the ints you've mentioned is gone. You can still use relational operators with enums (`CompareResult.LessThan <= 0`) and while not recommended you could still use `(lhs - rhs)` for comparison since it's castable to the enum (even though the value is not defined). Compatibility with older languages is also guaranteed because you are in fact still using an int value. This certainly applies to C#, no idea about Java. – Didii Mar 01 '18 at 12:53
5

Let's stick to bare facts, with absolute minumum of handwaving and/or unnecessary/irrelevant/implementation dependent details.

As you already figured out yourself, compareTo is as old as Java (Since: JDK1.0 from Integer JavaDoc); Java 1.0 was designed to be familiar to C/C++ developers, and mimicked a lot of it's design choices, for better or worse. Also, Java has a backwards compatibility policy - thus, once implemented in core lib, the method is almost bound to stay in it forever.

As to C/C++ - strcmp/memcmp, which existed for as long as string.h, so essentially as long as C standard library, return exactly the same values (or rather, compareTo returns the same values as strcmp/memcmp) - see e.g. C ref - strcmp. At the time of Java's inception going that way was the logical thing to do. There weren't any enums in Java at that time, no generics etc. (all that came in >= 1.5)

The very decision of return values of strcmp is quite obvious - first and foremost, you can get 3 basic results in comparison, so selecting +1 for "bigger", -1 for "smaller" and 0 for "equal" was the logical thing to do. Also, as pointed out, you can get the value easily by subtraction, and returning int allows to easily use it in further calculations (in a traditional C type-unsafe way), while also allowing efficient single-op implementation.

If you need/want to use your enum based typesafe comparison interface - you're free to do so, but since the convention of strcmp returning +1/0/-1 is as old as contemporary programming, it actually does convey semantic meaning, in the same way null can be interpreted as unknown/invalid value or a out of bounds int value (e.g. negative number supplied for positive-only quality) can be interpreted as error code. Maybe it's not the best coding practice, but it certainly has its pros, and is still commonly used e.g. in C.

On the other hand, asking "why the standard library of language XYZ does conform to legacy standards of language ABC" is itself moot, as it can only be accurately answered by the very language designed who implemented it.

TL;DR it's that way mainly because it was done that way in legacy versions for legacy reasons and POLA for C programmers, and is kept that way for backwards-compatibility & POLA, again.

As a side note, I consider this question (in its current form) too broad to be answered precisely, highly opinion-based, and borderline off-topic on SO due to directly asking about Design Patterns & Language Architecture.

  • That said, I love your answer and think you're wrong that this is a bad question. It's not too broad - it asks a very specific question. And it's clearly not opinion based, as you've provided a solid answer that doesn't use opinion whatsoever. So while I agree with Paulo that if you think a question is bad, don't answer it, I'm glad you did because you're wrong about it being bad. – corsiKa Mar 30 '15 at 20:17
  • 2
    @PaŭloEbermann your reasoning is faulty. I answered the question because OP asked it; I flagged it because I had to do a liberty of reinterpreting the question to be even able to answer it. I downvoted it, because it doesn't help with coding nor debugging - IMO this is a great question for Programmers.SE, but not for SO. As such, I see no reason why I shouldn't answer the questions I consider borderline here. Not answering a wrong question is always worse than trying to answer it - you can't prove that a question is flawed in any other way than by trying to answer it. –  Mar 30 '15 at 22:22
  • @corsiKa thank you for your kind words, yet we can safely agree that we disagree on the validness of this question. Q: *why was computer language designed in an XYZ (insert any specific) way* is, IMO, blatantly off-topic on SO, and completely valid on e.g. Programmers.SE; the swarm of "guessing" answers is a direct result of the ambiguity of the question. I, for one, tried to focus on answering the `Is this for historical reasons or consistency with other languages?` part. –  Mar 30 '15 at 22:27
  • That doesn't add up. If you're saying it's off topic, then it's off topic. But you said it was too broad and primarily opinion based. Thus, it shouldn't be migrated ever. But again, the design of an API is completely within the realm of SO. And completely within the realm of Programmers. It's an area of overlap between the two. – corsiKa Mar 30 '15 at 22:31
  • 1
    @corsiKa http://meta.stackoverflow.com/questions/254570/choosing-between-stack-overflow-and-programmers-stack-exchange `If it is related to coding, it should be on SO. If it's related to higher level programming concepts or is conceptual (but still related to programming), it should be on P.SE` it's just that this is an interesting question by itself, it's just that it can't be precisely answered in Q/A format IMO, without going into *I think it's because...* and *it's my opinion that...* - also, Programmers have a different policy on valid "broadness" and "opinion-ness" of the questions –  Mar 30 '15 at 22:37
1

This practice comes from comparing integers this way, and using a subtract between first non-matching chars of a string.

Note that this practice is dangerous with things that are partially comparable while using a -1 to mean that a pair of things was incomparable. This is because it could create a situation of a < b and b < a (which the application might use to define "incomparable"). Such a situation can lead to loops that don't terminate correctly.

An enumeration with values {lt,eq,gt,incomparable} would be more correct.

Rob
  • 1,387
  • 1
  • 13
  • 18
-2

Reply this is due to performance reasons. If you need to compare int as often happens you can return the following:

Infact comparison are often returned as substractions.

As an example

public class MyComparable implements Comparable<MyComparable> {
    public int num;

    public int compareTo(MyComparable x) {
        return num - x.num;
    }
}
Davide Lorenzo MARINO
  • 26,420
  • 4
  • 39
  • 56
-2

My understanding is that this is done because you can order the results (i.e., the operation is reflexive and transitive). For example, if you have three objects (A,B,C) you can compare A->B and B->C, and use the resulting values to order them properly. There is an implied assumption that if A.compareTo(B) == A.compareTo(C) then B==C.

See java's comparator documentation.

Travis
  • 2,654
  • 4
  • 26
  • 46
  • The behavior you describe could manifest from using the `enum` technique instead of integers, though, so this doesn't really answer the question. – corsiKa Mar 30 '15 at 20:18
  • 2
    Sorry, the assertion `A.compareTo(B) == A.compareTo(C)` → `B == C` only is valid for sets of up to two elements, and is **not** implied in a Comparable/Comparator specification. Maybe you are thinking of `A.compareTo(B) == 0` → `A.equals(B)`? (i.e. comparison is consistent to equals)? – Paŭlo Ebermann Mar 31 '15 at 07:39