6

I am trying to use a comparator to help sort a list of objects. I have a question about how exactly the comparator works and what it would be doing exactly in the following example:

private static Comparator<Student> comparator()
{
        return (Student a, Student b) ->
        {  
                return Integer.compare(complexOperation(a), complexOperation(b));
        }
}

As you can see above, there is a need to compare and sort students according to an integer rank returned by the complexOperation() method. As the name suggests, it is a heavy operation. Would the above approach be the most efficient? Or would it be better to essentially run through each student in the list I am trying to sort, perform the complexOperation() per student and store the result in a field in the Student object. Then the comparator would just do an:

Integer.compare(a.getRank(), b.getRank())

Would both these approaches be comparable or, due to the way the comparator works (perhaps compares the same object more than once with others hence running complexOperation() multiple times per Student during the compare), would it be faster to do the pre computation of the complexOperation() result in a student field?

The above would be called like so:

Collections.sort(students, comparator());

Hope that was clear!

Edit: Lets say, for the sake of it, it is not possible to add a field to the Student object (This is a toy problem for a more complex situation where I am not at liberty to modify the Student object). Would it still be better to perhaps create a custom Object with Student sitting inside with another field added rather than doing the complexOperation() right in the comparator? Or is there another way to approach the problem? I can think of creating a Hashmap that takes student id as key and the result of the complexOperation() as value and just creates/access that record within the comparator?

John Baum
  • 3,183
  • 11
  • 42
  • 90
  • @HovercraftFullOfEels I was specifically using the comparator as a sorting mechanism and would like to have it be as performant as possible. – John Baum Oct 01 '15 at 02:15
  • `(perhaps compares the same object more than once with others hence running complexOperation() multiple times per Student during the compare` - add a System.out.println(...) statement to the comparator to see how often it is invoked. Or add a counter of some kind that you can display after the comparator is finished. If the number of calls is greater than the elements being sorted then you know the complex operation if called more than once. Basic problem solving technique to display some output. – camickr Oct 01 '15 at 02:16
  • You're then asking about the optimization workings of the JVM, and it often does just this sort of thing for you if it determines that this would make things run more efficiently. – Hovercraft Full Of Eels Oct 01 '15 at 02:16

2 Answers2

7

Basically, you want to compare students by comparing some values that each maps to. This is usually done by

    static Comparator<Student> comparator()
    {
        return Comparator.comparing( Foo::complexOperation );
    }

However, since the function complexOperation is too expensive, we want to cache its results. We can have a general purpose utility method Function cache(Function)

    static Comparator<Student> comparator()
    {
        return Comparator.comparing( cache(Foo::complexOperation) );
    }

In general, it is better that the caller can supply a Map as the cache

public static <K,V> Function<K,V> cache(Function<K,V> f, Map<K,V> cache)
{
    return k->cache.computeIfAbsent(k, f);
}

We can use IdentityHashMap as the default cache

public static <K,V> Function<K,V> cache(Function<K,V> f)
{
    return cache(f, new IdentityHashMap<>());
}
ZhongYu
  • 19,446
  • 5
  • 33
  • 61
  • 2
    I'm wondering why Java's `Comparator.comparing` isn't doing this by default (or at least optional). In Python, `sorted` with `key` function does this. – tobias_k Feb 04 '16 at 11:01
5

On average, your sort algorithm will call complexOperation() method about log2N times for an array of N students. If the operation is really slow, you may be better off running it once for each student. This could bring an order of magnitude improvement for an array of 1,000 students.

However, you do not have to do it explicitly: you could make complexOperation(...) store the result for each student, and then return the cached value on subsequent requests:

private Map<Student,Integer> cache = new HashMap<Student,Integer>();

private int complexOperation(Student s) {
    // See if we computed the rank of the student before
    Integer res = cache.get(s);
    if (res != null) {
        // We did! Just return the stored result:
        return res.intValue();
    }
    ... // do the real computation here
    // Save the result for future invocations
    cache.put(s, result);
    return result;
}

Note that in order for this approach to work, Student class needs to implement hashCode and equals.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • @bayou.io Clearing of the cache needs to be done either explicitly, or through discarding the object that owns the cache. – Sergey Kalinichenko Oct 01 '15 at 02:22
  • the users of the `comparator()` method probably don't want to be bothered by that detail :) – ZhongYu Oct 01 '15 at 02:24
  • @JohnBaum Yes, it would be better to make a "holder" with an extra `int` field, especially for large sets of students, where the number of invocations goes up ten times or more. Object overhead represents a tiny cost in comparison with potential CPU savings. – Sergey Kalinichenko Oct 01 '15 at 02:28
  • @JohnBaum That's pretty much what I suggested above, except your approach uses student ID for the key, while I use `Student` directly, without extracting its ID (under the hood `equal` and `hashCode` could very well rely on the ID, though). Other than that, the two approaches are the same. – Sergey Kalinichenko Oct 01 '15 at 02:40
  • *FYI* It's all theoretical, and yeah caching will improve performance, at the cost of complicating your code, so unless you actually *have* a performance problem, don't bother. It the old adage about not tuning your code until you have a problem, because you may end up wasting your time on the wrong problem, so you don't have any time left to actually find and fix the real problem, if you even have one. See [Premature optimization](https://www.google.com/search?q=Premature+Optimization&ie=utf-8&oe=utf-8) – Andreas Oct 01 '15 at 02:48