28

I am trying to prove Clojure performance can be on equal footing with Java. An important use case I've found is the Quicksort. I have written an implementation as follows:

(set! *unchecked-math* true)

(defn qsort [^longs a]
  (let [qs (fn qs [^long low, ^long high]
             (when (< low high)
               (let [pivot (aget a low)
                     [i j]
                     (loop [i low, j high]
                       (let [i (loop [i i] (if (< (aget a i) pivot)
                                             (recur (inc i)) i))
                             j (loop [j j] (if (> (aget a j) pivot)
                                             (recur (dec j)) j))
                             [i j] (if (<= i j)
                                     (let [tmp (aget a i)]
                                       (aset a i (aget a j)) (aset a j tmp)
                                       [(inc i) (dec j)])
                                     [i j])]
                         (if (< i j) (recur i j) [i j])))]
                 (when (< low j) (qs low j))
                 (when (< i high) (qs i high)))))]
    (qs 0 (dec (alength a))))
  a)

Also, this helps call the Java quicksort:

(defn jqsort [^longs a] (java.util.Arrays/sort a) a))

Now, for the benchmark.

user> (def xs (let [rnd (java.util.Random.)] 
        (long-array (repeatedly 100000 #(.nextLong rnd)))))
#'user/xs
user> (def ys (long-array xs))
#'user/ys
user> (time (qsort ys))
"Elapsed time: 163.33 msecs"
#<long[] [J@3ae34094>
user> (def ys (long-array xs))
user> (time (jqsort ys))
"Elapsed time: 13.895 msecs"
#<long[] [J@1b2b2f7f>

Performance is worlds apart (an order of magnitude, and then some).

Is there anything I'm missing, any Clojure feature I may have used? I think the main source of performance degradation is when I need to return several values from a loop and must allocate a vector for that. Can this be avoided?

BTW running Clojure 1.4. Also note that I have run the benchmark multiple times in order to warm up the HotSpot. These are the times when they settle down.

Update

The most terrible weakness in my code is not just the allocation of vectors, but the fact that they force boxing and break the primitive chain. Another weakness is using results of loop because they also break the chain. Yep, performance in Clojure is still a minefield.

Community
  • 1
  • 1
Marko Topolnik
  • 195,646
  • 29
  • 319
  • 436
  • Is your implementation equivalent to `Arrays.sort`? – ponzao Aug 29 '12 at 12:43
  • @ponzao Pretty much, except for the way the pivot is chosen. But that can't possibly account for this difference. Please help yourself to the [Java code](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#486) in question. – Marko Topolnik Aug 29 '12 at 12:46
  • @ponzao Checked some more; since I'm benchmarking with a random-filled array, taking the first element for the pivot is perfectly fine. I'm not getting into the nitty-gritty of worst-case datasets here. – Marko Topolnik Aug 29 '12 at 12:54
  • fyi, Java SE 7 and later use TimSort for sorting arrays, which gives marginally better performance on real world data than Quicksort. See http://en.wikipedia.org/wiki/Timsort – sw1nn Aug 29 '12 at 14:26
  • No, @sw1nn it's still [quicksort](http://stackoverflow.com/a/4018380/1103872). – Marko Topolnik Aug 29 '12 at 16:10

4 Answers4

44

This version is based on @mikera's, is just as fast and doesn't require the use of ugly macros. On my machine this takes ~12ms vs ~9ms for java.util.Arrays/sort:

(set! *unchecked-math* true)
(set! *warn-on-reflection* true)

(defn swap [^longs a ^long i ^long j]
  (let [t (aget a i)]
    (aset a i (aget a j))
    (aset a j t)))

(defn ^long apartition [^longs a ^long pivot ^long i ^long j]
  (loop [i i j j]
    (if (<= i j)
      (let [v (aget a i)]
        (if (< v pivot)
          (recur (inc i) j)
          (do 
            (when (< i j) 
              (aset a i (aget a j))
              (aset a j v))
            (recur i (dec j)))))
      i)))

(defn qsort 
  ([^longs a]
     (qsort a 0 (long (alength a))))
  ([^longs a ^long lo ^long hi]    
     (when
         (< (inc lo) hi)
       (let [pivot (aget a lo)
             split (dec (apartition a pivot (inc lo) (dec hi)))]
         (when (> split lo)
           (swap a lo split))
         (qsort a lo split)
         (qsort a (inc split) hi)))
     a))

(defn ^longs rand-long-array []
  (let [rnd (java.util.Random.)] 
    (long-array (repeatedly 100000 #(.nextLong rnd)))))

(comment
  (dotimes [_ 10]
    (let [as (rand-long-array)]
      (time
       (dotimes [_ 1] 
         (qsort as)))))
  )

The need for manual inlining is mostly unnecessary starting with Clojure 1.3. With a few type hints only on the function arguments the JVM will do the inlining for you. There is no need to cast index arguments to int for the the array operations - Clojure does this for you.

One thing to watch out for is that nested loop/recur does present problems for JVM inlining since loop/recur doesn't (at this time) support returning primitives. So you have to break apart your code into separate fns. This is for the best as nested loop/recurs get very ugly in Clojure anyhow.

For a more detailed look on how to consistently achieve Java performance (when you actually need it) please examine and understand test.benchmark.

dnolen
  • 18,496
  • 4
  • 62
  • 71
  • I don't think I agree with your analysis---loop/recur compiles into plain looping bytecode. I don't see anything stopping `loop` from returning a primitive. – Marko Topolnik Aug 29 '12 at 17:09
  • I changed the language. I didn't mean to imply that it was impossible for loop/recur to return primitives. Simply that it does not currently support it. – dnolen Aug 29 '12 at 17:12
  • 1
    Yes, this is then another issue with my code. I wouldn't necessarily mind the nesting, though, on the code style level. Performance is still a minefield in Clojure. – Marko Topolnik Aug 29 '12 at 17:23
  • 2
    Now that I disagree with. It's in many respects simpler and involves less voodoo than most people think. If code has many type-hints and macros, and I see this quite a bit, it's quite likely that it won't be optimal. In anycase, as I said, test.benchmark is worthy of study of the handful of techniques for >= Clojure 1.3 – dnolen Aug 29 '12 at 17:27
  • Well, using a loop expecting to return a primitive is definitely one of the mines. Another is acquiring more than value from the loop. At least with the second one is aware of vectors being involved. – Marko Topolnik Aug 29 '12 at 17:30
  • Why did you expect loop/recur to able return a primitive? Prior to 1.3 there were no guarantees about taking & returning primitives outside explicit cast and Java interop. The only that changed >= 1.3 is that fns can take & return primitives. – dnolen Aug 29 '12 at 17:33
  • That expectation flows naturally from the way primitives are used within the loop and is additionally reinforced by the analogy, almost equivalence, between loops and fns. `(loop [i i] ...)` is `((fn [i] ...) i)`. `recur` works the same way in both. – Marko Topolnik Aug 29 '12 at 17:36
  • Seems like a strange expectation, loop is a binding construct - let can't return primitives either. – dnolen Aug 29 '12 at 17:48
  • Well, an `fn` is a binding construct, too. Your thinking can be had only by a person actually familiar in advance with the implementation details. I have always considered `loop` as just a shorthand for the combination of lambda declaration and evaluation. Quite similar to `let`, in fact. This is the way it works in Scheme. – Marko Topolnik Aug 29 '12 at 17:50
  • BTW I'm looking at *test.benchmark*, but from a cursory overview I didn't get any new hints that wouldn't be exposed elsewhere as explicit optimization advice. Maybe you can make some explicit references to tricks contained there. – Marko Topolnik Aug 29 '12 at 17:57
  • I don't know anything about the implementation details. The only thing I know is that only functions can take and return primitives without casting and this is explicitly documented. Nothing is documented about the other binding forms beyond type hint information flowing *internally*. – dnolen Aug 29 '12 at 17:58
  • Anyway, good that you reminded me of `let` not working with primitives either---yet another argument for my "minefield" characterization. One must stay deeply focused on what basic language constructs are and aren't allowed in optimized code. That can't be good and I'm quite sure it will change in the future. – Marko Topolnik Aug 29 '12 at 18:03
  • @dnolen/@Marko - I'm 100% positive that both loop/recur and let work with primitives in both Clojure 1.3 and 1.4. It may or may not be documented like that but it definitely works - I can get a tight `loop`/`recur` with a `long` counter running in less than 1 ns per iteration which isn't possible with boxed numbers (it's only a couple of clock cycles...). The same goes with `let`. – mikera Aug 29 '12 at 19:32
  • @mikera You give an excellent, if unsuspected, contribution to my point: the primitives work **within** the loop-recur, but the **result** of the loop cannot be had as a primitive! That's dnolen's point, and he is right. He is wrong in stating that this is something that everyone should intuitively expect. – Marko Topolnik Aug 29 '12 at 20:09
  • @Marko - ah yes I see. Out of interest just tested in 1.4 and `let` does appear to return primitives just fine. As does `if`. It's just returning from a `loop/recur` that seems to lose the primitive nature . Certainly odd behaviour and not what I would expect (I'd expect it to use primitives whenever it is possible for the compiler to infer this). Aaagh this performance stuff is tricky..... – mikera Aug 29 '12 at 20:21
  • @mikera Thanks for checking, I also checked `loop-recur` as it is important for my qsort implementation, but didn't check `let`. It is indeed good to know that we can use `let` and a really awkward fact that we can't do the same with `loop`. – Marko Topolnik Aug 29 '12 at 20:24
  • @MarkoTopolnik I didn't mean to imply that the current behavior is ideal or intuitive. I was talking about what is and isn't documented. I just wouldn't go so far as to call it a "minefield" given that ;) Yes I'd like to see *all* the binding forms support returning primitives. – dnolen Aug 29 '12 at 20:28
  • 1
    And just to be fair to mikera, this solution is not faster than his. He and I tested it and came to a similar conclusion. They are at least at the same speed level, mikera's possibly having an edge. – Marko Topolnik Aug 30 '12 at 05:52
11

This is slightly horrific because of the macros, but with this code I think you can match the Java speed (I get around 11ms for the benchmark):

(set! *unchecked-math* true)

(defmacro swap [a i j]
  `(let [a# ~a
         i# ~i
         j# ~j
         t# (aget a# i#)]
     (aset a# i# (aget a# j#))
     (aset a# j# t#)))

(defmacro apartition [a pivot i j]
  `(let [pivot# ~pivot]
     (loop [i# ~i
            j# ~j]
       (if (<= i# j#)
         (let [v# (aget ~a i#)]
           (if (< v# pivot#)
             (recur (inc i#) j#)
             (do 
               (when (< i# j#) 
                 (aset ~a i# (aget ~a j#))
                 (aset ~a j# v#))
               (recur i# (dec j#)))))
         i#))))

(defn qsort 
  ([^longs a]
    (qsort a 0 (alength a)))
  ([^longs a ^long lo ^long hi]    
    (let [lo (int lo)
          hi (int hi)]
      (when
        (< (inc lo) hi)
        (let [pivot (aget a lo)
              split (dec (apartition a pivot (inc lo) (dec hi)))]
          (when (> split lo) (swap a lo split))
          (qsort a lo split)
          (qsort a (inc split) hi)))
      a)))

The main tricks are:

  • Do everything with primitive arithmetic
  • Use ints for the array indexes (this avoids some unnecessary casts, not a big deal but every little helps....)
  • Use macros rather than functions to break up the code (avoids function call overhead and parameter boxing)
  • Use loop/recur for maximum speed in the inner loop (i.e. partitioning the subarray)
  • Avoid constructing any new objects on the heap (so avoid vectors, sequences, maps etc.)
mikera
  • 105,238
  • 25
  • 256
  • 415
  • Internally Clojure is supposed to work only with `longs`, according to [the documentation on 1.3 Numerics](http://dev.clojure.org/display/doc/Documentation+for+1.3+Numerics). Did you really notice the difference between using (int lo) and not using it? – Marko Topolnik Aug 29 '12 at 13:59
  • I think Clojure does use ints as primitives within functions. See e.g. the code for `clojure.lang.RT.intCast`. Doesn't seem to make a noticeable difference to the timings though. I believe the restriction to longs applies mainly to primitives passed as function parameters. – mikera Aug 29 '12 at 14:04
  • I'm wondering... I already use primitive arithmetic; I don't have either function or macro calls; I use loop-recur everywhere. – Marko Topolnik Aug 29 '12 at 14:05
  • Do you base your solution off of [this post](https://groups.google.com/forum/?fromgroups=#!topic/clojure/c7HUFjXRndg) (or perhaps the same code, but posted elsewhere)? – Marko Topolnik Aug 29 '12 at 14:06
  • Nope I coded this implementation myself (written quicksort so many times I have the algorithm in my head....). I think the main performance issue in your code from a performance perspective is the use of vectors (`[i j]` etc.). Vectors are Objects so the construction/GC is costing you a lot. – mikera Aug 29 '12 at 14:12
  • Yes, that was my instinct as well... it's just the way I've organized the code. I'll try to do a minimal rewrite to avoid that one problem. – Marko Topolnik Aug 29 '12 at 14:19
  • dnolen has obviously profited much from your code and as such it is kind of unfair to accept his solution over yours. However, I'll do that in order to emphasize the cleanest solution offered to any future readers. – Marko Topolnik Aug 29 '12 at 17:28
  • 2
    @Marko - no worries, dnolen's code is definitely better from a style perspective and it would be better for most coders to emulate that answer. I still think my version is *marginally* faster though - I have it about 15% faster on my machine (Clojure 1.4, Java 7 in Eclipse Counterclockwise), YMMV. – mikera Aug 29 '12 at 19:34
  • Yes, in my measurements it seems so, too, and it is indeed the expected result. – Marko Topolnik Aug 29 '12 at 20:04
10

The Joy of Clojure, Chapter 6.4 describes a lazy quicksort algorithm.The beauty of lazy sorting is that it will only do as much work as necessary to find the first x values. So if x << n this algorithm is O(n).

(ns joy.q)

(defn sort-parts
  "Lazy, tail-recursive, incremental quicksort.  Works against
   and creates partitions based on the pivot, defined as 'work'."
  [work]
  (lazy-seq
   (loop [[part & parts] work]
     (if-let [[pivot & xs] (seq part)]
       (let [smaller? #(< % pivot)]
         (recur (list*
                 (filter smaller? xs)
                 pivot
                 (remove smaller? xs)
                 parts)))
       (when-let [[x & parts] parts]
         (cons x (sort-parts parts)))))))

(defn qsort [xs]
    (sort-parts (list xs))) 
Julien Chastang
  • 17,592
  • 12
  • 63
  • 89
  • 3
    These kinds of things are a bad pun on the name of the algorithm. OK, if I really needed the top 3 elements from an array of thousands, this may actually be useful. But as for array sorting, useless. – Marko Topolnik Aug 29 '12 at 16:12
  • The idea is neat! It's not very fast overall in the general case of wanting to sort an entire array. Still, +1 for sharing an interesting implementation of quicksort. – mikera Aug 29 '12 at 19:58
  • 1
    @mikera The book Joy of Clojure really is a great book, full of excellent and interesting stuff, even for seasoned Clojure programmers. This sort implementation is a beautiful example of concise Clojure code. I just hesitate to call it quicksort :) – Marko Topolnik Aug 31 '12 at 18:10
  • anybody has actually run this code? its performance is horrible. just a toy. – 象嘉道 Jul 03 '16 at 03:44
7

By examining the main points from mikera's answer, you can see that they are mostly focused on eliminating the overhead introduced by using idiomatic (as opposed to tweaked) Clojure, which would probably not exist in an idiomatic Java implementation:

  • primitive arithmetic - slightly easier and more idiomatic in Java, you are more likely to use ints than Integers
  • ints for the array indexes - the same
  • Use macros rather than functions to break up the code (avoids functional call overhead and boxing) - fixes a problem introduced by using the language. Clojure encourages functional style, hence a function call overhead (and boxing).
  • Use loop/recur for maximum speed in the inner loop - in Java you'd idiomatically use an ordinary loop (which is what loop/recur compiles to anyway, as far as I know)

That being said, there actually is another trivial solution. Write (or find) an efficient Java implementation of Quick Sort, say something with a signature like this:

Sort.quickSort(long[] elems)

And then call it from Clojure:

(Sort/quickSort elems)

Checklist:

  • as efficient as in Java - yes

  • idiomatic in Clojure - arguably yes, I'd say that Java-interop is one of Clojure's core features.

  • reusable - yes, there's a good chance that you can easily find a very efficient Java implementation already written.

I'm not trying to troll, I understand what you are trying to find out with these experiments I'm just adding this answer for the sake of completeness. Let's not overlook the obvious one! :)

Goran Jovic
  • 9,418
  • 3
  • 43
  • 75
  • Well, I already include a killer qsort implementation on your terms: `(defn jqsort [^longs a] (java.util.Arrays/sort a) a))`. – Marko Topolnik Aug 29 '12 at 14:15
  • @MarkoTopolnik: Exactly! Just bear in mind that Java's `Array.sort` for some reason uses merge sort when arguments are objects, and qsort for primitives. In your example, it is qsort since longs are primitive. – Goran Jovic Aug 29 '12 at 14:19
  • 1
    It is due to the requirement for a stable sort. Java 7 has upgraded this to a [new kind of sort](http://stackoverflow.com/questions/4018332/is-java-7-using-tim-sort-for-the-method-arrays-sort), BTW. – Marko Topolnik Aug 29 '12 at 14:21
  • 1
    @MarkoTopolnik: Ah I get it. Primitives only have one dimension, so stability doesn't really matter, since you can't repeatedly sort them on different criteria anyway. Thanks! – Goran Jovic Aug 29 '12 at 14:26
  • 3
    Actually it's not that, but that "stable" doesn't even make sense for primitives. It is defined against objects that are all distinct, but some compare as equal according to some order. Primitive values don't have identity; they are just values. – Marko Topolnik Aug 29 '12 at 16:15