I would like to sort a primitive Java Array using a custom Comparator, but I am getting a type error. I think the comparator
function is creating a Comparator<java.lang.Object>
rather than a Comparator<Long>
, but I can't figure out how to get around this.
Here is a minimal example:
x.core=> (def x (double-array [4 3 5 6 7]))
#'x.core/x
x.core=> (java.util.Arrays/sort x (comparator #(> %1 %2)))
ClassCastException [D cannot be cast to [Ljava.lang.Object; x.core/eval1524 (form-init5588058267991397340.clj:1)
I have tried adding different type hints to the comparator function, but frankly I am relatively new to the language and was basically just throwing darts.
I have deliberately simplified the example above to focus on the key question, which is a type error. In the sections below I try to give some more detail to motivate the question and demonstrate why I am using a custom Comparator.
Motivation
What I am trying to do is duplicate R's order
function, which works like this:
> x = c(7, 2, 5, 3, 1, 4, 6)
> order(x)
[1] 5 2 4 6 3 7 1
> x[order(x)]
[1] 1 2 3 4 5 6 7
As you can see it returns the permutation of indices that will sort its input vector.
Here is a working solution in Clojure:
(defn order
"Permutation of indices sorted by x"
[x]
(let [v (vec x)]
(sort-by #(v %) (range (count v)))))
x.core=> (order [7 2 5 3 1 4 6])
(4 1 3 5 2 6 0)
(Note that R is 1-indexed while Clojure is 0-indexed.) The trick is to sort one vector (namely the indices of x [0, 1, ..., (count x)]
by the vector x itself.
R vs. Clojure Performance
Unfortunately, I am bothered by the performance of this solution. The R solution is much faster:
> x = runif(1000000)
> system.time({ y = order(x) })
user system elapsed
0.041 0.004 0.046
Corresponding Clojure code:
x.core=> (def x (repeatedly 1000000 rand))
#'x.core/x
x.core=> (time (def y (order x)))
"Elapsed time: 2857.216452 msecs"
#'x.core/y
Primitive Arrays the Solution?
I discovered that primitive arrays tend to sort in comparable time to R:
> x = runif(1000000)
> system.time({ y = sort(x) })
user system elapsed
0.061 0.005 0.069
vs.
x.core=> (def x (double-array (repeatedly 1000000 rand)))
#'x.core/x
x.core=> (time (java.util.Arrays/sort x))
"Elapsed time: 86.827277 msecs"
nil
This is the motivation for my attempt to use a custom Comparator with the java.util.Arrays class. My hope is that the speed will be comparable to R.
I should add that I can use a custom Comparator with an ArrayList as shown below, but the performance was no better than my starting function:
(defn order2
[x]
(let [v (vec x)
compx (comparator (fn [i j] (< (v i) (v j))))
ix (java.util.ArrayList. (range (count v)))]
(java.util.Collections/sort ix compx)
(vec ix)))
Any help will be appreciated, even if you just want to give some general Clojure advice. I'm still learning the language and having a lot of fun doing it. :-)
Edit
Per Carcigenicate's answer below,
(defn order
[x]
(let [ix (int-array (range (count x)))]
(vec (-> (java.util.Arrays/stream ix)
(.boxed)
(.sorted (fn [i j] (< (aget x i) (aget x j))))
(.mapToInt
(proxy [java.util.function.ToIntFunction] []
(applyAsInt [^long d] d)))
(.toArray)))))
will work:
x.core=> (def x (double-array [5 3 1 3.14 -10]))
#'x.core/x
x.core=> (order x)
[4 2 1 3 0]
x.core=> (map #(aget x %) (order x))
(-10.0 1.0 3.0 3.14 5.0)
Unfortunately it's super slow. I guess primitives might not be the answer after all.