3

I'm quite fond of the story about Frank Nelson Cole, who in 1903 demonstrated the prime factorization of 2^67 - 1 in a famous "lecture without words". These days the factorization can be readily found using the following naive algorithm:

(def mersenne67 (dec (expt 2 67)))

(->> (iterate inc 2)
     (filter #(zero? (rem mersenne67 %)))
     (first))

However I've noticed that this Clojure code takes roughly twice as long as the equivalent Java or Kotlin code. (~40 vs ~20 seconds on my machine)
Here's the Java I compared it to:

  public static BigInteger mersenne() {
    BigInteger mersenne67 = 
      BigInteger.valueOf(2).pow(67).subtract(BigInteger.ONE);

    return Stream.iterate(BigInteger.valueOf(2), (x -> x.add(BigInteger.ONE)))
      .filter(x -> mersenne67.remainder(x).equals(BigInteger.ZERO))
      .findFirst()
      .get();
  }

Rewriting the Clojure code on a lower level made no difference:

(def mersenne67 (-> (BigInteger/valueOf 2)
                (.pow (BigInteger/valueOf 67))
                (.subtract BigInteger/ONE)))

(->> (iterate #(.add ^BigInteger % BigInteger/ONE) (BigInteger/valueOf 2))
     (filter #(= BigInteger/ZERO (.remainder ^BigInteger mersenne67 %)))
     (first))

After profiling the code with the VisualVM the primary suspect seems to be clojure.lang.Iterate.first() which accounts almost exactly for the difference in how long these functions run. Java's equivalent java.util.stream.ReferencePipeline.findFirst() runs for only a fraction as long (~22 vs ~2 seconds). This leads to my question: How does Java (and Kotlin) get away with spending so much less time on this task?

Sebastian Oberhoff
  • 1,271
  • 1
  • 10
  • 16

2 Answers2

2

Your problem is that you're inefficiently iterating through your iterate. This is why you're seeing first up top at your profiling. This is, of course a result of all the core functions of clojure working with lots and lots of different data structures.

The best way to avoid this is to use reduce, which gives the object itself the task to call the function in a loop.

So this is about 2x as fast:

(reduce
      (fn [_ x]
        (when (identical? BigInteger/ZERO (.remainder ^BigInteger mersenne67 x))
          (reduced x)))
      nil
      (iterate #(.add ^BigInteger % BigInteger/ONE) (BigInteger/valueOf 2)))
ClojureMostly
  • 4,652
  • 2
  • 22
  • 24
1

I apologize in advance for grave-digging, but I'm a bit concerned about ClojureMostly's answer. It surely solves the problem in timely manner but it looks like a dirty hack to me: passing anonymous reducing function, which ignores current result (_) and finishes as soon as first factor is found (reduced).

How about using transducers and transduce function:

(defn first-factor-tr 
  [^BigInteger n] 
  (transduce
    (comp (filter #(identical? BigInteger/ZERO (.remainder ^BigInteger n %))) (take 1))
    conj nil
    (iterate #(.add ^BigInteger % BigInteger/ONE) (BigInteger/valueOf 2))))

Filter out all values in collection with remainder of zero (filter ...) and take first one (take ...).

Execution time of this solution is on par with the one with reduce:

(defn first-factor-reduce 
  [^BigInteger n] 
  (reduce
    (fn [_ x]
      (when (identical? BigInteger/ZERO (.remainder ^BigInteger n x))
            (reduced x)))
    nil
    (iterate #(.add ^BigInteger % BigInteger/ONE) (BigInteger/valueOf 2))))

=> (time (first-factor-tr mersenne67))
"Elapsed time: 20896.594178 msecs"
(193707721)
=> (time (first-factor-reduce mersenne67))
"Elapsed time: 20424.323894 msecs"
193707721
Bojan Horvat
  • 223
  • 1
  • 7