3

I'm learning Racket (with the HtDP course) and it's my first shot at a program in a functional language.

I've tried to design a function that finds all primes under a certain input n using (what I think is) a functional approach to the problem, but the program can get really slow (86 seconds for 100.000, while my Python, C and C++ quickly-written solutions take just a couple of seconds).

The following is the code:

;; Natural Natural -> Boolean
;; Helper function to avoid writing the handful (= 0 (modulo na nb))

(define (divisible na nb) (= 0 (modulo na nb)))


;; Natural ListOfNatural -> Boolean
;; n is the number to check, lop is ALL the prime numbers less than n

(define (is-prime? n lop)
  (cond [(empty? lop) true]
        [(divisible n (first lop)) false]
        [ else (is-prime? n (rest lop))]))


;; Natural -> ListOfNatural

(define (find-primes n)
  (if (= n 2)
      (list 2)
      (local [(define LOP (find-primes (sub1 n)))]
        (if (is-prime? n LOP)
            (append LOP (list n))
                  LOP))))

(time (find-primes 100000))

I'm using the divisible function instead of just plowing the rest in because I really like to have separated functions when they could be of use in another part of the program. I also should probably define is-prime? inside of find-primes, since no one will ever call is-prime? on a number while also giving all the prime numbers less than that number.

Any pointers on how to improve this?

Leodip
  • 65
  • 1
  • 6
  • Nothing wrong with your Racket code. Some minor improvements will improve the running time dramatically but these are mathematical improvements not Racket improvements. – law-of-fives Feb 06 '18 at 14:33
  • 1
    Sussman/Abelson design a prime sieve in [*Structure and Interpretation of Computer Programs*](http://sarabander.github.io/sicp/), *Section 3.5.2* – examples are written in Scheme, which will be easy for you to follow. – Mulan Feb 06 '18 at 17:46
  • I ran you code, and it only took ~7.8 seconds on my laptop. Oscar's code took 2.3 seconds. Although, I tested `(time (length (find-primes 100000)))`. 2s [on tio](https://tio.run/##bVE9T8MwEN3zK05icSQqtawVVDAxRJS96uDEl8aqc45iuwj@fDi7pThVPThK3se9d3FNhz0u@P6ZpvUaPqQPozTX5@IF3qw1KKlg9B3NgCO0gRqvLYG3IE9WK/gatdd0AN8hdJJUGwyIZ1iC6K0KxgJJoLosi0IobDUhCKVP2una4AW7y2dBkaWqtPPb9n42Au3SfAp9zSE5G7dqjo9g7BCx16pK@DDq/o/lwKCLMklAWTjtFom2YVuWlwWAaCwp2AkKxmzSR3jwewbOZ5cXYrIcE4dJbUYCNO7GXjTqQt3f1uWCs8ZZQL7V2cMBpXi6jRskeCov44Rhbf6KHsSu2n7O1cKFesUePP6aM5nlIVn1jyaGHAbkfSS7NIgdZowoST9Q@LhvXo8bjPyOOejgu3mI1TKeyJ@mXw) – Will Ness Feb 08 '18 at 19:01
  • and if we add the `sqrt` check, it becomes 0.77s. "goo.gl/p81xd2" --- BTW `length` doesn't matter, testing without it `(time (find-primes 100000))` in DrRacket takes exactly the same time. – Will Ness Feb 09 '18 at 18:30
  • The performance is totally about the algorithm. [Here is my attempt with JS which resolves primes up to 100K in just 10ms](https://stackoverflow.com/a/41434702/4543207). So if you may 1:1 implement it to Racket you might probably get a much better performance. – Redu Apr 23 '18 at 22:32

2 Answers2

1

Here are some ideas for improving the performance, the procedure now returns in under two seconds for n = 100000.

(define (is-prime? n lop)
  (define sqrtn (sqrt n))
  (if (not (or (= (modulo n 6) 1) (= (modulo n 6) 5)))
      false
      (let loop ([lop lop])
        (cond [(or (empty? lop) (< sqrtn (first lop))) true]
              [(zero? (modulo n (first lop))) false]
              [else (loop (rest lop))]))))

(define (find-primes n)
  (cond [(<= n 1) '()]
        [(=  n 2) '(2)]
        [(=  n 3) '(2 3)]
        [else
         (let loop ([lop '(2 3)] [i 5])
           (cond [(> i n) lop]
                 [(is-prime? i lop) (loop (append lop (list i)) (+ i 2))]
                 [else (loop lop (+ i 2))]))]))

Some of the optimizations are language-related, others are algorithmic:

  • The recursion was converted to be in tail position. In this way, the recursive call is the last thing we do at each step, with nothing else to do after it - and the compiler can optimize it to be as efficient as a loop in other programming languages.
  • The loop in find-primes was modified for only iterating over odd numbers. Note that we go from 3 to n instead of going from n to 2.
  • divisible was inlined and (sqrt n) is calculated only once.
  • is-prime? only checks up until sqrt(n), it makes no sense to look for primes after that. This is the most important optimization, instead of being O(n) the algorithm is now O(sqrt(n)).
  • Following @law-of-fives's advice, is-prime? now skips the check when n is not congruent to 1 or 5 modulo 6.

Also, normally I'd recommend to build the list using cons instead of append, but in this case we need the prime numbers list to be constructed in ascending order for the most important optimization in is-prime? to work.

Óscar López
  • 232,561
  • 37
  • 312
  • 386
  • In practice I find that the next most-trivial optimization for a trial-division routine is to only check `is-prime?` when `n` is congruent to `1` or `5` mod 6. This requires the initial prime list be populated with `'(2 3)`. It's a definite speedup over just skipping evens. – law-of-fives Feb 06 '18 at 19:37
  • @law-of-fives that's a nice optimization! I added it to my solution, thanks! – Óscar López Feb 06 '18 at 20:34
  • Thanks for the answer! What I was looking for the most were language-related optimization: I'm aware of the many simplifications for prime numbers (I'm actually a Project Euler addict), but I intentionally left out those to have the simplest code and compare it with a similar, imperative, non-recursive, for-loop-based python, C and C++ code to see how it fared. Tail recursion seems like it might actually improve the solution quite a bit. Also, a question, does inlining divisible actually have any impact on the execution? – Leodip Feb 06 '18 at 22:45
  • @Leodip Alright! from the language point of view, the best optimizations for the problem are: writing the code for [tail recursion](https://docs.racket-lang.org/guide/Lists__Iteration__and_Recursion.html#%28part._tail-recursion%29) and `cons`ing instead of `append`ing when creating a new list - this is to avoid [Schlemiel the painter's](https://en.wikipedia.org/wiki/Joel_Spolsky#Schlemiel_the_Painter%27s_algorithm) algorithm, but it'll make more difficult writing an `O(sqrt(n))` solution. Inlining has a negligible effect. – Óscar López Feb 06 '18 at 23:16
  • speaking [of painters](https://stackoverflow.com/questions/24202687/painter-puzzle-estimation) and [complexities](http://en.wikipedia.org/wiki/Analysis_of_algorithms#Empirical_orders_of_growth), using `append` destroys the algorithmic gains of using `sqrt`. your code is just as quadratic as OP's; runs at constant ~4x factor faster (give or take). `set-mcdr!` seems unavoidable. or arrays. – Will Ness Feb 08 '18 at 09:12
  • @WillNess I'm not sure about that analysis, we use `append` with much less frequency than doing a search. Using `cons` and removing the `O(sqrt(n))` search yields an algorithm that's 10x slower than my current implementation. Sure, using a different data structure (like a vector) would be the way to go, but in its current state I'd argue that the cost of `append` is amortized and the quadratic performance is avoided. – Óscar López Feb 08 '18 at 19:42
  • what I said were empirical observations. you use `append` for *each* prime found, that's what counts. with `set-mcdr!` it runs at ~ n^1.25, takes 1M:1.2s, 2M:2.8s on same (mine) installation, inside DrRacket. – Will Ness Feb 08 '18 at 20:00
  • OP's code uses ~ pi(p) *search* for each prime p, yours ~ pi(sqrt(p)), but that is dwarfed by ~ pi(p) *append* for each prime p in both implementations. – Will Ness Feb 08 '18 at 20:06
1

Here's Óscar López's code, tweaked to build the list in the top-down manner:

(define (is-prime? n lop)
  (define sqrtn (sqrt n))
  (let loop ([lop lop])
        (cond [(or (empty? lop) (< sqrtn (mcar lop))) true]
              [(zero? (modulo n (mcar lop))) false]
              [else (loop (mcdr lop))])))

(define (find-primes n)
  (let* ([a (mcons 3 '())]
         [b (mcons 2 a)])
    (let loop ([p a] [i 5] [d 2]     ; d = diff +2 +4 +2 ...
                           [c 2])    ; c = count of primes found
           (cond [(> i n) c]
                 [(is-prime? i (mcdr a))
                       (set-mcdr! p (mcons i '()))
                       (loop (mcdr p) (+ i d) (- 6 d) (+ c 1))]
                 [else (loop p        (+ i d) (- 6 d)    c   )]))))

Runs at about ~n1.25..1.32, empirically; compared to the original's ~n1.8..1.9, in the measured range, inside DrRacket (append is the culprit of that bad behaviour). The "under two seconds" for 100K turns into under 0.05 seconds; two seconds gets you well above 1M (one million):

; (time (length (find-primes 100000)))  ; with cons        times in milliseconds

; 10K 156 ; 20K 437  ; 40K 1607 ; 80K 5241 ; 100K  7753   .... n^1.8-1.9-1.7    OP's
; 10K  62 ; 20K 109  ; 40K  421 ; 80K 1217 ; 100K  2293   .... n^1.8-1.9     Óscar's

; mcons:
(time (find-primes 2000000))

; 100K 47 ; 200K 172 ; 1M 1186 ; 2M 2839 ; 3M 4851 ; 4M 7036 .... n^1.25-1.32    this
; 9592      17984      78498     148933    216816    283146

It's still just a trial division though... :) The sieve of Eratosthenes will be much faster yet.


edit: As for set-cdr!, it is easy to emulate any lazy algorithm with it... Otherwise, we could use extendable arrays (lists of...), for the amortized O(1) snoc/append1 operation (that's lots and lots of coding); or maintain the list of primes split in two (three, actually; see the code below), building the second portion in reverse with cons, and appending it in reverse to the first portion only every so often (specifically, judging the need by the next prime's square):

; times:               ; 2M 1934 ; 3M 3260 ; 4M 4665 ; 6M 8081 .... n^1.30

;; find primes up to and including n, n > 2
(define (find-primes n)
  (let loop ( [k 5] [q 9]              ; next candidate; square of (car LOP2)
                    [LOP1 (list 2)]    ; primes to test by
                    [LOP2 (list 3)]    ; more primes
                    [LOP3 (list  )] )  ; even more primes, in reverse
    (cond [ (> k n)
            (append LOP1 LOP2 (reverse LOP3)) ]
          [ (= k q) 
            (if (null? (cdr LOP2))
              (loop    k     q  LOP1 (append LOP2 (reverse LOP3)) (list)) 
              (loop (+ k 2)   
                      (* (cadr LOP2) (cadr LOP2))          ; next prime's square
                        (append LOP1 (list (car LOP2)))
                                     (cdr LOP2)      LOP3 )) ]
          [ (is-prime? k (cdr LOP1))
              (loop (+ k 2)  q  LOP1  LOP2   (cons k LOP3))  ]
          [ else
              (loop (+ k 2)  q  LOP1  LOP2           LOP3 )  ])))

 ;; n is the number to check, lop is list of prime numbers to check it by
(define (is-prime? n lop)
  (cond [ (null? lop) #t ]
        [ (divisible n (car lop)) #f ]   
        [ else (is-prime? n (cdr lop)) ]))

edit2: The easiest and simplest fix though, closest to your code, was to decouple the primes calculations of the resulting list, and of the list to check divisibility by. In your

      (local [(define LOP (find-primes (sub1 n)))]
        (if (is-prime? n LOP)

LOP is used as the list of primes to check by, and it is reused as part of the result list in

            (append LOP (list n))
                  LOP))))

immediately afterwards. Breaking this entanglement enables us to stop the generation of testing primes list at the sqrt of the upper limit, and thus it gives us:

;times:                    ; 1M-1076  2M-2621  3M-4664   4M-6693
                           ;           n^1.28   ^1.33     n^1.32
(define (find-primes n)
  (cond
    ((<= n 4)  (list 2 3))
    (else
     (let* ([LOP (find-primes (inexact->exact (floor (sqrt n))))] 
            [lp  (last LOP)])
       (local ([define (primes k ps)
                  (if (<= k lp)  
                      (append LOP ps)
                      (primes (- k 2) (if (is-prime? k LOP)
                                          (cons k ps) 
                                          ps)))])
         (primes (if (> (modulo n 2) 0) n (- n 1)) '()))))))

It too uses the same is-prime? code as in the question, unaltered, as does the second variant above.

It is slower than the 2nd variant. The algorithmic reason for this is clear — it tests all numbers from sqrt(n) to n by the same list of primes, all smaller or equal to the sqrt(n) — but in testing a given prime p < n it is enough to use only those primes that are not greater than sqrt(p), not sqrt(n). But it is the closest to your original code.


For comparison, in Haskell-like syntax, under strict evaluation,

isPrime n lop = null [() | p <- lop, rem n p == 0] 
-- OP:
findprimes 2 = [2] 
findprimes n = lop ++ [n | isPrime n lop] 
                               where lop = findprimes (n-1)
             = lop ++ [n | n <- [q+1..n], isPrime n lop] 
                               where lop = findprimes q ; q = (n-1)
-- 3rd:
findprimes n | n < 5 = [2,3] 
findprimes n = lop ++ [n | n <- [q+1..n], isPrime n lop] 
                               where lop = findprimes q ; 
                                       q = floor $ sqrt $ fromIntegral n
-- 2nd:
findprimes n =                 g  5      9   [2]      [3]       []
    where
    g                             k      q    a        b        c 
     | k > n                 =                a ++ b ++ reverse c
     | k == q, [h]      <- b = g  k      q    a  (h:reverse c)  []
     | k == q, (h:p:ps) <- b = g (k+2) (p*p) (a++[h]) (p:ps)    c
     | isPrime k a           = g (k+2)   q    a        b     (k:c)
     | otherwise             = g (k+2)   q    a        b        c

The b and c together (which is to say, LOP2 and LOP3 in the Scheme code) actually constitute a pure functional queue a-la Okasaki, from which sequential primes are taken and appended at the end of the maintained primes prefix a (i.e. LOP1) now and again, on each consecutive prime's square being passed, for a to be used in the primality testing by isPrime.

Because of the rarity of this appending, its computational inefficiency has no impact on the time complexity of the code overall.

Will Ness
  • 70,110
  • 9
  • 98
  • 181