Haskell slower than Python in naïve integer factorization?

Question

I'm taking a math course where we had to do some integer factorizations as an intermediate step to a problem. I decided to write a Python program to do this for me (we weren't being tested on our ability to factor, so this is completely above board). The program is as follows:

#!/usr/bin/env python3

import math
import sys

# Return a list representing the prime factorization of n. The factorization is
#   found using trial division (highly inefficient).
def factorize(n):

    def factorize_helper(n, min_poss_factor):
        if n <= 1:
            return []
        prime_factors = []
        smallest_prime_factor = -1
        for i in range(min_poss_factor, math.ceil(math.sqrt(n)) + 1):
            if n % i == 0:
                smallest_prime_factor = i
                break
        if smallest_prime_factor != -1:
            return [smallest_prime_factor] \
                   + factorize_helper(n // smallest_prime_factor,
                                      smallest_prime_factor)
        else:
            return [n]

    if n < 0:
        print("Usage: " + sys.argv[0] + " n   # where n >= 0")
        return []
    elif n == 0 or n == 1:
        return [n]
    else:
        return factorize_helper(n, 2)

if __name__ == "__main__":
    factorization = factorize(int(sys.argv[1]))
    if len(factorization) > 0:
        print(factorization)

I've been teaching myself some Haskell as well, so I decided to try rewriting the program in Haskell. That program is as follows:

import System.Environment

-- Return a list containing all factors of n at least x.
factorize' :: (Integral a) => a -> a -> [a]
factorize' n x = smallestFactor
                 : (if smallestFactor == n
                    then []
                    else factorize' (n `quot` smallestFactor) smallestFactor)
    where
        smallestFactor = getSmallestFactor n x
        getSmallestFactor :: (Integral a) => a -> a -> a
        getSmallestFactor n x
            | n `rem` x == 0                          = x
            | x > (ceiling . sqrt . fromIntegral $ n) = n
            | otherwise                               = getSmallestFactor n (x+1)

-- Return a list representing the prime factorization of n.
factorize :: (Integral a) => a -> [a]
factorize n = factorize' n 2

main = do
    argv <- getArgs
    let n = read (argv !! 0) :: Int
    let factorization = factorize n
    putStrLn $ show (factorization)
    return ()

(note: this requires a 64-bit environment. On 32-bit, import Data.Int and use Int64 as the type annotation on read (argv !! 0))

After I'd written this, I decided to compare the performance of the two, recognizing that there are better algorithms, but that the two programs use essentially the same algorithm. I do, for example, the following:

$ ghc --make -O2 factorize.hs
$ /usr/bin/time -f "%Uu %Ss %E" ./factorize 89273487253497
[3,723721,41117819]
0.18u 0.00s 0:00.23

Then, timing the Python program:

$ /usr/bin/time -f "%Uu %Ss %E" ./factorize.py 89273487253497
[3, 723721, 41117819]
0.09u 0.00s 0:00.09

Naturally, the times vary slightly each time I run one of the programs, but they are always in this range, with the Python program several times quicker than the compiled Haskell program. It seems to me that the Haskell version should be able to run quicker, and I'm hoping you can give me an idea of how to improve it so that this is the case.

I've seen some tips on optimizing Haskell programs, as in answers to this question, but can't seem to get my program running any quicker. Are loops this much quicker than recursion? Is Haskell's I/O particularly slow? Have I made a mistake in actually implementing the algorithm? Ideally, I'd like to have an optimized version of the Haskell that is still relatively easy to read

This has nothing to do with performance, but you should not use `read` to process user input. Import `Text.Read` and use `readMaybe`. This will allow you to handle errors properly. — dfeuer, Nov 16 '16 at 03:15
FWIW, that Python code would be a lot nicer if expressed imperatively. [Aka. this code.](https://gist.github.com/Veedrac/e937a728b227c9dd576a8c26e4058536) With PyPy it's 3x the speed of Cactus' if Haskell uses big integers (like Python), and a reasonable 85% if Haskell gets to use fixed width ints. — Veedrac, Nov 16 '16 at 09:49
Note that test cases like this with < 1 s total program runtime are hardly useful to compare real-world performance. Haskell is reasonably quick to start, but it still needs some “runtime warm-up”. — leftaroundabout, Nov 16 '16 at 11:02
@dfeuer -- I wasn't too worried about error-handling here, but that's good to keep in mind. Thanks! @Veedrac -- Your Python program is a lot cleaner than mine. I haven't used `yield` before, but it seems really handy. @leftaroundabout -- I know such small tests aren't generally reliable, but I haven't heard of the idea of this "warm-up" time before. That's really interesting; thank you. — jdw1996, Nov 16 '16 at 20:08
@Veedrac, I think there's some room for further optimization of the Haskell code; see my answer. — dfeuer, Nov 16 '16 at 20:13
@jdw1996, yes, the Haskell runtime system takes a non-trivial amount of time to start itself up before beginning to run user code. It certainly has to set up some data structures to manage garbage collection, I imagine it likely loads a number of libraries, I believe it allocates Haskell handles for standard input/output/error, and it may have other "administrative" work as well. — dfeuer, Nov 16 '16 at 20:18

score 13 · Accepted Answer · answered Nov 16 '16 at 02:21

If you compute limit = ceiling . sqrt . fromIntegral $ n only once, instead of once per iteration, then I see the Haskell version being faster:

limit = ceiling . sqrt . fromIntegral $ n
smallestFactor = getSmallestFactor x

getSmallestFactor x
    | n `rem` x == 0 = x
    | x > limit      = n
    | otherwise      = getSmallestFactor (x+1)

using this version, I see:

$ time ./factorizePy.py 89273487253497
[3, 723721, 41117819]

real    0m0.236s
user    0m0.171s
sys     0m0.062s

$ time ./factorizeHs  89273487253497
[3,723721,41117819]

real    0m0.190s
user    0m0.000s
sys     0m0.031s

dfeuer · Answer 2 · 2016-11-16T23:23:45.320

Beyond the critical point Cactus made, there is also some room here for some refactoring and strictness annotations to avoid creating unnecessary thunks. Note in particular that factorize is lazy:

factorize' undefined undefined = undefined : undefined

This isn't really necessary, and forces GHC to allocate several thunks. Extra laziness elsewhere does the same. I expect you'll get somewhat better performance like this:

{-# LANGUAGE BangPatterns #-}

factorize' :: Integral a => a -> a -> [a]
factorize' n x
  | smallestFactor == n = [smallestFactor]
  | otherwise = smallestFactor : factorize' (n `quot` smallestFactor) smallestFactor
  where
    smallestFactor = getSmallestFactor n (ceiling . sqrt . fromIntegral $ n) x
    getSmallestFactor n !limit x
       | n `rem` x == 0 = x
       | x > limit = n
       | otherwise = getSmallestFactor n limit (x+1)

-- Return a list representing the prime factorization of n.
factorize :: Integral a => a -> [a]
factorize n = factorize' n 2

I made getSmallestFactor take both n and the limit as arguments. This prevents getSmallestFactor from being allocated as a closure on the heap. I'm not sure whether that's worth the extra argument shuffling; you can try that both ways.

Haskell slower than Python in naïve integer factorization?

2 Answers2