Python faster than compiled Haskell?

Question

I have a simple script written in both Python and Haskell. It reads a file with 1,000,000 newline separated integers, parses that file into a list of integers, quick sorts it and then writes it to a different file sorted. This file has the same format as the unsorted one. Simple.

Here is Haskell:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) = (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        lesser  = filter (< p) xs
        greater = filter (>= p) xs

main = do
    file <- readFile "data"
    let un = lines file
    let f = map (\x -> read x::Int ) un
    let done = quicksort f
    writeFile "sorted" (unlines (map show done))

And here is Python:

def qs(ar):
    if len(ar) == 0:
        return ar

    p = ar[0]
    return qs([i for i in ar if i < p]) + [p] + qs([i for i in ar if i > p])


def read_file(fn):
    f = open(fn)
    data = f.read()
    f.close()
    return data

def write_file(fn, data):
    f = open('sorted', 'w')
    f.write(data)
    f.close()


def main():
    data = read_file('data')

    lines = data.split('\n')
    lines = [int(l) for l in lines]

    done = qs(lines)
    done = [str(l) for l in done]

    write_file('sorted', "\n".join(done))

if __name__ == '__main__':
    main()

Very simple. Now I compile the Haskell code with

$ ghc -O2 --make quick.hs

And I time those two with:

$ time ./quick
$ time python qs.py

Results:

Haskell:

real    0m10.820s
user    0m10.656s
sys 0m0.154s

Python:

real    0m9.888s
user    0m9.669s
sys 0m0.203s

How can Python possibly be faster than native code Haskell?

Thanks

EDIT:

Python version: 2.7.1
GHC version: 7.0.4
Mac OSX, 10.7.3
2.4GHz Intel Core i5

List generated by

from random import shuffle
a = [str(a) for a in xrange(0, 1000*1000)]
shuffle(a)
s = "\n".join(a)
f = open('data', 'w')
f.write(s)
f.close()

So all numbers are unique.

Bug in python implementation? You're building the sublists with `i > p` and `i < p`. What about when `i = p`? — gnud, Apr 27 '12 at 20:59
Well, I guess most of the Python run time is spent in the C code that concatenates Python lists (dynamic arrays, in case someone's not aware). `cProfile` should tell you whether that's true. — , Apr 27 '12 at 20:59
Have you confirmed that the output of the programs is *exactly* the same? — David Robinson, Apr 27 '12 at 21:15
Was the input file cached before you ran the Haskell program? It looks as though it was before you ran the Python program. — dave4420, Apr 27 '12 at 21:26
How is 9.9 faster than 10.8? A simple HD read error could cause a difference ten times that. — aib, Apr 27 '12 at 21:30
Which versions of GHC and Python are you using? I get an exception when I try to run your Python code (Python 2.6.5 --- I'm not a Pythonista). — dave4420, Apr 27 '12 at 21:36
Note that you're mostly timing I/O and converting between integers and strings. Replacing the call to `quicksort` with `id` only decreases the run time by ~30% on my machine. — hammar, Apr 27 '12 at 22:16
@dave: Try adding `from __future__ import generators` to the top of the file. — Niklas B., Apr 27 '12 at 22:38
@NiklasB. I still get the error ``ValueError: invalid literal for int() with base 10: ''`` for the second `lines =` line in `main`. — dave4420, Apr 28 '12 at 00:16

Thomas M. DuBuisson · Answer 1 · 2014-07-22T19:37:06.203

The Original Haskell Code

There are two issues with the Haskell version:

You're using string IO, which builds linked lists of characters
You're using a non-quicksort that looks like quicksort.

This program takes 18.7 seconds to run on my Intel Core2 2.5 GHz laptop. (GHC 7.4 using -O2)

Daniel's ByteString Version

This is much improved, but notice it still uses the inefficient built-in merge sort.

His version takes 8.1 seconds (and doesn't handle negative numbers, but that's more of a non-issue for this exploration).

Note

From here on this answer uses the following packages: Vector, attoparsec, text and vector-algorithms. Also notice that kindall's version using timsort takes 2.8 seconds on my machine (edit: and 2 seconds using pypy).

A Text Version

I ripped off Daniel's version, translated it to Text (so it handles various encodings) and added better sorting using a mutable Vector in an ST monad:

import Data.Attoparsec.Text.Lazy
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TIO
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Applicative
import Control.Monad.ST
import System.Environment (getArgs)

parser = many (decimal <* char '\n')

main = do
    numbers <- TIO.readFile =<< fmap head getArgs
    case parse parser numbers of
        Done t r | T.null t -> writeFile "sorted" . unlines
                                                  . map show . vsort $ r
        x -> error $ Prelude.take 40 (show x)

vsort :: [Int] -> [Int]
vsort l = runST $ do
        let v = V.fromList l
        m <- V.unsafeThaw v
        I.sort m
        v' <- V.unsafeFreeze m
        return (V.toList v')

This runs in 4 seconds (and also doesn't handle negatives)

Return to the Bytestring

So now we know we can make a more general program that's faster, what about making the ASCii -only version fast? No problem!

import qualified Data.ByteString.Lazy.Char8 as BS
import Data.Attoparsec.ByteString.Lazy (parse,  Result(..))
import Data.Attoparsec.ByteString.Char8 (decimal, char)
import Control.Applicative ((<*), many)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as I
import Control.Monad.ST


parser = many (decimal <* char '\n')

main = do
    numbers <- BS.readFile "rands"
    case parse parser numbers of
        Done t r | BS.null t -> writeFile "sorted" . unlines
                                                   . map show . vsort $ r

vsort :: [Int] -> [Int]
vsort l = runST $ do
        let v = V.fromList l
        m <- V.unsafeThaw v
        I.sort m
        v' <- V.unsafeFreeze m
        return (V.toList v')

This runs in 2.3 seconds.

Producing a Test File

Just in case anyone's curious, my test file was produced by:

import Control.Monad.CryptoRandom
import Crypto.Random
main = do
  g <- newGenIO :: IO SystemRandom
  let rs = Prelude.take (2^20) (map abs (crandoms g) :: [Int])
  writeFile "rands" (unlines $ map show rs)

If you're wondering why vsort isn't packaged in some easier form on Hackage... so am I.

[GHC doesn't appear to use insertion sort by default.](http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/Data-List.html#sort) (Assuming the default is to not use the report prelude.) — huon, Apr 28 '12 at 04:06
Making a final version that's eight times faster than the original... not bad for a day's work! — Daniel Wagner, Apr 28 '12 at 04:23
@dbaupp It doesn't use prelude by default. Still, mergesort leaves a lot of room for improvement. — Thomas M. DuBuisson, Apr 28 '12 at 04:28
"You're using a non-quicksort that looks like quicksort." Could you expand on this? — Gilles, May 03 '12 at 17:17
Gilles: http://stackoverflow.com/questions/7717691/why-is-the-minimalist-example-haskell-quicksort-not-a-true-quicksort — Thomas M. DuBuisson, May 03 '12 at 19:44
@Viclib So? "His own defined sorting function" is not quicksort. — Thomas M. DuBuisson, May 26 '13 at 16:07
The last bullet point here is 100% wrong. The code in the Python program that performs the sort is all bytecode-interpreted in CPython. — Jason Orendorff, Jul 21 '14 at 19:44
@JasonOrendorff and as Viclib was trying to make me see, the Python sort was hand-written. Didn't realize Viclib was talking about that bullet and the Python instead of the Haskell code. I'll edit! — Thomas M. DuBuisson, Jul 22 '14 at 19:36

Daniel Wagner · Accepted Answer · 2012-04-27T22:10:42.117

41

In short, don't use read. Replace read with a function like this:

import Numeric

fastRead :: String -> Int
fastRead s = case readDec s of [(n, "")] -> n

I get a pretty fair speedup:

~/programming% time ./test.slow
./test.slow  9.82s user 0.06s system 99% cpu 9.901 total
~/programming% time ./test.fast
./test.fast  6.99s user 0.05s system 99% cpu 7.064 total
~/programming% time ./test.bytestring
./test.bytestring  4.94s user 0.06s system 99% cpu 5.026 total

Just for fun, the above results include a version that uses ByteString (and hence fails the "ready for the 21st century" test by totally ignoring the problem of file encodings) for ULTIMATE BARE-METAL SPEED. It also has a few other differences; for example, it ships out to the standard library's sort function. The full code is below.

import qualified Data.ByteString as BS
import Data.Attoparsec.ByteString.Char8
import Control.Applicative
import Data.List

parser = many (decimal <* char '\n')

reallyParse p bs = case parse p bs of
    Partial f -> f BS.empty
    v -> v

main = do
    numbers <- BS.readFile "data"
    case reallyParse parser numbers of
        Done t r | BS.null t -> writeFile "sorted" . unlines . map show . sort $ r

edited Apr 27 '12 at 22:10

answered Apr 27 '12 at 21:37

Daniel Wagner

145,880
9
220
380

1

Anything special I need to do to compile this? I'm getting `Could not find module `Data.Attoparsec.ByteString.Char8'` – Honza Pokorny Apr 27 '12 at 22:35
1

@HonzaPokorny Yes, `cabal install attoparsec` from your command line first. – Daniel Wagner Apr 27 '12 at 22:36
1

Compiles fine but doesn't work for me. Complains about `Non-exhaustive patterns in case`. I'm not really a Haskeller. – Honza Pokorny Apr 27 '12 at 22:57
@HonzaPokorny Probably the file was not in exactly the format recognized by the parser -- perhaps you're on Windows, or there's no newline at the end of the file, or some other such thing. Without having the actual file in hand (and not, for example, the result of copying and pasting to an online pastebin), it's basically impossible to debug. – Daniel Wagner Apr 27 '12 at 23:02
3

Python 2, when used like in OP's script, gives a damn about encoding, too, so I guess this is fair :) – Niklas B. Apr 28 '12 at 00:22
The error you get might be due to negative numbers - the parser Daniel uses (and I ripped off for my answer) doesn't handle negatives. – Thomas M. DuBuisson Apr 28 '12 at 04:08
Why is `readDec` faster than `read`? Shouldn't GHC be able to internally use `readDec` since it has an expected type (`read x::Int`)? – Kendall Hopkins Apr 28 '12 at 07:19
@KendallHopkins `readDec` and `read` serve different purposes. For example, `read "(24)" :: Int` succeeds, but `readDec "(24)" :: [(Int, String)]` doesn't. – Daniel Wagner Apr 28 '12 at 08:10
So what's the general best practice here? Use `readDec`? Use `read` and don't worry about it unless it shows itself to be a bottleneck? Roll your own using `attoparsec`? – ben w Apr 28 '12 at 23:31
@benw Best practice is to use `Read` for one-offs, and write a real parser for anything serious. What library you use for parsing depends on the format. `readDec` still probably falls in the "only for one-offs" category. – Daniel Wagner Apr 29 '12 at 01:44

kindall · Answer 3 · 2012-04-27T23:34:20.323

More a Pythonista than a Haskellite, but I'll take a stab:

There's a fair bit of overhead in your measured runtime just reading and writing the files, which is probably pretty similar between the two programs. Also, be careful that you've warmed up the cache for both programs.
Most of your time is spent making copies of lists and fragments of lists. Python list operations are heavily optimized, being one of the most-frequently used parts of the language, and list comprehensions are usually pretty performant too, spending much of their time in C-land inside the Python interpreter. There is not a lot of the stuff that is slowish in Python but wicked fast in static languages, such as attribute lookups on object instances.
Your Python implementation throws away numbers that are equal to the pivot, so by the end it may be sorting fewer items, giving it an obvious advantage. (If there are no duplicates in the data set you're sorting, this isn't an issue.) Fixing this bug probably requires making another copy of most of the list in each call to qs(), which would slow Python down a little more.
You don't mention what version of Python you're using. If you're using 2.x, you could probably get Haskell to beat Python just by switching to Python 3.x. :-)

I'm not too surprised the two languages are basically neck-and-neck here (a 10% difference is not noteworthy). Using C as a performance benchmark, Haskell loses some performance for its lazy functional nature, while Python loses some performance due to being an interpreted language. A decent match.

Since Daniel Wagner posted an optimized Haskell version using the built-in sort, here's a similarly optimized Python version using list.sort():

mylist = [int(x.strip()) for x in open("data")]
mylist.sort()
open("sorted", "w").write("\n".join(str(x) for x in mylist))

3.5 seconds on my machine, vs. about 9 for the original code. Pretty much still neck-and-neck with the optimized Haskell. Reason: it's spending most of its time in C-programmed libraries. Also, TimSort (the sort used in Python) is a beast.

Good observations, just one thing regarding #4: He does name compiler and flags: `ghc -O2 --make quick.hs` — , Apr 27 '12 at 21:34
Well, the Python and Haskell are close to the same speed now, so if you switched to a slower version of Python (e.g. 3.x), the Python version might fall behind the Haskell one. — kindall, Jan 07 '13 at 14:27
well yeah, that was exactly what I was wondering about. Why is python 3 slower than python 2? Do you have any references/material I could have a look at? — Stephan Dollberg, Jan 08 '13 at 07:34
There are two major performance bottlenecks in Python 3: all integers are long integers, which are slower to do math with (pystone on 3.0 is 20% slower than on 2.6), and all strings are Unicode, which is slower to work with, and requires encoding/decoding of all I/O (Web apps benchmark 15% fewer requests per second). On some operations Python 3 seems to actually be faster, though, and I believe there's recently been some work done to narrow the gap. (I got the above numbers from http://mikewatkins.ca/2008/12/06/python-3-performance-a-red-herring/ but Google "Python 3 performance" etc.) — kindall, Jan 08 '13 at 15:51

applicative · Answer 4 · 2012-05-03T20:44:01.137

This is after the fact, but I think most of the trouble is in the Haskell writing. The following module is pretty primitive -- one should use builders probably and certainly avoid the ridiculous roundtrip via String for showing -- but it is simple and did distinctly better than pypy with kindall's improved python and better than the 2 and 4 sec Haskell modules elsewhere on this page (it surprised me how much they were using lists, so I made a couple more turns of the crank.)

$ time aa.hs        real    0m0.709s
$ time pypy aa.py   real    0m1.818s
$ time python aa.py real    0m3.103s

I'm using the sort recommended for unboxed vectors from vector-algorithms. The use of Data.Vector.Unboxed in some form is clearly now the standard, naive way of doing this sort of thing -- it's the new Data.List (for Int, Double, etc.) Everything but the sort is irritating IO management, which could I think still be massively improved, on the write end in particular. The reading and sorting together take about 0.2 sec as you can see from asking it to print what's at a bunch of indexes instead of writing to file, so twice as much time is spent writing as in anything else. If the pypy is spending most of its time using timsort or whatever, then it looks like the sorting itself is surely massively better in Haskell, and just as simple -- if you can just get your hands on the darned vector...

I'm not sure why there aren't convenient functions around for reading and writing vectors of unboxed things from natural formats -- if there were, this would be three lines long and would avoid String and be much faster, but maybe I just haven't seen them.

import qualified Data.ByteString.Lazy.Char8 as BL
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector.Unboxed.Mutable as M
import qualified Data.Vector.Unboxed as V
import Data.Vector.Algorithms.Radix 
import System.IO

main  = do  unsorted <- fmap toInts (BL.readFile "data")
            vec <- V.thaw unsorted
            sorted <- sort vec >> V.freeze vec
            withFile "sorted" WriteMode $ \handle ->
               V.mapM_ (writeLine handle) sorted

writeLine :: Handle -> Int -> IO ()
writeLine h int = B.hPut h $ B.pack (show int ++ "\n")

toInts :: BL.ByteString -> V.Vector Int
toInts bs = V.unfoldr oneInt (BL.cons ' ' bs) 

oneInt :: BL.ByteString -> Maybe (Int, BL.ByteString)
oneInt bs = if BL.null bs then Nothing else 
               let bstail = BL.tail bs
               in if BL.null bstail then Nothing else BL.readInt bstail

score 2 · Answer 5 · answered Apr 27 '12 at 22:11

To follow up @kindall interesting answer, those timings are dependent from both the python / Haskell implementation you use, the hardware configuration on which you run the tests, and the algorithm implementation you right in both languages.

Nevertheless we can try to get some good hints of the relative performances of one language implementation compared to another, or from one language to another language. With well known alogrithms like qsort, it's a good beginning.

To illustrate a python/python comparison, I just tested your script on CPython 2.7.3 and PyPy 1.8 on the same machine:

CPython: ~8s
PyPy: ~2.5s

This shows there can be room for improvements in the language implementation, maybe compiled Haskell is not performing at best the interpretation and compilation of your corresponding code. If you are searching for speed in Python, consider also to switch to pypy if needed and if your covering code permits you to do so.

score 2 · Answer 6 · edited Feb 06 '14 at 20:02

i noticed some problem everybody else didn't notice for some reason; both your haskell and python code have this. (please tell me if it's fixed in the auto-optimizations, I know nothing about optimizations). for this I will demonstrate in haskell. in your code you define the lesser and greater lists like this:

where lesser = filter (<p) xs
      greater = filter (>=p) xs

this is bad, because you compare with p each element in xs twice, once for getting in the lesser list, and again for getting in the greater list. this (theoretically; I havn't checked timing) makes your sort use twice as much comparisons; this is a disaster. instead, you should make a function which splits a list into two lists using a predicate, in such a way that

split f xs

is equivalent to

(filter f xs, filter (not.f) xs)

using this kind of function you will only need to compare each element in the list once to know in which side of the tuple to put it.
okay, lets do it:

where
    split :: (a -> Bool) -> [a] -> ([a], [a])
    split _ [] = ([],[])
    split f (x:xs)
        |f x       = let (a,b) = split f xs in (x:a,b)
        |otherwise = let (a,b) = split f xs in (a,x:b)

now lets replace the lesser/greater generator with

let (lesser, greater) = split (p>) xs in (insert function here)

full code:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) =
    let (lesser, greater) = splitf (p>) xs
    in (quicksort lesser) ++ [p] ++ (quicksort greater)
    where
        splitf :: (a -> Bool) -> [a] -> ([a], [a])
        splitf _ [] = ([],[])
        splitf f (x:xs)
            |f x       = let (a,b) = splitf f xs in (x:a,b)
            |otherwise = let (a,b) = splitf f xs in (a,x:b)

for some reason I can't right the getter/lesser part in the where clauses so I had to right it in let clauses. also, if it is not tail-recursive let me know and fix it for me (I don't know yet how tail-recorsive works fully)

now you should do the same for the python code. I don't know python so I can't do it for you.

EDIT: there actually happens to already be such function in Data.List called partition. note this proves the need for this kind of function because otherwise it wouldn't be defined. this shrinks the code to:

quicksort :: Ord a => [a] -> [a]
quicksort []     = []
quicksort (p:xs) =
    let (lesser, greater) = partition (p>) xs
    in (quicksort lesser) ++ [p] ++ (quicksort greater)

score 1 · Answer 7 · edited May 23 '17 at 12:17

1

Python is really optimized for this sort of thing. I suspect that Haskell isn't. Here's a similar question that provides some very good answers.

edited May 23 '17 at 12:17

Community

1
1

answered Apr 27 '12 at 21:13

Python faster than compiled Haskell?

7 Answers7

Linked

Related