1

Recently me and my friends were benchmarking different programming languages. As I'm learning Haskell lately, I wanted to show that a functional language will perform nearly as good as C with simpler code. But the code I pasted below, compiled with GHC's -O3 option, was executing about 1.6 s on my machine. Equivalent scripts in Python and Ruby executed faster (it was a simple for loop).

import System.IO

saveLine fh x = hPutStrLn fh $ show x ++ "\t" ++ show (x^2)

main = do
    fh <- openFile "haskell.txt" WriteMode
    mapM (saveLine fh) [1..999999]
    hClose fh

You can check out codes in other languages here (I wrote only the ones in Python and Ruby).

The question is - how to make it run faster?

Luke
  • 1,369
  • 1
  • 13
  • 37
  • 2
    Hi, welcome to StackOverflow. What exactly is your question? – Cthulhu Jul 14 '15 at 09:40
  • Hi :) Sorry - my question is how to make it run faster. – Luke Jul 14 '15 at 09:43
  • It's 0.8 sec for your Haskell and 1.2 sec for your Python script on my machine. – András Kovács Jul 14 '15 at 09:48
  • I have a slow hardware here and it takes 1.6 sec for your Haskell script and 2 sec for your Python 2.7 script. – Sibi Jul 14 '15 at 10:02
  • 1
    Benchmarks like these aren't that meaningful, in my opinion. – duffymo Jul 14 '15 at 10:15
  • I've never said that they are. I was just a little surprising for me. I expected compiled Haskell to run two times faster than scripts. (BTW. Running this code with `runhaskell` takes ages.) – Luke Jul 14 '15 at 10:44
  • Related: http://stackoverflow.com/questions/29186541/why-is-this-haskell-program-so-much-slower-than-an-equivalent-python-one – jub0bs Jul 14 '15 at 10:59
  • @Luke `runhaskell` or rather `runghc` compiles the program first, and since we don't have a fast interpreter, it will be perhaps the slowest solution you can come up with, so the comparison isn't fair. You should compile your program with `ghc -O2 yourfile.hs` and then time the resulting executable instead. – Sarah Jul 14 '15 at 12:05
  • @Sarah: I know, it was only a minor point. I was comparing an executable with Python and Ruby scripts. – Luke Jul 14 '15 at 12:46

3 Answers3

10

First off, benchmarks where you perform some simple calculation and write each result to a file are only useful to test the IO system. They say nothing at all about computation efficiency, because most of the time is spent with file accesses. If you want to compare how fast languages are for calculating things, you need to do something processor-intensive (e.g. summing all those square values) and then output only that single result.

If you actually want to compare file-output performance, the main problem in Haskell is its String type. It's a simple list of characters, handy for doing simple text manipulations by hand but with ridiculous overhead for standard tasks. For faster performance, use bytestrings or Text. Specifically, you need a faster alternative to show. Check out the text-format package.

leftaroundabout
  • 117,950
  • 5
  • 174
  • 319
  • Thanks, that might be the root of the problem. I've heard that GHC produces only a bit slower (less than two times) code than gcc, so I expected the same would apply to strings and file IO. Can something be done using only standard library? – Luke Jul 14 '15 at 10:35
  • 1
    @Luke: you could probably do it by manual juggling with `putChar` and buffer settings... but that's awful. Again: you shouldn't really need to care about IO performance, unless you're working on a big project that handles lots of network activity or data too big to fit in RAM. For those projects, you definitely want bytestrings / streaming libraries. For anything simpler, just _don't output stuff you won't need anyway_! That's not just Haskell advice; in any language, needless IO will greatly diminish performance. – leftaroundabout Jul 14 '15 at 10:53
  • 1
    And BTW, for high-performance IO you should use binary formats, not text. – leftaroundabout Jul 14 '15 at 10:59
  • Sure, but then it wouldn't be readable outside hex editor or my own program. Could you recommend a source introducing to binary access to file system in Haskell? – Luke Jul 14 '15 at 11:43
  • 1
    How about https://wiki.haskell.org/Dealing_with_binary_data. Of course, Real World Haskell also [has a chapter on it](http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-format.html). – leftaroundabout Jul 14 '15 at 12:31
3

The problems are: 1) using String, and 2) the formatting of integer values to character data (either ByteString or String).

{-# LANGUAGE OverloadedStrings #-}

import System.IO
import qualified Data.ByteString.Char8 as BS

-- saveLine fh x = hPutStrLn fh $ show x ++ "\t" ++ show (x^2) -- original version
-- saveLine fh x = hPutStrLn fh "123456\t123456789012" -- String version
saveLine fh x = BS.hPutStrLn fh "123456\t123456789012" -- ByteString version

main = do
    fh <- openFile "haskell.txt" WriteMode
    mapM_ (saveLine fh) [1..999999]
    hClose fh

The run times on my machine are (GHC 7.8.3):

original version:   1.6 secs
String version:     0.7 secs
ByteString version: 0.3 secs

Thus, the run time can be significantly improved if one can find a faster way to format integer values to ByteStrings.

ErikR
  • 51,541
  • 9
  • 73
  • 124
  • I see that `ByteString` version is much faster, but how can I easily convert an `Int` to `ByteString`, without using non-standard libraries? Currently your solution is faster, but it doesn't solve the main problem (which is saving integers in the file). – Luke Jul 14 '15 at 13:22
  • That's the rub - I don't know of any standard, faster ways of formatting integers to strings or byte strings. Note - my program does save data to a file - it just doesn't save the right data. I wrote it to demonstrate where the real problem is. – ErikR Jul 14 '15 at 13:45
  • 2
    @Luke: `ByteString` is itself not in `base`, so obviously you can't convert to it without using “nonstandard libraries”! However, a lot of libraries (`vector`, `bytestring`, `lens`) are so common you may well consider them standard. And really... it's no big deal to use othe hackage libraries, either. Get used to it. – leftaroundabout Jul 14 '15 at 15:00
  • 1
    @Luke the easiest way, if you don't want to use other packages, is to use `pack` function from `ByteString` to convert from string, as I've in my answer. – Marqin Jul 14 '15 at 20:37
1

First, you should avoid using -O3, because it could sometimes break your code ( even in gcc ). Just stick to -O2.


Second, Haskell Strings are really slow.

But, there are faster alternatives - and for me, the easiest one is to use is Data.ByteString ( little and old tutorial ).


Third, if you don't need Maybe results from mapM use mapM_ - it's much faster!


Finally, here is my version of your code, using Data.ByteString:

{-# LANGUAGE OverloadedStrings #-}
--  that line is so You could use "\t" instead of T.pack "\t"

import System.IO
import qualified Data.ByteString.Char8 as T

helperFun x = T.concat [y, "\t", z]
          where
            y = T.pack $ show x
            z = T.pack $ show $ x^2

saveLine fh x = T.hPutStrLn fh $ helperFun x

main = do
    fh <- openFile "haskell.txt" WriteMode
    mapM_ (saveLine fh) [1..999999]
    hClose fh

Edit: user5402 said in comments it's not so "much faster" then Strings version.

You should keep in mind that this code is still using Strings, and had additional overhead ( packing Strings into ByteString ) with only concat and hPutStrLn used from ByteString.

Community
  • 1
  • 1
Marqin
  • 1,096
  • 8
  • 17
  • You should benchmark this against the `String` version, and I think you'll find that it is only about 20% faster (using GHC 7.8.3). – ErikR Jul 15 '15 at 04:44
  • 1
    In fact, I think using `mapM_` instead of `mapM` accounts for most of the timing difference between the two versions. – ErikR Jul 15 '15 at 04:48
  • @user5402 Which means that it wasn't only the `String`'s fault. – Luke Jul 15 '15 at 07:38
  • @user5402 You should keep in mind that this code is still using Strings, and had additional overhead ( packing Strings into ByteString ) with only concat and hPutStrLn used from ByteString. – Marqin Jul 15 '15 at 11:04