10

I've been doing some experiments and here's something I found. Consider the following C program:

int main()
{
   for(int i = 0;
       i < 1000000;
       ++i)
   {}
}

and the following Haskell program:

import System.IO

loop :: Int -> IO ()
loop n = if 0 == n then return () else loop (n-1)

main = loop 1000000

Here is the output of time for the above C program:

real    0m0.003s
user    0m0.000s
sys 0m0.000s

...and for the Haskell program:

real    0m0.028s
user    0m0.027s
sys 0m0.000s

At first I thought that gcc detected an empty loop and optimized it out, but after increasing the number of iterations, the running time of the program also increased. Here are the outputs of time for both of the programs, with the number of iterations set to 10000000:

C version

real    0m0.024s
user    0m0.023s
sys 0m0.000s

Haskell version

real    0m0.245s
user    0m0.247s
sys 0m0.000s

As you can see, the Haskell program is 10 times slower.

Question is: what is the efficient alternative to the for loop in Haskell? As we just saw, simple recursion slows the program down about 10 times (and this is probably with the tail recursion optimizations done).

Don Stewart
  • 137,316
  • 36
  • 365
  • 468
  • 1
    Out of curiosity, does the assembly code produced from the optimized c program completely remove the loop? – shuttle87 May 08 '11 at 06:47
  • Pfft, the `while` loop. Duh. :P – Chris Eberle May 08 '11 at 06:47
  • @shuttle87 with -O3 flag the loop gets completely removed, without it it does not. –  May 08 '11 at 06:53
  • 21
    This apparent "benchmark" is a farce, sorry to say. Doing nothing in a loop is a totally non-existent use case. The idea of a `for` loop in functional programming really doesn't make sense. The approach you take to iterate over elements to perform a computation will vary greatly depending on what that computation is. Pick a real use case and you'll probably get a better and more realistic answer. – OJ. May 08 '11 at 06:54
  • What compilation flags did you use for each language? – shang May 08 '11 at 06:55
  • 4
    Lies, damn lies, and benchmarks. – Nick ODell May 08 '11 at 06:57
  • @shang for C I used only -std=c99, for Haskell - none. –  May 08 '11 at 06:59
  • 1
    @Grigory Javadyan, assuming you're using the GHC compiler, you'll want to turn on optimizations, if you want anything done to the program. E.g. `ghc -O2 -fllvm` for performance-oriented work. If your goal is to watch a program sit and spin, write a loop and run it. – Don Stewart May 08 '11 at 07:40
  • I think the `if then else` doesn't introduce any friction here. Most possibly you should still be able to use the above approach but i suppose compiler optimization flags (such as -O2 and -fllvm or -fasm) will give a speed up. [Besides it's mentioned that using unboxed types in a similar loop will make it run as fast as it's C equivalent](https://wiki.haskell.org/Performance/GHC#Unboxed_types). – Redu Jan 07 '19 at 23:09

2 Answers2

23

Firstly, you'd translate your C code to this,

main = go 0
    where
        go :: Int -> IO ()
        go i | i < 1000000 = go (i+1)
             | otherwise   = return ()

which ghc optimizes to the empty program. It moves the final value into a register, compares against it, and returns ():

Main_zdwa_info: 
    cmpq    $1000000, %r14          # imm = 0xF4240
    movl    $1000000, %eax          # imm = 0xF4240
    cmovlq  %rax, %r14
    movq    (%rbp), %rax
    movl    $ghczmprim_GHCziUnit_Z0T_closure+1, %ebx
    jmpq    *%rax  # TAILCALL

which when run:

$ time ./A
./A  0.00s user 0.00s system 88% cpu 0.004 total

takes no time.


In general, though, GHC emits equivalent loops to e.g. GCC, for tail-recursive functions. In particular, you'll want to compile your numeric benchmarks with ghc -O2 -fllvm for best results. If you don't want your programs optimized, then ghc will happily execute exactly the program you specify, which, in this case, involves lots of redundant work that would be removed at higher optimization levels.

For more information on analyzing low-level performance of Haskell programs, check RWH ch25.

Community
  • 1
  • 1
Don Stewart
  • 137,316
  • 36
  • 365
  • 468
6

For looping constructs, you often have to write in the Worker/Wrapper style to help GHC spot optimizations rather than recur at the outer function level.

Grigory Javadyan has said in a comment that the original version gets optimized out at -O3, I'd expect this version gets detected at -O2:

import System.IO

loop :: Int -> IO ()
loop n = go n
  where
    go n | n <= 0 = return ()
    go n          = go (n-1)

main = loop 1000000
Eric Seppanen
  • 5,923
  • 30
  • 24
stephen tetley
  • 4,465
  • 16
  • 18