
Possible Duplicate:
Small Haskell program compiled with GHC into huge binary

Recently I noticed how large Haskell executables are. Everything below was compiled on GHC 7.4.1 with -O2 on Linux.

  1. Hello World (main = putStrLn "Hello World!") is over 800 KiB. Running strip on it reduces the file size to 500 KiB; even adding -dynamic to the compilation doesn't help much, leaving me with a stripped executable of around 400 KiB.

  2. Compiling a very primitive example involving Parsec yields a 1.7 MiB file.

    -- File: test.hs
    import qualified Text.ParserCombinators.Parsec as P
    import Data.Either (either)
    
    -- Parses a string of type "x y" to the tuple (x,y).
    testParser :: P.Parser (Char, Char)
    testParser = do
        a <- P.anyChar
        P.char ' '
        b <- P.anyChar
        return (a, b)
    
    -- Parse, print result.
    str = "1 2"
    main = print $ either (error . show) id . P.parse testParser "" $ str
    -- Output: ('1','2')
    

    Parsec may be a larger library, but I'm only using a tiny subset of it, and indeed the optimized core code generated by the above is dramatically smaller than the executable:

    $ ghc -O2 -ddump-simpl -fforce-recomp test.hs | wc -c
    49190 (bytes)
    

    Therefore, it's not the case that a huge amount of Parsec is actually found in the program, which was my initial assumption.

Why are the executables of such an enormous size? Is there something I can do about it (except dynamic linking)?

David
  • @DanielWagner The other question is certainly related, but even using the techniques described there Hello World is still huge. Also: why does small core code, which should contain the entire program, get so large when compiled? – David Oct 04 '12 at 02:00
  • There's a rather large runtime system. – augustss Oct 04 '12 at 02:42
  • @David: The core does not contain the entire program unless everything got inlined, which is rather unlikely. So it's going to link in Parsec, and unless you built that with `-split-objs` (see [related answer](http://stackoverflow.com/a/9198223/98117)), it'll have to link in all of it. – hammar Oct 04 '12 at 03:42
  • For reference, your primitive example produced a 29 KiB *"big"* executable on my system: `ghc -O2 -dynamic test.hs && strip test && du -b test` => 28712 bytes. GHC version 7.4.2, x86_64 Linux system. – David Unric Oct 04 '12 at 09:49

2 Answers


To effectively reduce the size of executables produced by the Glasgow Haskell Compiler, focus on the following:

  • Use dynamic linking by passing the -dynamic option to ghc, so that library code stays in shared (dynamic) libraries instead of being bundled into the final executable. This requires the shared versions of the GHC libraries to be installed on the system!
  • Remove debugging information from the final executable (e.g. with the strip tool from GNU binutils).
  • Remove imports of unused modules (don't expect any gains from this when linking dynamically).

The simple Hello World example has a final size of 9 KiB and the Parsec test about 28 KiB (both 64-bit Linux executables), which I find quite small and acceptable for such a high-level language implementation.
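
As a rough illustration of the first two points, the following sketch builds a Hello World both ways, strips each result, and prints the sizes. The file names `hello.hs`, `hello-static` and `hello-dynamic` and the helper functions are made up for this example; the only GHC flags used are `-O2`, `-dynamic`, `-fforce-recomp` and `-o`. The numbers you get will depend on your GHC version and platform.

```shell
#!/bin/sh
# Sketch: compare a stripped static build with a stripped -dynamic build.
# hello.hs, hello-static and hello-dynamic are hypothetical names.

# Print the size of a file in bytes (portable stand-in for `du -b`).
size_of() {
    wc -c < "$1" | tr -d ' '
}

# Compile $1 to $2 with any extra ghc flags, strip the result, report its size.
build_and_report() {
    src=$1; out=$2; shift 2
    # -fforce-recomp avoids reusing objects from a build with different flags.
    ghc -O2 -fforce-recomp "$@" -o "$out" "$src" >/dev/null || return 1
    strip "$out"
    echo "$out: $(size_of "$out") bytes"
}

if command -v ghc >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    printf 'main :: IO ()\nmain = putStrLn "Hello World!"\n' > "$workdir/hello.hs"
    (
        cd "$workdir" || exit 1
        build_and_report hello.hs hello-static ||
            echo "static build failed" >&2
        # Fails if the shared versions of the GHC libraries are not installed.
        build_and_report hello.hs hello-dynamic -dynamic ||
            echo "dynamic build failed (shared GHC libraries missing?)" >&2
    )
    rm -rf "$workdir"
else
    echo "ghc not found; skipping the build" >&2
fi
```

The dynamic executable only runs on machines that have the matching shared GHC libraries, so the smaller file trades size for a deployment dependency.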

David Unric
  • Hello World is only 9 KiB if I link with `-dynamic`. In the Parsec case I've got problems installing the dynamic version (`cabal install parsec --enable-shared --reinstall` results in cabal complaining that I don't have "dyn libraries for package `mtl-2.1.1'"), but that would make another question. In any case, thank you. – David Oct 04 '12 at 22:16

My understanding is that if you use a single function from package X, the entire package gets statically linked in. I don't think GHC actually links function-by-function. (Unless you use the "split objects" hack, which "tends to freak the linker out".)
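
For what it's worth, here is a sketch of how a library could be rebuilt with split objects. It assumes a cabal-install from the GHC 7.x era, where `--enable-split-objs` is a configure flag and `split-objs` is a field in `~/.cabal/config`; the function name is made up, and I haven't checked how newer toolchains handle this.

```shell
#!/bin/sh
# Sketch: rebuild a library with split objects so the static linker can
# leave out code the program never references.
# Assumes a cabal-install of the GHC 7.x era; the function name is hypothetical.

reinstall_with_split_objs() {
    # $1 = package name, e.g. parsec
    cabal install "$1" --enable-split-objs --reinstall
}

# Not run automatically, since it rebuilds an installed package:
#   reinstall_with_split_objs parsec
#
# The persistent equivalent is this line in ~/.cabal/config:
#   split-objs: True
```

The library's own dependencies would presumably need the same treatment before the whole link benefits.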

But if you're linking dynamically, that ought to fix this. So I'm not sure what to suggest here...

(I'm pretty sure I saw a blog post when dynamic linking first came out, demonstrating Hello World compiled to a 2KB binary. Obviously I cannot find this blog post now... grr.)

Consider also cross-module optimisation. If you're writing a Parsec parser, it's likely that GHC will inline all the parser definitions and simplify them down to the most efficient code. And, sure enough, your few lines of Haskell have produced 50KB of Core. Should that get 37x bigger when compiling to machine-code? I don't know. You could perhaps try looking at the STG and Cmm code produced in the next steps. (Sorry, I don't recall the compiler flags off the top of my head...)
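
If it helps, the later stages can be measured the same way the question measured Core. This is only a sketch: `-ddump-stg`, `-ddump-cmm` and `-ddump-asm` are the dump flags as I remember them for GHC 7.x (newer releases may spell the STG one differently), `test.hs` is the file from the question, and `dump_size` and `ratio` are helper names invented here.

```shell
#!/bin/sh
# Sketch: measure each intermediate representation the way the question
# measured Core. test.hs is the question's file; helper names are hypothetical.

# Byte count of everything ghc prints for one dump flag.
dump_size() {
    # $1 = dump flag (e.g. -ddump-stg), $2 = source file
    ghc -O2 -fforce-recomp "$1" "$2" | wc -c | tr -d ' '
}

# How many times larger $1 is than $2, to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

if command -v ghc >/dev/null 2>&1 && [ -f test.hs ]; then
    for flag in -ddump-simpl -ddump-stg -ddump-cmm -ddump-asm; do
        echo "$flag: $(dump_size "$flag" test.hs) bytes"
    done
    # Compare the linked executable (if the build succeeded) with the Core dump.
    if [ -f test ]; then
        exe_bytes=$(wc -c < test | tr -d ' ')
        core_bytes=$(dump_size -ddump-simpl test.hs)
        echo "executable is $(ratio "$exe_bytes" "$core_bytes")x the Core dump"
    fi
else
    echo "ghc or test.hs not found; skipping" >&2
fi
```

Keep in mind that these dumps only cover the module being compiled, so they say nothing about library code or the runtime system that gets linked in afterwards.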

MathematicalOrchid
  • That's not actually the case. It depends on the system. On most systems with static linking GHC uses "split objects", so that you get one object per function. – Don Stewart Oct 04 '12 at 12:02
  • @DonStewart But you need to enable split-objs in the cabal config to get your cabal-installed libraries built with split objects, don't you? – Daniel Fischer Oct 04 '12 at 13:00