9

I've got the following program that takes a big input (a list of extension/mime mapping, a list of files) and output results line by line (the mime type for each file).

import System.IO
import Control.Monad
import qualified Data.Map as M
import System.FilePath
import Data.Char

main :: IO ()
main = do
    input_line <- getLine
    let n = read input_line :: Int -- Number of elements which make up the association table.
    input_line <- getLine
    let q = read input_line :: Int -- Number Q of file names to be analyzed.

    mimeMap <- fmap M.fromList $ replicateM n $ do
        input_line <- getLine
        let input = words input_line
        let ext = input!!0 -- file extension
        let mt = input!!1 -- MIME type.
        return (map toLower ext, mt)

    replicateM_ q $ do
        fname <- getLine
        let ext = map toLower . drop 1 . takeExtension $ fname
            mime = M.findWithDefault "UNKNOWN" ext mimeMap
        putStrLn mime

The program was quite slow, so I started profiling it, and I got a strange result.

When compiled with

ghc --make -O2 coding.hs

the program is very slow. However, the -fprof-auto seems to speed it all up. Compiled with

ghc --make -O2 coding.hs -prof -fprof-auto -fforce-recomp

makes it blazing fast -prof alone has no effect.

Strangely, it is also very fast when run with runghc coding.hs.

I have no idea in what direction to go from there. Does anyone understand what is happenning here?

EDIT: I should mention that my ghc is 7.10.1.

madjar
  • 12,691
  • 2
  • 44
  • 52
  • 3
    Probably some optimization enabled by `-O2` is going awry, but is blocked by the profiling annotations added by `-fprof-auto`. Specifically GHC might have decided that `mimeMap` is used only once and moved the `M.fromList` into the second loop. Try building with `-fno-state-hack`. – Reid Barton May 09 '15 at 15:16
  • I can't reproduce the slow behavior. How big are `n` and `q` in your situation? – Lynn May 09 '15 at 15:37
  • 2
    [This ghc ticket](https://ghc.haskell.org/trac/ghc/ticket/1957) is also about IO code that runs slower with optimizations on, and gets sped up significantly by compiling with `-fno-state-hack` like Reid Barton mentioned. There's some more discussion about this on [the mailing list](http://comments.gmane.org/gmane.comp.lang.haskell.glasgow.user/13941). – Lynn May 09 '15 at 15:44
  • `n` and `q` are 9999 each. The problematic input can be found there: http://www.codingame.com/ide/fileservlet?id=1632865475298. Sadly, `-fno-state-hack` doesn't seem to fix it. However, building without optimization does indeed make it much faster. – madjar May 09 '15 at 22:23
  • Also, strictly evaluating mimeMap with a bang pattern prevents the slowdown. – madjar May 09 '15 at 22:32
  • Could you include your code that does file reading? (instead of stdin reading, as we see here). – András Kovács May 10 '15 at 05:26
  • (I used `openFile` and changed the `getLine`-s to `hGetLine handle` and indeed I can reproduce the slowdown beautifully, even on ghc 7.8.4). – András Kovács May 10 '15 at 05:32
  • @AndrásKovács forgive me if I misunderstand, but this is the code I use. I pass the file directly to stdin by calling it with `./coding Test_5_input.txt`. – madjar May 10 '15 at 09:09
  • Thanks for including the input file. `-fno-state-hack` did fix it for me, are you sure you recompiled? – Reid Barton May 11 '15 at 03:06
  • Oops, it fixed it in 7.8 but not 7.10...! – Reid Barton May 11 '15 at 03:07

1 Answers1

2

To provide a complete answer to the question:

As Reid Barton mentioned, the problem seems to be the infamous state hack optimization, which inlines mimeMap into the repeated IO action, executing it much more times than necessary. -fno-state-hack disables that optimization and solves the problem. Another way to solve the problem is to force a strict evaluation of ``mimeMap.

!mimeMap <- fmap M.fromList $ replicateM n [...]

However, there also seems to be a regression in GHC 7.10, in which -fno-state-hack does not solves the problem. That explains why it didn't fix it for me.

Thanks a lot everyone for your answers.

madjar
  • 12,691
  • 2
  • 44
  • 52