
I'm trying to optimize the execution speed of my program, and I ran into some interesting results that I'm hoping someone can explain. It seems that making small changes to one of my list comprehensions drastically changes the execution speed, but I don't know why.

Here's my program as it is right now.

import Data.Ord
import Control.Monad
import Data.Array
import Data.Ix
import qualified Data.Map as M
import qualified Data.Set as S
import Data.List (minimumBy, foldl')


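-- Build a 1-indexed, row-major 2D array from a list of equal-length rows.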
arrayMatrix lists = let rlen = length lists
                        clen = length $ head lists
                        r    = ((1,1), (rlen, clen))
                    in array r . zip (range r) $ concat lists

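-- A* search over the grid. gs maps a node to the cheapest path cost found so far
-- (including the start cell's value); fs adds the heuristic estimate h.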
a_star start goal h m = search S.empty (S.singleton start) 
                        (M.singleton start (m ! start)) 
                        $ M.singleton start (m ! start + h ! start)
    where neighbors (r,c) = filter (inRange $ bounds m) [ (r-1,c), (r,c+1), (r+1,c) , (r,c-1)]
          search closed open gs fs
              | S.null open     = 0
              | current == goal = gs M.! goal
              | otherwise       = let open'   = S.delete current open
                                      closed' = S.insert current closed
                                      neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
                                                , let ts = gs M.! current + m ! n ]
                                      actionable = filter (\(n,ts) -> S.notMember n open' || ts < (gs M.! n)) neighbs
                                      (op',gs',fs') = foldl' (\(o,ng,nf) (n,ts) -> (S.insert n o, M.insert n ts ng, M.insert n (ts + h ! n) nf)) (open',gs,fs) actionable
                                  in search closed' op' gs' fs'
              where current = minimumBy (comparing (fs M.!)) $ S.toList open

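-- Read matrix.txt, build the array, and run A* from (1,1) to the bottom-right corner.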
main = do
  matrix <- liftM (arrayMatrix . map (read . ('[':) . (++"]")) . lines) 
            $ readFile "matrix.txt"
  let bds       = bounds matrix
      ulim      = snd bds
      heuristic = let m   = minimum $ elems matrix
                    in listArray bds . map (\(r,c) -> (uncurry (+) ulim)-r-c) $ range bds
  print $ a_star (1,1) ulim heuristic matrix

Right now the program runs in ~350 ms on my computer (compiled with GHC 7.8.2 with -O2), using the matrix.txt supplied by Project Euler.

If I change neighbs from

neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
          , let ts = gs M.! current + m ! n ]

to

neighbs = [(n, gs M.! current + m ! n) | n <- neighbors current, S.notMember n closed]

the execution time increases to over 1 second.
Other minor changes, such as moving the filter on the next line into the list comprehension, yield the same result: ~1 second.
Can anyone explain why this happens?

EDIT: It seems this doesn't happen on earlier versions of GHC. I tried GHC 7.6.3 and each of these performed about the same.

I've included the dumps from running `ghc -O2 -ddump-simpl -dsuppress-all`, as suggested by cdk. I don't really know what I'm looking at, so if anyone is able to interpret them, that would be a big help. Thanks.

Link to both dumps

EDIT2 (Response to Priyatham): I don't think that's the case. I changed

neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
          , let ts = gs M.! current + m ! n ]
actionable = filter (\(n,ts) -> S.notMember n open' || ts < (gs M.! n)) neighbs

to

neighbs = [(n, gs M.! current + m ! n) | n <- neighbors current, S.notMember n closed ]
actionable = filter (\(n,!ts) -> S.notMember n open' || ts < (gs M.! n)) neighbs

using BangPatterns, and that still runs in a little over a second. In fact, modifying neighbs from

neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
          , let ts = gs M.! current + m ! n ]

to

neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
          , let !ts = gs M.! current + m ! n ]  -- Added bang before ts

increases the runtime to over 1 second as well.

dsemi
  • I’m trying to reproduce it. Can you paste your `matrix.txt` somewhere? – Joachim Breitner Jul 11 '14 at 10:27
  • Ah forgot about that, [here](http://projecteuler.net/project/matrix.txt) you go. – dsemi Jul 11 '14 at 12:02
  • A good exercise would be to inspect the Core GHC generates for each variation of `neighbs` using `ghc -O2 -ddump-simpl -dsuppress-all` – cdk Jul 11 '14 at 13:06
  • I've included the information in the original post – dsemi Jul 11 '14 at 14:25
  • Take a look at the last bit of [this](http://www.haskell.org/haskellwiki/Let_vs._Where). It seems similar to your example. In that case, the issue is that the map is being recomputed every time, but I couldn't say for certain whether that's what's going on here, and I definitely couldn't explain why. – genisage Jul 11 '14 at 19:03
  • @genisage, I can't see how that could be relevant. I also don't know if it's even valid anymore. Since `x` is not free in the let-bound function in that example, and since no computation is required to produce that function, there is no conceivable reason *not* to lift it out, and I would *hope* the compiler is smart enough these days to do so. – dfeuer Jul 14 '14 at 05:00
  • Pending a real answer, I've filed a bug report: https://ghc.haskell.org/trac/ghc/ticket/9326 – dfeuer Jul 17 '14 at 23:40
  • carter, on Freenode #haskell, suggested I try changing the `let ts = ...` into `let !ts = ...` (with bang patterns enabled); this makes the fast one as slow as the slow one, suggesting maybe the slow one is evaluating something strictly that should be left lazy. Why this would be the case remains rather mysterious to me. – dfeuer Jul 18 '14 at 01:35
  • I did the same (shown in Edit2) and same result; strictness in this case is apparently slower. – dsemi Jul 18 '14 at 02:01
  • I suspect this is not about the bang at all. You have `neighbs = [A | B]; actionable = filter (\(n,ts) -> P) neighbs`. On my ghc 7.8.2, if I rewrite that to `actionable = [A | B, P]`, with no bangs, (doing the same thing as filter does), I get the same slowdown. This might have something to do with ghc's strictness analyzer, which I don't really understand. – Kirill Jul 21 '14 at 08:22

2 Answers


Here's one guess about what happened with `let ts =` vs. `let !ts =`. I got it from looking at the output of `-ddump-stranal` (which dumps strictness-analysis annotations) and from reading "Demand analyser in GHC".

The difference between `let !ts =` and `let ts =` is that, with the bang, if `ts` is bottom (i.e., undefined) then `n` will not be evaluated at all, because `ts` is evaluated first and evaluation stops there. It appears that the difference between the two programs is that the pair of integers `n` is strict and unboxed in one version but not in the other (see the output of `-ddump-stranal` and `-ddump-simpl`; the link above describes the output).
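
To make that concrete, here is a toy example (not the code from the question) of the forcing-order point above: with a lazy `let` the pair can be built and its first component used even when `ts` is bottom, while the bang forces `ts` before the pair is returned.

{-# LANGUAGE BangPatterns #-}
import Control.Exception (SomeException, evaluate, try)

lazyPair, strictPair :: Int -> (Int, Int)
lazyPair   n = let  ts = n + undefined in (n, ts)
strictPair n = let !ts = n + undefined in (n, ts)

main :: IO ()
main = do
  -- Right 1: the pair reaches WHNF and fst is taken; ts is never demanded.
  print =<< (try (evaluate (fst (lazyPair 1)))   :: IO (Either SomeException Int))
  -- Left Prelude.undefined (or similar): the bang forces ts before the pair exists.
  print =<< (try (evaluate (fst (strictPair 1))) :: IO (Either SomeException Int))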

How can the presence or absence of `!ts` affect the strictness of `n`? I think that if `ts` is bottom then the program must fail before evaluating `n` or any of its elements (I am not sure whether it's `n :: (Int, Int)` itself or its elements). So GHC seems to be doing the right thing in keeping `n` non-strict when `ts` is required to be strict, because evaluating `n` first, and possibly failing in a different place, could be a mistake.

Next, how do you force `!ts` to have no impact on `n`? Note that `ts` cannot be bottom without `n` being bottom, provided `gs`, `current` and `m` are known not to be bottom (they are all the components of the expression other than `n`) and have already been evaluated (I think `M.!` and `!` can never hit bottom without evaluating their arguments first). So we need to impose the condition "`ts` is bottom implies `n` is bottom and has already been evaluated", so that GHC knows it is safe to evaluate `n` first.

My solution: add bang patterns to `current`, `gs` and `m`. With my GHC 7.8.2, this seems to solve the problem. It also appears that only `current` needs to be forced.
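
For concreteness, here is a minimal sketch of one possible placement, assuming only `current` needs the bang as stated above (the module needs BangPatterns); it is the `a_star` from the question with that single change. Note that, per the comments below, this change alone reportedly did not help the asker without the type annotations also discussed there.

{-# LANGUAGE BangPatterns #-}
import Data.Ord
import Data.Array
import Data.Ix
import qualified Data.Map as M
import qualified Data.Set as S
import Data.List (minimumBy, foldl')

a_star start goal h m = search S.empty (S.singleton start)
                        (M.singleton start (m ! start))
                        $ M.singleton start (m ! start + h ! start)
    where neighbors (r,c) = filter (inRange $ bounds m) [ (r-1,c), (r,c+1), (r+1,c) , (r,c-1)]
          search closed open gs fs
              | S.null open     = 0
              | current == goal = gs M.! goal
              | otherwise       = let open'   = S.delete current open
                                      closed' = S.insert current closed
                                      neighbs = [(n, ts) | n <- neighbors current, S.notMember n closed
                                                , let ts = gs M.! current + m ! n ]
                                      actionable = filter (\(n,ts) -> S.notMember n open' || ts < (gs M.! n)) neighbs
                                      (op',gs',fs') = foldl' (\(o,ng,nf) (n,ts) -> (S.insert n o, M.insert n ts ng, M.insert n (ts + h ! n) nf)) (open',gs,fs) actionable
                                  in search closed' op' gs' fs'
              -- The only change: force current up front. The bang also forces it on the
              -- S.null open branch, so this sketch assumes the goal is reachable.
              where !current = minimumBy (comparing (fs M.!)) $ S.toList open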

I'm not so sure about the original question (moving the expression for `ts` into the tuple), but the same solution seems to work there as well.

P.S. Note that

filter (\x -> x > 5) [x | x <- [1..10]] == [x | x <- [1..10], x > 5]

so for your lists `neighbs` and `actionable` it would be cleaner to bring the filter predicate into the list comprehension itself, like so:

[(n, ts)
| n <- neighbors current
, S.notMember n closed
, let ts = gs M.! current + m ! n
, S.notMember n open' || ts < (gs M.! n)
]
Kirill
  • I would have brought the predicate into the list comprehension, but that causes the same slowdown I'm trying to figure out. Where exactly in the code did you add the bang patterns? I made `current` strict at the `where` statement and saw no effect. (Also, did you mean `m` or `n` when you listed what you added bang patterns to?) – dsemi Jul 22 '14 at 15:05
  • @dsemi Try adding type annotations to `neighbors` and `search`. You're right that for your original code there's no speedup from `where !current =`; this is odd. I added those type annotations while trying to make sense of the code, and I forgot they might make a difference; it seems they do. Performance-wise, polymorphic functions are worse than (or at best equal to) specialized functions; see the sketch of such signatures after this comment thread. – Kirill Jul 22 '14 at 15:39
  • The funny thing is, I just ran into this phenomenon recently; I'm berating myself for not trying this earlier. Adding more specific type annotations than the inferred polymorphic ones drastically improved performance, no bang patterns necessary. I was able to include the predicate in the list comprehension as well without a performance hit, thanks! I'm still fuzzy on my original question, but in light of this information, do you think that in one scenario the compiler has better type inference, rather than differing strictness rules? – dsemi Jul 22 '14 at 16:30
  • @dsemi Well, if GHC can't figure out that a polymorphic function is mostly/only called on Ints, it won't generate specialized code for Ints. Or it might generate code that uses boxed Ints instead of primitives, which is what I think it did here. Strictness analysis is another thing that might prevent/allow unboxing (because an unboxed value cannot be undefined). I think this is one of the reasons for using the SPECIALIZE pragma on expensive polymorphic functions. – Kirill Jul 22 '14 at 16:41
  • TLDR: Add type signatures to your functions. It's good practice, makes code more readable, and in some cases even improves performance. – dsemi Jul 22 '14 at 17:30
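
To make the comment thread concrete, here is a sketch of the kind of monomorphic signatures being discussed, intended as a drop-in for the `a_star` in the question, with the filter predicate also folded into the comprehension as in the P.S. above. The concrete `(Int, Int)` and `Int` types are an assumption based on the Project Euler matrix; without such signatures GHC infers a polymorphic `(Ix i, Ord a, Num a)` style type and may generate slower, boxed code.

import Data.Ord (comparing)
import Data.Array
import qualified Data.Map as M
import qualified Data.Set as S
import Data.List (minimumBy, foldl')

-- Explicit, monomorphic signatures for a_star and its local helpers.
a_star :: (Int, Int) -> (Int, Int)
       -> Array (Int, Int) Int -> Array (Int, Int) Int -> Int
a_star start goal h m = search S.empty (S.singleton start)
                        (M.singleton start (m ! start))
                        $ M.singleton start (m ! start + h ! start)
  where
    neighbors :: (Int, Int) -> [(Int, Int)]
    neighbors (r,c) = filter (inRange $ bounds m) [(r-1,c), (r,c+1), (r+1,c), (r,c-1)]

    search :: S.Set (Int, Int) -> S.Set (Int, Int)
           -> M.Map (Int, Int) Int -> M.Map (Int, Int) Int -> Int
    search closed open gs fs
        | S.null open     = 0
        | current == goal = gs M.! goal
        | otherwise       =
            let open'   = S.delete current open
                closed' = S.insert current closed
                -- filter predicate merged into the comprehension, as in the P.S. above
                actionable = [ (n, ts)
                             | n <- neighbors current
                             , S.notMember n closed
                             , let ts = gs M.! current + m ! n
                             , S.notMember n open' || ts < (gs M.! n) ]
                (op', gs', fs') = foldl'
                    (\(o, ng, nf) (n, ts) ->
                        (S.insert n o, M.insert n ts ng, M.insert n (ts + h ! n) nf))
                    (open', gs, fs) actionable
            in search closed' op' gs' fs'
      where current = minimumBy (comparing (fs M.!)) $ S.toList open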

This is not a complete answer, as I lack information on how `let` and list comprehensions are implemented internally.

Each item in `neighbs` is a tuple, and a tuple in WHNF does not have its components evaluated, so the sum inside is not evaluated strictly. This leaves unevaluated thunks, which could be increasing the runtime.
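
As a tiny illustration of WHNF (a toy, not the code from the question): `seq` forces only the outermost constructor of a pair, so a thunk stored inside it survives untouched.

main :: IO ()
main = do
  let pair = (1 :: Int, sum [1 .. 1000000] + undefined :: Int)
  -- seq evaluates pair to WHNF, i.e. just the (,) constructor; the second
  -- component, an unevaluated sum-plus-undefined thunk, is never forced.
  pair `seq` putStrLn "pair reached WHNF without forcing the thunk inside it"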

I suggest rewriting the second definition using `seq`, without using `let` if possible, to see whether the runtime falls (in which case this answer is likely to be correct).

Read this to understand what WHNF is.

Priyatham