2

I am struggling to understand why the following attempts to find a minimum element in STArray lead to stack space overflow when compiled with ghc (7.4.1, regardless of -O level), but work fine in ghci:

import Control.Monad
import Control.Monad.ST
import Control.Applicative
import Data.Array.ST

n = 1000 :: Int

minElem = runST $ do
  arr <- newArray ((1,1),(n,n)) 0 :: ST s (STArray s (Int,Int) Int)
  let ixs = [(i,j) | i <- [1..n], j <- [1..n]]
  forM_ ixs $ \(i,j) -> writeArray arr (i,j) (i*j `mod` 7927)
--  readArray arr (34,56)  -- this works OK
--  findMin1 arr           -- stackoverflows when compiled
  findMin2 arr           -- stackoverflows when compiled

findMin1 arr = do
  es <- getElems arr
  return $ minimum es

findMin2 arr = do
  e11 <- readArray arr (1,1) 
  foldM (\m ij -> min m <$> readArray arr ij) e11 ixs
      where ixs = [(i,j) | i <- [1..n], j <- [1..n]]

main = print minElem

Note: switching to STUArray or ST.Lazy doesn't seem to have any effect.

The main question though: What would be the proper way to implement such "fold-like" operation over big STArray while inside ST?

Ed'ka
  • 6,595
  • 29
  • 30

3 Answers3

5

That's probably a result of getElems being a bad idea. In this case an array is a bad idea altogether:

minimum (zipWith (\x y -> (x, y, mod (x*y) 7927)) [1..1000] [1..1000])

This one gives you the answer right away: (1, 1, 1).

If you want to use an array anyway I recommend converting the array to an Array or UArray first and then using elems or assocs on that one. This has no additional cost, if you do it using runSTArray or runSTUArray.

ertes
  • 4,430
  • 1
  • 18
  • 23
  • Of course the actual computation is more complicated then `mod`. The point was to do computation in `ST` and then find the element satisfying some condition in the resulting array why still in `ST` (without) `freeze`. So the question stays: can we find a minimum element in STArray without converting it to `Array` or `UArray`. – Ed'ka Jan 23 '13 at 05:33
  • 3
    You seem to be confusing the purpose of ST. It's not to make computation faster (likely even slower), but to allow imperative constructs within pure code. An algorithm in pure code will always outperform the same algorithm in ST. But there are algorithms for which you *need* ST. In this case (and most others) you don't. – ertes Jan 23 '13 at 06:23
  • 1
    "An algorithm in pure code will always outperform the same algorithm in ST" <- Umm, you mean "an algorithm that wouldn't benefit from in-place mutation", don't you? – Daniel Fischer Jan 23 '13 at 15:44
  • @Daniel: Of course. The algorithm to search for the minimum in an unordered list is a full traversal. This does not benefit from in-place update, so a pure version is faster, because it gets along without the monadic binding. Of course the implementation of `ST` might make binding cheap, but not free. – ertes Jan 28 '13 at 07:38
  • Yes, for the problem as it stands, it's not the right tool (Number Theory is). I was just taking issue with your formulation that read overly broad. – Daniel Fischer Jan 28 '13 at 07:58
3

The big problem in findMin1 is getElems:

getElems :: (MArray a e m, Ix i) => a i e -> m [e]
getElems marr = do
  (_l, _u) <- getBounds marr      -- Hmm, why is that there?
  n <- getNumElements marr
  sequence [unsafeRead marr i | i <- [0 .. n - 1]]

Using sequence on a long list is a common cause for stack overflows in monads whose (>>=) isn't lazy enough, since

sequence ms = foldr k (return []) ms
        where
          k m m' = do { x <- m; xs <- m'; return (x:xs) }

then it has to build a thunk of size proportional to the length of the list. getElems would work with the Control.Monad.ST.Lazy, but then the filling of the array with

forM_ ixs $ \(i,j) -> writeArray arr (i,j) (i*j `mod` 7927)

creates a huge thunk that overflows the stack. For the strict ST variant, you need to replace getElems with something that builds the list without sequence or - much better - compute the minimum without creating a list of elements at all. For the lazy ST variant, you need to ensure that the filling of the array doesn't build a huge thunk e.g. by forcing the result of the writeArray calls

let fill i j
      | i > n     = return ()
      | j > n     = fill (i+1) 1
      | otherwise = do
          () <- writeArray arr (i,j) $ (i*j `mod` 7927)
          fill i (j+1)
() <- fill 1 1

The problem in findMin2 is that

foldM (\m ij -> min m <$> readArray arr ij) e11 ixs

is lazy in m, so it builds a huge thunk to compute the minimum. You can easily fix that by using seq (or a bang-pattern) to make it strict in m.

The main question though: What would be the proper way to implement such "fold-like" operation over big STArray while inside ST?

Usually, you'll use the strict ST variant (and for types like Int, you should almost certainly use STUArrays instead of STArrays). Then the most important rule is that your functions be strict enough. The structure of findMin2 is okay, the implementation is just too lazy.

If performance matters, you will often have to avoid the generic higher order functions like foldM and write your own loops to avoid allocating lists and control strictness exactly as the problem at hand requires.

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • As usual, great answer, Daniel! Any comments on why `findMin1` and `findMin2` produce the answer in `ghci`? – Ed'ka Jan 25 '13 at 04:21
  • 1
    Oh, sorry, forgot to explain that: ghci uses an unlimited stack (since 6.10 or 6.12, iirc), so it handles huge thunks as long as they're not too big for the RAM. Compiled code runs with an 8MB stack by default. – Daniel Fischer Jan 25 '13 at 05:32
0

The problem is that minimum is a non-strict fold, so it is causing a thunk buildup. Use (foldl' min).

Now we add a bunch of verbiage to ignore because stackoverflow has turned worthless and no longer allows posting a straightforward answer.

none
  • 1