Haskell performance of memoization implement of dynamic programming is poor

Question

Problem starts from a algorithm problem from codeforces, here is the link Writing Code. The problem is a basic DP problem which solution could be found at Solution Report.

I try to solve it using Haksell, my solution is recurse with memoization, which time complexity is also O(n^3). Here is my code:

{-# OPTIONS_GHC -O2 -funbox-strict-fields #-}
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, BangPatterns #-}

import Data.Functor
import Data.List

class Memo m a where
    memo :: m -> a

instance Memo m m where
    memo = id

instance Memo m a => Memo (Int -> m) [a] where
    memo f = map (memo.f) [0..]

solve :: Int -> Int -> Int -> Int -> [Int] -> Int
solve n m b mod arr = modsum.map (go n m) $ [0 .. b]
    where mods = (`rem` mod)
          modsum = foldl1' (\a b -> mods (a + b))
          go :: Int -> Int -> Int -> Int
          go _ 0 0 = 1
          go _ 0 _ = 0
          go 0 _ _ = 0
          go i j k = mods $ a + b
            where !id = i - 1
                  !ai = arr !! id
                  !a = (if k >= ai then memo_go !! i !! (j - 1) !! (k - ai) else 0)
                  !b = memo_go !! id !! j !! k
          memo_go = memo go


main = do [n, m, b, mod] <- map read.words <$> getLine
          arr <- map read.words <$> getLine
          print $ solve n m b mod arr

And the performance report of this program using RTS is:

    Fri May 22 12:25 2015 Time and Allocation Profiling Report  (Final)

       test.exe +RTS -p -RTS

    total time  =        5.75 secs   (5751 ticks @ 1000 us, 1 processor)
    total alloc = 189,957,112 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

solve.go    Main     97.2    8.5
memo        Main      2.3   82.8
solve.mods  Main      0.5    8.5


                                                                  individual     inherited
COST CENTRE          MODULE                     no.     entries  %time %alloc   %time %alloc

MAIN                 MAIN                        52           0    0.0    0.0   100.0  100.0
 main                Main                       105           0    0.0    0.3   100.0  100.0
  solve.modsum       Main                       108           1    0.0    0.0     0.0    0.0
  solve              Main                       106           1    0.0    0.0   100.0   99.7
   solve.memo_go     Main                       114           1    0.0    0.0    99.9   99.7
    memo             Main                       115       10301    2.3   82.8    99.9   99.7
     solve.go        Main                       118     1024999   97.2    8.5    97.6   16.9
      solve.mods     Main                       119           0    0.5    8.5     0.5    8.5
   solve.go          Main                       113         101    0.0    0.0     0.0    0.0
    solve.mods       Main                       120           0    0.0    0.0     0.0    0.0
   solve.mods        Main                       111           1    0.0    0.0     0.0    0.0
   solve.modsum      Main                       109           0    0.0    0.0     0.0    0.0
    solve.modsum.\   Main                       110         100    0.0    0.0     0.0    0.0
     solve.mods      Main                       112           0    0.0    0.0     0.0    0.0
 CAF                 GHC.IO.Encoding.CodePage    91           0    0.0    0.0     0.0    0.0
 CAF                 GHC.IO.Encoding             84           0    0.0    0.0     0.0    0.0
 CAF                 Text.Read.Lex               81           0    0.0    0.0     0.0    0.0
 CAF                 GHC.IO.Handle.FD            72           0    0.0    0.0     0.0    0.0
 CAF:main1           Main                        68           0    0.0    0.0     0.0    0.0
  main               Main                       104           1    0.0    0.0     0.0    0.0
 CAF:lvl2_r3ul       Main                        64           0    0.0    0.0     0.0    0.0
 CAF:$fMemo(->)[]1   Main                        61           0    0.0    0.0     0.0    0.0
  memo               Main                       116           0    0.0    0.0     0.0    0.0
 CAF:$fMemomm_$cmemo Main                        59           0    0.0    0.0     0.0    0.0
  memo               Main                       117           1    0.0    0.0     0.0    0.0

Predictably, the bottleneck is the memo_go. The memo_go is a 3-D array, in which every element is a thunk go i j k, and it would only be evaluated once. However, this code got a TLE from the judge.

I wander how to improve the performance of this code?

Updata I change the List to Array and got an MLE, so I use rolling-array to reduce the memory, here is my code:

{-# OPTIONS_GHC -O2 -funbox-strict-fields #-}
{-# LANGUAGE MultiParamTypeClasses, FlexibleInstances, BangPatterns #-}

import Data.Functor
import Data.List (foldl', foldl1')
import Data.Array
--import qualified  Data.Vector as V

size = 510

class Memo m a where
    memo :: m -> a

instance Memo m m where
    memo = id

instance Memo m a => Memo (Int -> m) (Array Int a) where
    memo f = fmap (memo.f) $ listArray (0, size) [0..size]

solve :: Int -> Int -> Int -> Int -> [Int] -> Int
solve n m b mod lis = modsum.map (res ! m !) $ [0 .. b]
    where arr = listArray (0, n - 1) lis
          mods = (`rem` mod)
          modsum = foldl1' (\a b -> mods (a + b))
          init :: Int -> Int -> Int
          init 0 0 = 1
          init _ _ = 0
          arr_init = memo init
          dp n = foldl' build arr_init [1..n]
            where build ar idx = memo_go'
                    where !ai = arr ! (idx - 1)
                          memo_go' = memo go'
                          go' i j = mods $ a + b
                            where !a = ar ! i ! j
                                  !b = if i > 0 && j >= ai then memo_go' ! (i - 1) ! (j - ai) else 0
          res = dp n

main = do [n, m, b, mod] <- map read.words <$> getLine
          arr <- map read.words <$> getLine
          print $ solve n m b mod arr

Still MLE, maybe I should use the mutable data structrue :(

I think a big step would be if you did not use lists and `(!!)` (which is painfully slow) - [vector](https://hackage.haskell.org/package/vector) would come to mind — Random Dev, May 22 '15 at 05:15
@Carsten Yes, using vector improve a lot. Unfortunately, the vector package is not installed on the OJ server:( — Liao Pengyu, May 22 '15 at 06:06
@LiaoPengyu: what about [array](https://downloads.haskell.org/~ghc/latest/docs/html/libraries/Data-Array.html)? It's shipped with GHC. — András Kovács, May 22 '15 at 06:36
yeah some algorithms are harder to translate - I guess MLE is out-of-memory exception? Do you know the DP technique where you basically reuse columns to generate the last table-cell? You don't have to mutate those (although it's even faster) - you can just transform theses FP style and it should help you do (you will have a bit more GCs thats all) — Random Dev, May 22 '15 at 09:13
@Carsten My question has update, the last version code use rolling-array to reduce memory, and as you said, that may depends on the GC if I insist use immutable data structure. — Liao Pengyu, May 22 '15 at 09:24
I don't know - I had a quick glance at it and for me it looks like you still initialize a full array for all `n` columns here `listArray (0, n - 1) lis` (?) (it's kindof hard to read actually - sorry) — Random Dev, May 22 '15 at 09:27
@Carsten the `lis` is a n-size (not exceeding 500) `List`, `listArray (0, n - 1) lis` just translate the `List` to `Array` for indexing performance. The neckbottle is the `dp n`, which product a new 500*500 Array from previous 500*500 Array (which should be freed by GC after `build`), the n indict the times of such a product-process. — Liao Pengyu, May 22 '15 at 09:36
Reinventing the wheel aren't we? Look at [**`Control.Monad.Memo`**](https://hackage.haskell.org/package/monad-memo-0.3.0/docs/Control-Monad-Memo.html) — recursion.ninja, May 22 '15 at 19:11

Haskell performance of memoization implement of dynamic programming is poor

0 Answers0