How do I memoize?

Question

I have written this function that computes Collatz sequences, and I see wildly varying execution times depending on how I tweak it. Apparently this is related to something called "memoization", but I have a hard time understanding what it is and how it works. Unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven hard to get through. They discuss intricate details of the relative performance of tree constructions that a layman can barely tell apart, while what I am missing must be some very basic, very trivial point that these sources neglect to mention.

This is the code. It is a complete program, ready to be built and executed.

module Main where

import Data.Function
import Data.List (maximumBy)

size :: (Integral a) => a
size = 10 ^ 6

-- Nail the basics.

collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
          | otherwise = n * 3 + 1

recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1) 
                          then f (collatz x)
                          else x

-- Now, I want to do the counting with a tuple monad.

mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)

remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
                            then f =<< mocollatz x
                            else return x

-- Trivialities.

collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1

collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])

longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
  where
    order :: Ord b => (a, b) -> (a, b) -> Ordering
    order x y = snd x `compare` snd y

main :: IO ()
main = print $ longestCollatz size

With ghc -O2 it takes about 17 seconds, and without -O2 about 22 seconds, to deliver the length and the seed of the longest Collatz sequence starting at any point below size.

Now, if I make these changes:

diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
@@ -1,6 +1,7 @@
 module Main where

 import Data.Function
+import qualified Data.Map.Lazy as M
 import Data.List (maximumBy)

 size :: (Integral a) => a
@@ -22,10 +23,15 @@ recollatz = fix $ \f x -> if (x /= 1)
 mocollatz :: Integral b => b -> ([b], b)
 mocollatz n = ([n], collatz n)

-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
-                            then f =<< mocollatz x
-                            else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+    Nothing -> mutate x
+    Just y  -> y
+  where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]

 -- Trivialities.

Then it takes just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.

Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one enters it only about one and a half million times. This must be the reason for the speedup, but I have a hard time understanding the inner workings of this magic. My best idea is that we replace a portion of the expensive recursive calls with O(log n) map lookups, but I don't know whether that is true, or why it depends so much on some godforsaken compiler flags when, as I see it, such performance swings should follow solely from the language.

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?

P.S. Two requirements for achieving automagical memoization are highlighted elsewhere on Stack Overflow:

Make the function to be memoized a top-level name.
Make the function to be memoized monomorphic.

In line with these requirements, I rebuilt remocollatz as follows:

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x

mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x

Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
    Nothing -> mutate x
    Just y  -> y

mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x

table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]

That version runs in less than 4 seconds.

I wonder why the memoization that ghc is supposedly performing in the first case is almost 3 times slower than my dumb table.

I'm sorry if this post is long and poorly written. I will be working on my style. — Ignat Insarov, Jan 16 '18 at 16:51
It appears your issue is not just with understanding Haskell or the importance of optimizations but also the concept of memoization (a broader programming concept) is that right? Have you looked at general CS or wikipedia/books sources for memoization explanations? — Thomas M. DuBuisson, Jan 16 '18 at 16:53
I asked a question regarding memoization in Haskell, maybe that answer can give some guidance to you as well. https://stackoverflow.com/questions/11473130/memoization-pascals-triangle — Viktor Mellgren, Jan 16 '18 at 16:53
Yw can haz explanation, bt plz att c̶h̶e̶e̶z̶b̶u̶r̶g̶e̶r̶ type signatures to all your top-level functions. — leftaroundabout, Jan 16 '18 at 16:55
@leftaroundabout I added type signatures. Though I suspect this may somewhat alter the intricacies of execution, the running times for the first and the patched versions with `ghc -O2` seem in line with the numbers before the edit. — Ignat Insarov, Jan 16 '18 at 17:31
@ThomasM.DuBuisson I kind of understand the idea of not computing a function twice. But the only case at my hands is the Haskell case, and it appears rather convoluted. Sometimes a function is magically memoized by itself (the ubiquitous `fib` function), other times it takes some major slicing of the code (as with Collatz sequences here), yet in still odder cases I can't manage to reach a performance improvement via memoization at all. — Ignat Insarov, Jan 16 '18 at 17:42
@Kindaro You are wrapping a lot into one question, Focus on one question and not the whole can of worms. Optimizations matter. Period. I'd ask about that separately. Memoized fibs is easy because you always know the input, `n`, will be every number less than the current call. Memoized collatz uses a map (typically) instead of a list because the iterated calls to collatz are not simply `n-1` and `n-2` but a form that is rather unpredictable. Finally, refrain from thinking, or even saying, fibs memoization is magic. If the operations are unclear then understand that first. — Thomas M. DuBuisson, Jan 16 '18 at 17:54
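
To make the last comment concrete, here is a minimal sketch of Collatz memoization with a map that is passed around by hand. It is not the asker's code: the names `step`, `lengthWith` and `longest` are made up for illustration, and the cache stores sequence lengths rather than the sequences themselves. The point is only that every remembered result is put into the map explicitly.

```haskell
module Main where

import qualified Data.Map.Strict as M
import Data.List (foldl')

-- One Collatz step (same rule as `collatz` in the question).
step :: Int -> Int
step n
  | even n    = n `div` 2
  | otherwise = 3 * n + 1

-- Length of the sequence starting at n (counting both n and the final 1),
-- together with a cache extended by every value visited along the way.
-- The cache goes in and comes out explicitly (hypothetical helper).
lengthWith :: M.Map Int Int -> Int -> (Int, M.Map Int Int)
lengthWith cache 1 = (1, cache)
lengthWith cache n =
  case M.lookup n cache of
    Just len -> (len, cache)                 -- already known: O(log n) lookup
    Nothing  ->
      let (rest, cache') = lengthWith cache (step n)
          len            = rest + 1
      in (len, M.insert n len cache')        -- remember the new result

-- Seed in [1 .. bound] with the longest sequence, and that length.
longest :: Int -> (Int, Int)
longest bound = fst (foldl' go ((1, 1), M.empty) [2 .. bound])
  where
    go (best@(_, bestLen), cache) n =
      let (len, cache') = lengthWith cache n
      in if len > bestLen then ((n, len), cache') else (best, cache')

main :: IO ()
main = print (longest 1000)
```

Unlike the fib case, the keys that end up in the cache are not a contiguous range known in advance, which is why a map is the usual choice here.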

score 6 · Answer 1 · answered Jan 16 '18 at 18:12

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?

Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.

You write:

remocollatz x = {- ... -} table mutate {- ... -}
  where mutate x = remocollatz =<< mocollatz x

The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).

With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:

fresh_variable_name = table mutate
  where mutate x = remocollatz =<< mocollatz x

remocollatz x = case M.lookup x fresh_variable_name of
    {- ... -}

The table is therefore computed just once for the entire program run.
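
The same hoisting can be done by hand, so that the sharing no longer depends on the optimizer. The following is a sketch under a few assumptions: it uses the monomorphic `Int` version from the question's postscript, the name `collatzTable` is invented here, and `size` is reduced to 10 ^ 5 to keep the run short (the question uses 10 ^ 6).

```haskell
module Main where

import qualified Data.Map.Lazy as M
import Data.List (maximumBy)
import Data.Ord (comparing)

size :: Int
size = 10 ^ 5  -- assumption: smaller than the question's 10 ^ 6

collatz :: Int -> Int
collatz n | even n    = n `div` 2
          | otherwise = n * 3 + 1

mocollatz :: Int -> ([Int], Int)
mocollatz n = ([n], collatz n)

-- The table is a top-level constant. It takes no argument, so it is built
-- at most once and shared by every call to remocollatz, with or without -O2.
-- Data.Map.Lazy keeps the values as thunks until they are first looked up.
collatzTable :: M.Map Int ([Int], Int)
collatzTable = M.fromList [ (x, mutate x) | x <- [1 .. size] ]

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x collatzTable of
    Nothing -> mutate x   -- value above size: compute directly
    Just y  -> y          -- value in the table: reuse the shared thunk

mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x

collatzLength :: Int -> Int
collatzLength x = length (fst (remocollatz x)) + 1

longestCollatz :: Int -> (Int, Int)
longestCollatz n = maximumBy (comparing snd) [ (x, collatzLength x) | x <- [1 .. n] ]

main :: IO ()
main = print (longestCollatz size)
```

Because `collatzTable` is defined without reference to any argument, there is nothing for GHC to float out; the definition already has the shape the optimizer would otherwise have to discover.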

don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.

Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.

answered Jan 16 '18 at 18:12

Daniel Wagner


Thank you, Daniel! It is always encouraging to read your answers. If I may ask you to read the postscriptum I just added to my question and comment on it? – Ignat Insarov Jan 16 '18 at 18:36
There is no need to stare at Core to verify your theory. I made modifications as you outlined, and the build without `-O2` runs in about 7 seconds now, versus indefinite lengths of time without these modifications. This proves you right. – Ignat Insarov Jan 16 '18 at 18:51
@Kindaro Do you have a specific question related to your postscript? – Daniel Wagner Jan 16 '18 at 19:09
Yes! Why is the implicit memoization that `ghc` is supposed to perform so much slower than my dumb table? – Ignat Insarov Jan 16 '18 at 19:11
I kind of arrived at an understanding. It is that there is *no* implicit memoization of things that have arity. – Ignat Insarov Jan 16 '18 at 19:31
@Kindaro GHC has no implicit memoization, period. – Daniel Wagner Jan 16 '18 at 19:57
Well, I suppose I should have said that it incrementally evaluates top level constants. It sorta looks like memoization and quacks like memoization. – Ignat Insarov Nov 10 '20 at 21:20
@IgnatInsarov Certainly memoization is available. But it is *explicit* memoization, introduced by a `let` or `where` (or the implicit module-level `where` that precedes all equations) and whose body is a function. – Daniel Wagner Nov 10 '20 at 23:22
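
As an illustration of what explicit memoization through a named binding can look like, here is the familiar `fib` example written as a sketch. The names `fibs`, `fib` and `slowFib` are chosen for this example only. The results live in a lazily evaluated top-level list, and the function reads its recursive results from that list instead of recomputing them.

```haskell
module Main where

-- A top-level list with no arguments: each element is evaluated lazily,
-- at most once, and is then shared by every caller.
fibs :: [Integer]
fibs = map fib [0 ..]

-- The recursive calls go through the shared list, not through fib itself.
fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)

-- For contrast: plain recursion. Nothing is remembered between calls,
-- so the running time grows exponentially with n.
slowFib :: Int -> Integer
slowFib n
  | n < 2     = toInteger n
  | otherwise = slowFib (n - 1) + slowFib (n - 2)

main :: IO ()
main = print (fib 80)
```

The only thing being remembered is the list `fibs`, and it is remembered because it has a name and takes no arguments. `slowFib` has the same meaning but no such shared structure, and GHC does not add one.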

score 2 · Answer 2 · answered Jan 17 '18 at 03:14

Another approach to memoization that works in some situations, like this one, is to use a boxed vector, whose elements are computed lazily. The function used to initialize each element can use other elements of the vector in its calculation. As long as the evaluation of an element of the vector doesn't loop and refer to itself, just the elements it recursively depends on will be evaluated. Once evaluated, an element is effectively memoized, and this has the further benefit that elements of the vector that are never referenced are never evaluated.

The Collatz sequence is a nearly ideal application for this technique, but there is one complication. The next Collatz value(s) in sequence from a value under the limit may be outside the limit, which would cause a range error when indexing the vector. I solved this by just iterating through the sequence until back under the limit and counting the steps to do so.

The following program takes 0.77 seconds to run unoptimized and 0.30 seconds when optimized:

import qualified Data.Vector as V

limit = 10 ^ 6 :: Int

-- The Collatz function, which given a value returns the next in the sequence.

nextCollatz :: Int -> Int
nextCollatz val
  | odd val = 3 * val + 1
  | otherwise = val `div` 2

-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).

nextCollatzWithinLimit :: Int -> (Int, Int)
nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
  where
    firstInRange = head rest
    stepsToFirstInRange = 1 + (length biggerThanLimit)
    (biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
    collatzSeqStartingWithVal = iterate nextCollatz val

-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.

collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
  where
    collatzFn :: Int -> Int
    collatzFn index
      | index <= 1 = 1
      | otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
      where
        (nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index

main :: IO ()
main = do

  -- Use a fold through the vector to find the longest Collatz sequence under
  -- the limit, and keep track of both the maximum length and the initial
  -- value of the sequence, which is the index.

  let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
      accMaxLen acc@(accMaxLen, accMaxIndex) index currLen
        | currLen <= accMaxLen = acc
        | otherwise = (currLen, index)
  putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
             ++ show maxLength ++ " at index " ++ show maxIndex
