
What's the fastest way to memoize a recursive function in Haskell?

Background: Recently I have been solving Project Euler problems in Haskell. Many require a large number of evaluations of a recursively defined combinatorial or number-theoretic function, for example the Fibonacci numbers. Performance improves dramatically if such functions are memoized, that is, if their results are cached for later use.

I have seen many solutions to this problem. The most elegant seems to be this one, which uses Data.IntMap (or hash tables) and the State monad. A tree-based solution is suggested in this answer, and such solutions seem fairly common; to give another example, see this blog post. I've seen other solutions using built-in functions: there's one in section 2 here using fix, and further it seems that the compiler can sometimes be massaged into memoizing without additional work. There are also several prebuilt solutions.
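For concreteness, here is a minimal sketch (my own, not taken from any of the linked answers) of the Data.IntMap-plus-State approach, assuming the containers and mtl packages; the function names are illustrative:

```haskell
import qualified Data.IntMap.Strict as IntMap
import Control.Monad.State

-- Memoized Fibonacci: the State carries a cache mapping argument to result.
fibMemo :: Int -> State (IntMap.IntMap Integer) Integer
fibMemo n
  | n < 2 = return (fromIntegral n)
  | otherwise = do
      cached <- gets (IntMap.lookup n)
      case cached of
        Just v  -> return v
        Nothing -> do
          a <- fibMemo (n - 1)
          b <- fibMemo (n - 2)
          let v = a + b
          modify (IntMap.insert n v)  -- store the result before returning it
          return v

fib :: Int -> Integer
fib n = evalState (fibMemo n) IntMap.empty
```

Each distinct argument is computed once and then served from the cache, so the whole call runs in roughly O(n log n) IntMap operations instead of exponential time.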

I'm wondering which memoization method is fastest in practice for the kinds of functions used in Project Euler. My intuition says the hashtables library is fastest, since hash tables seem to be the preferred dictionary structure in imperative languages. The purely functional tree-based solutions are cool, but my Googling tells me they're strictly worse than hash tables in terms of asymptotic performance.

Edit

A few comments said this question is too broad to answer, and upon reflection I agree. So let me give two concrete examples of functions to memoize: one that computes the nth Fibonacci number recursively, and one that computes the nth Catalan number recursively. I would like to compute these functions many times for large n.

I'm aware there are explicit formulas for these, but let's ignore that, because the real point here is to use them to benchmark memoization techniques.
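To be concrete, the naive (unmemoized) recursions I have in mind look like the following; the names are mine, and without caching both take exponential time:

```haskell
-- Naive recursive Fibonacci: fib(n) = fib(n-1) + fib(n-2).
fibNaive :: Int -> Integer
fibNaive n
  | n < 2     = fromIntegral n
  | otherwise = fibNaive (n - 1) + fibNaive (n - 2)

-- Naive Catalan numbers via the recurrence
-- C(0) = 1,  C(n) = sum over i in [0 .. n-1] of C(i) * C(n-1-i).
catalanNaive :: Int -> Integer
catalanNaive 0 = 1
catalanNaive n = sum [catalanNaive i * catalanNaive (n - 1 - i) | i <- [0 .. n - 1]]
```

These are the baselines I would benchmark the memoized versions against.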

Potato

  • Binary search trees [like `IntMap`](https://github.com/haskell/containers/blob/master/Data/IntMap/Internal.hs#L347-L352) are slower than hash maps in terms of _asymptotic_ performance, yes. Asymptotics aren't everything, though. `IntMap` has a number of other desirable properties, such as avoiding the overhead of hashing, behaving well when used in a persistent manner, etc. For normal usages `IntMap`'s performance is competitive with a hash table. – Benjamin Hodgson Jan 27 '17 at 10:34
  • It depends on the problem. If arrays are appropriate, for example when you will eventually have to evaluate the function on every value in a range, then they are probably the most efficient choice. – Reid Barton Jan 27 '17 at 15:05
  • As a side note, the last stack overflow answer you are linking to is written by Jon Harrop. [Here](https://www.reddit.com/r/haskell/comments/4ogle3/post_about_the_disadvantages_of_functional/d4clabv/) is a comment @sclv wrote a while back that I think is relevant. – Alec Jan 27 '17 at 19:05
  • This question as it stands is much too broad because the memoization technique really depends on the situation - and there are many. If you edit your question to have a couple examples, you will have a better chance at getting an answer. – Alec Jan 27 '17 at 19:24
  • @Alec I agree that it's too broad. Thanks for pointing that out. I added two concrete examples I'm interested in. Does that improve things? – Potato Jan 27 '17 at 21:30
  • I just want to point out that in general you are better off looking for the mathematical solution in the Project Euler solutions. Sure, you *can* theoretically speed up your code by letting your code run on a GPU, but is it really *necessary*? IMO it generally isn't. – ThreeFx Jan 27 '17 at 22:53
  • @ThreeFx There are problems on there for which mathematics only helps so much. At some point you need a good algorithm, too, and I've personally found memoization to be very helpful. But I don't know the best way to implement it, hence this question. – Potato Jan 28 '17 at 00:11
  • @ReidBarton Thanks for the input. I've made the question more precise. – Potato Jan 28 '17 at 00:17
  • I voted to close this before it was edited. As it stands now, I think it is quite answerable and should be reopened. – Alec Jan 28 '17 at 07:14
  • You may be interested in this nice link with description how to perform lazy dynamic programming and kind of memoization you want: http://jelv.is/blog/Lazy-Dynamic-Programming/ – Shersh Jan 29 '17 at 12:01
  • @Shersh That's really great. Thanks for the pointer! – Potato Jan 29 '17 at 19:43
  • Did you check out `monad-memo` and the performance chart for the fibonacci fn given [here](https://github.com/EduardSergeev/monad-memo)? – liminalisht Jan 29 '17 at 21:34
  • @liminalisht I haven't! That's awesome. Thanks for the pointer. If you wanted to type a summary of that as an answer (I don't quite know enough to do a good job...) I could accept it. – Potato Jan 29 '17 at 21:47
  • @Potato I don't know enough about memoization to write up a good answer unfortunately. – liminalisht Jan 29 '17 at 22:04
  • The best approach still depends on the situation: do you know up front the values you will need to memoize, are they a range of consecutive integers, what time/space tradeoffs do you want to make. The functions themselves aren't really the important thing. – Reid Barton Jan 31 '17 at 17:58
  • @ReidBarton I'm thinking of a problem like: find the sum of f(n) over all prime n less than some fixed large number (where we choose the number to make the benchmarking times reasonable). I'm not knowledgable enough to have an opinion on time/space tradeoffs. I just want something that runs fast on my laptop. – Potato Jan 31 '17 at 23:05
  • I mean, why don't you write some simple criterion benchmarks and compare the approaches yourself? – jberryman Feb 02 '17 at 03:04

1 Answer


When computing the nth Fibonacci number, the only values you need to remember are the two previous numbers. You can carry them as a pair (f(n-1), f(n)) and update the pair on each step of a loop; each update is a cheap constant-time operation, so the whole computation runs in linear time with constant space.
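That pair-carrying loop might be sketched as follows (the function name is illustrative):

```haskell
-- Iterative Fibonacci carrying only the last two values as a pair.
fibIter :: Int -> Integer
fibIter n = fst (go n (0, 1))
  where
    go 0 p      = p
    go k (a, b) = go (k - 1) (b, a + b)  -- shift the window forward one step
```

This is not memoization in the caching sense; it simply never recomputes anything, which is why it is a useful baseline to compare memoized versions against.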

A cleaner and slightly smarter alternative is a lazily generated list of all Fibonacci numbers:

fibs :: [Integer]
fibs = fibcreator 0 1
  where
    fibcreator a b = a : fibcreator b (a + b)

nthFib :: Int -> Integer
nthFib n = fibs !! n

But one of the best algorithms that I have seen is this:

  1. Define the matrix m = [[1, 1], [1, 0]], i.e. e11 = 1, e12 = 1, e21 = 1, e22 = 0.
  2. To get the nth Fibonacci number, compute m' = m ^ (n-1).
  3. The e11 element of the matrix m' is the nth Fibonacci number.

Now what is great here is that the exponentiation can be done by repeated squaring: to get the 17th Fibonacci number we only need

m' = m^16 = (((m^2)^2)^2)^2

which is four multiplications instead of fifteen.

This reduces the computation from O(n) matrix multiplications to O(log n), and it sidesteps the need for memoization entirely: nothing is ever computed twice. The punchline is that Haskell's `^` operator already uses repeated squaring, so you don't need to implement it. The full implementation is:

data Matrix = Matrix Integer Integer Integer Integer

instance Num Matrix where
  Matrix a11 a12 a21 a22 * Matrix b11 b12 b21 b22
    = Matrix (a11*b11 + a12*b21) (a11*b12 + a12*b22)
             (a21*b11 + a22*b21) (a21*b12 + a22*b22)
  -- fromInteger 1 must give the identity matrix, so that m ^ 0 is defined
  -- (otherwise fib4 1 crashes on the missing Num method).
  fromInteger n = Matrix n 0 0 n

fib4 :: Integer -> Integer
fib4 0 = 0
fib4 n = x
  where
    (Matrix x _ _ _) = Matrix 1 1 1 0 ^ (n - 1)
milad zahedi
  • "I'm aware there are explicit formulas for these, but let's ignore that, because the real point here is to use them to benchmark memoization techniques." – Alec Jan 29 '17 at 20:40
  • The explicit formula for Fibonacci sequence is actually slower to evaluate, because you must evaluate it symbolically rather than numerically. I implemented it once to see, and I think it wound up being O(n^2) or thereabouts. – OmnipotentEntity Feb 05 '17 at 03:23