2

I'm going to implement a function which takes in an Integer and outputs a lazy, infinite list of coprime Integers.

coprime 2 = [1,3..]
coprime 3 = [1,2,4,5,7,8,....]

I expect that these lists will be accessed multiple times, so I want their values stored as CAFs. (Forgive me if I'm using that term incorrectly) Okay, so that's fine because that will be done automatically by the compiler (I think). But what about distinct inputs which are guaranteed to have the same output? For instance:

coprime 2 = [1,3..]
coprime 4 = [1,3..]

How can I make sure that these two patterns call upon the same CAF so that the result doesn't have to be recomputed?


My first thought is to try implementing coprime so that coprime 2 and coprime 4 actually call a local function with the same argument. A minimal example:

coprime n
  | n == 4    = realcoprime 2
  | otherwise = realcoprime n
    where
      realcoprime n = .....

But do all calls to coprime share the same CAF values associated with realcoprime? And if so, won't they be garbage collected prematurely because they're not in the global scope?

Myridium
  • 789
  • 10
  • 20

2 Answers2

2

Functions in general cannot be memoized (at least in general they can't). If you expect Haskell to memoize functions, that means that it would have to keep at run-time a dictionary of inputs already seen and their corresponding outputs, but then a function call involves a lookup in this dictionary!

However, there are libraries which provide this facility: memoize and data-memocombinators for example. Using the latter of these, for example, I can memoize your example with

import qualified Data.MemoCombinators as Memo

coprime :: Integer -> [Integer]
coprime n = Memo.integral coprime'
  where
    coprime' = -- your definition here - can even use `coprime`

Then as to your second question, now it is appropriate to make a second function that ends up calling the first.

Community
  • 1
  • 1
Alec
  • 31,829
  • 7
  • 67
  • 114
  • I've tested this, and it will memoize the inputs to `coprime`. Like I said, I need to perform simplification on these arguments before memoizing them, since in general many different inputs will be associated with the same output. – Myridium Sep 20 '16 at 22:38
  • @Myridium yes, read the last sentence of my answer. I suggested that now you could make another function that simplifies the input then calls `coprime`. – Alec Sep 21 '16 at 04:20
  • Yes, I did that. As described in my answer, that does not work. I did some tests and it appears to memoize the inputs to `coprime`, not to `coprime'` which are the simplified inputs. – Myridium Sep 21 '16 at 04:50
  • I am so confused. Your last iteration is exactly what I was recommending you do. Nonetheless, as long as you have got what you wanted, everything is fine! – Alec Sep 21 '16 at 05:10
  • Ah okay, I see now. That was unclear to me, sorry. I thought you were suggesting the first iteration of my answer. Also, are you sure it is `Memo.list Memo.integral`, as I've read elsewhere that we should use the `Memo` of the *input type*, which in this case means `Memo.integral`. – Myridium Sep 21 '16 at 05:28
  • @Myridium Nice catch - it is indeed the type of the input. I've updated my answer in case anyone stumbles on it in the future. I guess it also makes sense to be concerned mainly with the input type as that is the one over which we will need to be doing lookups. – Alec Sep 21 '16 at 05:30
0

I have withdrawn my 'accepted' status from @Alec's answer because I have discovered that the method shown above will only memoize inputs to the coprime function. As stated in the question, I need to do some simplification on the input argument before memoizing it, as otherwise I will be storing the exact same data for multiple arguments. So here was my first attempt at working around this:

cprimes :: [Integer] -> [Integer]
cprimes relnums = cprime' (mySimplify relnums)
  where
    mySimplify relnums = ....
    cprimes' = (Memo.list Memo.integral) cprimes''
    cprimes'' relnums = ....

However, it seems that memoization here does not take place in the global context at least. Results are recomputed when cprimes is called with the same arguments. I suspect the result is being memoized within the one call to cprimes and then garbage collected, because cprimes' is defined within the where block.

This was my second attempt at a workaround:

cprimes :: [Integer] -> [Integer]
cprimes relnums       = cprimetables (mySimplify relnums)
cprimetables = (Memo.list Memo.integral) cprimes'
  where
    mySimplify relnums = ...
    cprimes'   relnums = ...

Now this works as expected. However, it exposes the cprimestable function to anything that imports this module, which is not something I want. It's an internal implementation, and not meant to be called by the outside world. This is why I tried to put it in a where block in the first place.


So in the end, I used this second method in combination with explicitly stating the exports from my module. It goes like this:

module PrimeStuff ( cprimes,
                    ...
                  )
  where

 ...

cprimes :: [Integer] -> [Integer]
cprimes relnums       = cprimetables (mySimplify relnums)
cprimetables = (Memo.list Memo.integral) cprimes'
  where
    mySimplify relnums = ...
    cprimes'   relnums = ...

This way, it seems as though the memoized results in cprimestable are not garbage collected prematurely. It also hides this implementation so that anything importing this module is only aware of the function cprimes which will take as an argument any list of integers.


Helpful resources from stackoverflow.com in finding out how to do this were this for learning about memoization, and this for learning about 'implementation-hiding'.

Community
  • 1
  • 1
Myridium
  • 789
  • 10
  • 20
  • there might still be a way to do what you originally wanted: `coprimes = (results !!) where results = map f [0..] ; f n | n==simplify n = realCoprimes n | otherwise = results !! (simplify n)`. `results`, being referenced inside the returned CAF, should get memoized between calls. then `coprimes 4` will call `f 4` which will call `f 2`, store the result at `xs !! 2`, store the same result at index 4, and return the result (calculating it once). that's the idea anyway. You may have to give it a specific (monomorphic) type signature, like `Int -> [Int]`. A polymorphic constant is not memoized. – Will Ness Sep 21 '16 at 19:43
  • e.g. try `~> let fib :: Int -> Integer ; fib = (\n -> (xs!!(n-1)) + (xs!!(n-2))) where xs = 0:1:map fib [2..]`, then run `length $ show $ fib 10000` twice. [see also](https://stackoverflow.com/questions/11466284/how-is-this-fibonacci-function-memoized/11466959#11466959). – Will Ness Sep 21 '16 at 19:46
  • so the fix for you may be as simple as turning your `cprimes relnums = cprime' (mySimplify relnums)` into `cprimes = \relnums -> cprime' (mySimplify relnums)`. – Will Ness Sep 21 '16 at 19:58
  • @WillNess - Unfortunately, it appears as though your suggestion does not work. Trying the arguments `[2..1000]` reveals that all the internal methods associated with `cprimes` are being called `999` times. i.e. expensive functions are being called `999` times. This is in contrast to the method I outlined in my answer where profiling reveals that each of these internal methods is called only `607` times (the expected number). – Myridium Sep 22 '16 at 06:56
  • @WillNess ...and yet, your method results in a noticeably shorter computation time. This is strange. I don't know how to reconcile that. Do you know why the .prof file might still be saying there are `999` entries? – Myridium Sep 22 '16 at 06:59
  • @WillNess - More investigation reveals that my method has a somewhat shorter computation time, but the garbage collection time more than offsets that. (Ends up using 5GB of RAM...) I find this really surprising, as I fully expected it to be worth the trade-off. Alas, a mathematical result shows that `6/pi^2 = 0.608` of the total number of computations will still be required, even with memoisation. I guess it's just not worth it for the garbage collection. Thank you anyhow for your suggestion. – Myridium Sep 22 '16 at 07:09
  • `realCoprimes` can only be called when `n==simplify n`. `coprimes 4 == results !! 4 == f 4 == results !! 2 == f 2 == realCoprimes 2`. There will be of course 999 entries in the *list*. I just tried my code in GHCi, with `simplify = product . map fst . factorise` and `let realCoprimes n = trace (show n++",") [i | i<- [1..], gcd n i == 1]`, and the memoization between calls *was* happening for me (i.e. trace outputs seen only for squarefree numbers) – Will Ness Sep 22 '16 at 11:31
  • @WillNess - Indeed, the argument is simplified to a square-free number, but this does not mean that memoisation is working correctly. Try compiling and running [this script](http://pastebin.com/gZZ6uAU0) with a small input of about 10. Maybe I'm mistaken, but it seems like expensive operations are being performed not only for square-frees. – Myridium Sep 23 '16 at 01:12
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/123996/discussion-between-myridium-and-will-ness). – Myridium Sep 23 '16 at 01:32