I want to challenge GHC compiler, so I wrote this code (the detail of the code is actually unimportant, only to show that some hard work has to be done to get each element of this infinite list):
hardwork :: [Int]
hardwork = step1 where
-- start with value 1
step1 = step2 1
-- force the hard work being done
step2 i = step3 i $! step4 i
-- construct the current result, start next
step3 i result = result : step2 (i+1)
-- start the work with n=0 and j=1
step4 i = step5 i 0 1
-- if n=1048576, finish work and return j
step5 _ 1048576 j = j
-- otherwise increase n, and modify j
step5 i n j = step5 i (n+1) ((j * i) `mod` 1048575)
Now I use the cleave
function described in the Haskellwiki
cleave :: [a] -> ([a],[a])
cleave = step1 where
step1 xs = (odds xs, evens xs)
odds [] = []
odds [x] = [x]
odds (x:_:xs) = x : odds xs
evens [] = []
evens [x] = []
evens (_:x:xs) = x : evens xs
and the main function
main :: IO ()
main = do
print (take 5 (fst $ cleave hardwork), take 4 (snd $ cleave hardwork))
Like expected, it prints out values slowly, since it has to do very hard work to get the results. However what surprising me is, once the fist list being printed out, the second list was calculated immediately.
This is a surprise because, since the two occurrence of cleave hardwork
seem to be unrelated in the code, and we are accessing the different part of them, it looks like a naive implementation will do the hard work again to get the second list. However GHC seems to be cleverer than I think.
My question is: how did they manage to do so? What is the magic behind this? More precisely, how the runtime figure out some requested value has already been evaluated (even if they were never being accessed)? Are there any cost for this kind of bookkeeping?
BTW, to ensure I am doing the right thing in the right way, I used a un-sugared, step by step style to define hardwork
. There are of cause other ways to implement it but if it uses any sugars the behaviour may depend on the detail how the code being de-sugared by the compiler. Also, this step-by-step style makes paper evaluation by manually substitute expressions easier.
EDIT
So according to the answers, I rewrote hardwork
to make it not a CAF (this is a more generic way to do so than the answer suggested anyway):
hardwork :: a -> [Int]
hardwork = step1 where
step1 _ = step2 1
...
Now it results in the main
running slow on both part of the result. But if I replace main
with
print (take 5 $ fst value, take 6 $ snd value) where value = cleave hardwork()
It works the same way the first version do. So it looks like a prove of what the accepted answer said.