How does laziness affect benchmarking in Haskell?

Question

This question is related to the following question : How to force evaluation in Haskell?

I want to benchmark the algorithm quicksort for a list. For this I have made a certain number of files which have random numbers in them.

Here is the relevant part of the code in question :

import System.IO
import Data.Time
import Control.DeepSeq

getListFromFiles :: IO [[Int]]
quicksort :: (Ord a) => [a] -> [a]

main = do
  l <- getListFromFiles
  start <- getCurrentTime
  let l' = map quicksort l
  end <- l' `deepseq` getCurrentTime
  print (diffUTCTime end start)

I do not want to know want to measure the time the program takes to look into the files, just the one that the sorting takes. Because of laziness, I think that the list l is only evaluated when deepseq is called on the list l' and that gives a flawed benchmark. Am I correct ?

For timing benchmarks, take a look at the criterion library. In particular, you will likely end up using `nf`. Make sure you start by opening the files and only wrap the `quicksort` call itself with `nf`. — Alec, Oct 04 '17 at 13:46

score 3 · Accepted Answer · answered Oct 04 '17 at 16:59

3

I think that the list l is only evaluated when deepseq is called on the list l'...

Correct.

...and that gives a flawed benchmark.

Let me make an assumption about what you mean by "flawed". I guess what you mean is that getCurrentTime will return a time from before the sort is fully completed. Under that assumption, no, the benchmark is not flawed. I'm not sure I can explain which part of your reasoning is wrong, though, because you don't say why you think the benchmark will be flawed.

However, there is a pitfall to be aware of that I suspect is different from the one you had in mind: you should make sure that the input list is fully evaluated before calling the starting getCurrentTime, thus:

  start <- l `deepseq` getCurrentTime

This may or may not matter, depending on exactly how you implemented getListFromFiles.

answered Oct 04 '17 at 16:59

Daniel Wagner

145,880
9
220
380

And, independently of answering the question as asked, I second @Alec's recommendation to look at [criterion](http://hackage.haskell.org/package/criterion). – Daniel Wagner Oct 04 '17 at 17:02
The pitfall you mention is exactly why I thought my program was flawed. So, if I understand correctly, if I write : "start <- l `deepseq` getCurrentTime" then "let l' = map quicksort l" and then "end <- l' `deepseq` getCurrentTime", will I only measure the time the quicksort algorithm takes, not the time that l is read in the files ? – fyusuf-a Oct 04 '17 at 17:21
@Florian Ah, I misunderstood the question! I'm glad I snuck the answer to your actual question in there by accident. =) – Daniel Wagner Oct 04 '17 at 17:24
@Florian Basically yes. It will measure the time for the `quicksort` plus the `deepseq`, but the latter is sort of unavoidable. – Daniel Wagner Oct 04 '17 at 17:27

How does laziness affect benchmarking in Haskell?

1 Answers1