How to create data for Criterion benchmarks?

Question

I am using criterion to benchmark my Haskell code. I'm doing some heavy computations for which I need random data. I've written my main benchmark file like this:

main :: IO ()
main = newStdGen >>= defaultMain . benchmarks

benchmarks :: RandomGen g => g -> [Benchmark]
benchmarks gen =
   [
     bgroup "Group"
     [
       bench "MyFun" $ nf benchFun (dataFun gen)
     ]
   ]

I keep the benchmarks and the data generators for them in different modules:

benchFun :: ([Double], [Double]) -> [Double]
benchFun (ls, sig) = fun ls sig

dataFun :: RandomGen g => g -> ([Double], [Double])
dataFun gen = (take 5 $ randoms gen, take 1024 $ randoms gen)

This works, but I have two concerns. First, is the time needed to generate random data included in the benchmark? I found a question that touches on that subject, but honestly speaking I'm unable to apply it to my code. To check whether this happens, I wrote an alternative version of my data generator that runs in the IO monad. I moved the benchmarks list into main, called the generator, extracted the result with <- and then passed it to the benchmarked function. I saw no difference in performance.

My second concern is related to generating random data. Right now the generator, once created, is never updated, so every data function gets the same data within a single run. This is not a major problem, but it would be nice to do it properly. Is there a neat way to generate different random data within each data* function? "Neat" means "without making the data functions acquire a StdGen within IO".

EDIT: As noted in comment below I don't really care about data randomness. What is important to me is that the time needed to generate the data is not included in the benchmark.

I know this strays from your question, but why would you really need random data when benchmarking? — user1105045, Oct 15 '12 at 13:14
I guess I don't. I'm open to any suggestion on how to generate a huge amount of data for benchmarks. I guess the most important part of my question is: how to generate data so that time needed for that is not included in the benchmark. — Jan Stolarek, Oct 15 '12 at 13:18
I really don't know how criterion measurements are done and neither how you can generate that data at compile time, which would be ideal. But using the same data with all your algorithms you would have a common point of reference and could tell which one is better, wouldn't you? — user1105045, Oct 15 '12 at 13:31
That is a good point, but not all benchmarked functions need the same input data. — Jan Stolarek, Oct 15 '12 at 13:33

jberryman · Accepted Answer · 2012-10-15T19:53:51.747

This works, but I have two concerns. First, is the time needed to generate random data included in the benchmark?

Yes, it would. All of the random generation happens lazily, so the values are only produced when something first demands them.

To check whether this happens, I wrote an alternative version of my data generator that runs in the IO monad. I moved the benchmarks list into main, called the generator, extracted the result with <- and then passed it to the benchmarked function. I saw no difference in performance.

This is expected (if I understand what you mean); the random values from randoms gen aren't going to be generated until they're needed (i.e. inside your benchmark loop).

Is there a neat way to generate different random data within each data* function? "Neat" means "without making the data functions acquire a StdGen within IO".

You need either to be in IO or create an StdGen with an integer seed you supply, with mkStdGen.
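As a minimal sketch of the mkStdGen route, the data function below takes an integer seed instead of a generator, so it stays pure and reproducible. The name dataFunSeeded and the choice of seed + 1 for the second list are assumptions made for illustration, not part of the original code; deriving a second generator also avoids both lists starting with the same values.

```haskell
import System.Random (mkStdGen, randoms)

-- Hypothetical seeded variant of dataFun: each list gets its own
-- generator, built from a caller-supplied integer seed.
dataFunSeeded :: Int -> ([Double], [Double])
dataFunSeeded seed =
  ( take 5    (randoms (mkStdGen seed))        -- first input list
  , take 1024 (randoms (mkStdGen (seed + 1)))  -- second list, different seed
  )

main :: IO ()
main = do
  let (ls, sig) = dataFunSeeded 42  -- same seed gives the same data every run
  print (length ls, length sig)
```

Passing a different seed to each data* function gives each benchmark its own input without touching IO.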

Re. your main question of how you should get the pRNG stuff out of your benchmarks, you should be able to evaluate the random input fully before your defaultMain (benchmarks g) stuff, with evaluate and force like:

import Control.DeepSeq (force)
import Control.Exception (evaluate)

myBench g = do
    -- Fully evaluate the random input here, before any benchmark runs.
    randInputEvaled <- evaluate $ force $ dataFun g
    defaultMain [
         bench "MyFun" $ nf benchFun randInputEvaled
         ...

Here force evaluates its argument to normal form, but only when the result of force is itself demanded, so on its own it is still lazy. To get the data evaluated outside of bench, we use evaluate, which relies on monadic sequencing to do the work at that point in the IO action. You could also do things like call seq on the tail of each of the lists in your tuple, etc., if you wanted to avoid the imports.
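The import-free alternative could look roughly like the sketch below. Since a plain seq only evaluates its argument to weak head normal form, this version folds seq over the whole list so that every element is forced. The helper name forceList and the sample lists are assumptions made for illustration.

```haskell
-- Hypothetical helper: walks the whole list and forces each element.
-- For Double elements, weak head normal form is already fully evaluated.
forceList :: [Double] -> ()
forceList = foldr seq ()

main :: IO ()
main = do
  let ls  = map sqrt [1 .. 5]     :: [Double]
      sig = map (* 2) [1 .. 1024] :: [Double]
  -- Both lists are forced before the following action runs.
  forceList ls `seq` forceList sig `seq` return ()
  print (length ls, length sig)
```

This uses nothing outside the Prelude, at the cost of being tied to one element type; force from deepseq handles arbitrary NFData structures.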

That kind of thing should work fine, unless you need to hold a huge amount of test data in memory.

EDIT: this method is also a good idea if you want to get your data from IO, like reading from the disk, and don't want that mixed in to your benchmarks.
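Put together with the code from the question, the approach might look like the following self-contained sketch. The body of benchFun is a placeholder, because the real fun is not shown in the question; everything else reuses the names from the question and the snippet above.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Criterion.Main (bench, bgroup, defaultMain, nf)
import System.Random (RandomGen, newStdGen, randoms)

-- Placeholder standing in for the question's `fun ls sig` (an assumption).
benchFun :: ([Double], [Double]) -> [Double]
benchFun (ls, sig) = map (* sum ls) sig

dataFun :: RandomGen g => g -> ([Double], [Double])
dataFun gen = (take 5 $ randoms gen, take 1024 $ randoms gen)

main :: IO ()
main = do
  gen <- newStdGen
  -- The input is evaluated to normal form here, outside the benchmark.
  input <- evaluate (force (dataFun gen))
  defaultMain
    [ bgroup "Group"
        [ bench "MyFun" $ nf benchFun input ]
    ]
```

By the time defaultMain runs, input is already in normal form, so nf should only measure benchFun itself.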

But nfIO returns IO () so in your example randInputEvaled contains just (). The result of the IO action passed to nfIO seems to be discarded, unless I misunderstand something? — Jan Stolarek, Oct 15 '12 at 14:18
Sorry! You're absolutely right. Should be better after edits. — jberryman, Oct 15 '12 at 16:35
Thanks. I did a lot of testing today and it seems that criterion automagically handles the problem of laziness and data creation. I used threadDelay+unsafePerformIO in my data generating function to slow it down. Criterion was wrong when estimating the time needed to run the benchmark (estimated to about 110s), but the final result was the same as if I didn't use the delay. I noticed that the garbage collector turned out to distort the final results (-g flag to the rescue!). — Jan Stolarek, Oct 16 '12 at 13:02

score 0 · Answer 2 · answered Oct 15 '12 at 19:11

You could try reading the random data from a disk file instead. (In fact, if you're on some Unix-like OS, you could even use /dev/urandom.)

However, depending on how much random data you need, the I/O time might dwarf the computation time.

(E.g., if your benchmark reads random numbers and calculates their sum, it's going to be I/O-limited. If your benchmark reads a random number and does some huge calculation based on just that one number, the I/O adds hardly any overhead at all.)
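A minimal sketch of the file-based route, assuming a text file with one number per line. The file name samples.txt and the helper names are hypothetical. The data is forced right after reading, so the lazy file I/O finishes before any benchmark would start.

```haskell
import Control.Exception (evaluate)

-- Parse one Double per line (assumes well-formed input).
parseDoubles :: String -> [Double]
parseDoubles = map read . lines

-- Read and fully force the numbers, so no lazy I/O is left pending.
loadDoubles :: FilePath -> IO [Double]
loadDoubles path = do
  contents <- readFile path
  let xs = parseDoubles contents
  _ <- evaluate (foldr seq () xs)  -- forces every element, hence the whole file
  return xs

main :: IO ()
main = do
  -- Write a tiny sample file so the sketch runs on its own.
  writeFile "samples.txt" (unlines (map show [1.5, 2.5, 3.0 :: Double]))
  xs <- loadDoubles "samples.txt"
  print (length xs)
```

The loaded list can then be passed to bench in the same way as the pre-evaluated random input.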
