How to reliably compare runtime of Haskell and C?

Question

I used the Criterion library to write benchmarks for my Haskell functions. Now I am implementing the same algorithm in C to compare its performance with the Haskell version. How can I do this reliably? Criterion does a lot of fancy stuff, like accounting for clock call overhead and doing statistical analysis of the results. I suspect that if I simply measure the time taken by my C function, the result will not be comparable with what Criterion reports. In his original post about Criterion, Bryan O'Sullivan writes: "It should even be easy to use criterion to benchmark C code and command line programs." But how? Takayuki Muranushi compares a C implementation of the DFT with a Haskell one by spawning threads and calling the executable, but I fear that this adds a lot of overhead (creating a new thread, running the application, writing output to stdio and then reading it back), which makes the results incomparable. I also considered using the FFI, but again I fear that the additional overhead would make such a comparison unfair.

If there is no way to use Criterion to reliably benchmark C, what approaches to C benchmarking would you recommend? I've read some questions here on SO, and it seems that there are many different functions for measuring system time, but they either report time in milliseconds or have a large call overhead.

I'm not convinced that using the FFI is inappropriate here. I think that it is the method with the lowest possible overhead. If you mark your C import as `unsafe`, it will be just marshalling and a simple inline `call` instruction. — Mikhail Glushenkov, Oct 22 '12 at 10:57
Additionally, you can estimate the FFI/`exec` overhead by benchmarking a C function that does nothing and see how much noise it adds. — Mikhail Glushenkov, Oct 22 '12 at 11:03
Actually I don't know much about how FFI works so perhaps I am overestimating the overhead, but what about copying data to and from external function? — Jan Stolarek, Oct 22 '12 at 11:05
It depends on what kind of data you're working with. E.g. if it is a `ByteString`, you can use [`unsafeUseAsCString`](http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Unsafe.html#g:3). Or just do marshalling before the measurement. — Mikhail Glushenkov, Oct 22 '12 at 11:25

Mikhail Glushenkov · Accepted Answer · 2012-10-22T14:38:32.147

FFI can be used in such a way that it doesn't add much overhead. Consider the following program (full code available here):

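{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CFloat, CInt, CUInt)
import Foreign.Marshal.Alloc (mallocBytes)
import Foreign.Ptr (Ptr)

-- An `unsafe` import must not call back into Haskell; in exchange, GHC
-- skips the bookkeeping it performs around a `safe` call.
foreign import ccall unsafe "mean" c_mean :: Ptr CInt -> CUInt -> IO CFloat

-- bufSize, sizeOfCInt and fillBuffer are not shown here (see the full code).
main :: IO ()
main = do
  buf <- mallocBytes (bufSize * sizeOfCInt)
  fillBuffer buf 0
  m <- c_mean buf (fromIntegral bufSize)
  print $ realToFrac m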

The C call is compiled to the following Cmm:

s2ni_ret() { ... }
    c2qy:
        Hp = Hp + 12;
        if (Hp > I32[BaseReg + 92]) goto c2qC;
        _c2qD::I32 = I32[Sp + 4];
        (_s2m3::F32,) = foreign "ccall"
          mean((_c2qD::I32, PtrHint), (100,));

Here's the assembly:

s2ni_info:
.Lc2qy:
        addl $12,%edi
        cmpl 92(%ebx),%edi
        ja .Lc2qC
        movl 4(%ebp),%eax
        subl $4,%esp
        pushl $100
        pushl %eax
        ffree %st(0) ;ffree %st(1) ;ffree %st(2) ;ffree %st(3)
        ffree %st(4) ;ffree %st(5)
        call mean

So, if you mark your C import as unsafe and do all the marshalling before the measurement, the C call is essentially a single inline call instruction, the same as if you were doing all the benchmarking in C. Here's what Criterion reports when I benchmark a C function that does nothing:

benchmarking c_nothing
mean: 13.99036 ns, lb 13.65144 ns, ub 14.62640 ns, ci 0.950
std dev: 2.306218 ns, lb 1.406215 ns, ub 3.541156 ns, ci 0.950
found 10 outliers among 100 samples (10.0%)
  9 (9.0%) high severe
variance introduced by outliers: 91.513%
variance is severely inflated by outliers

The mean of about 14 ns is approximately 400 times smaller than the estimated clock resolution on my machine (~5.5 µs). For comparison, here's the benchmark data for a function that computes the arithmetic mean of 100 integers:

benchmarking c_mean
mean: 184.1270 ns, lb 183.5749 ns, ub 185.0947 ns, ci 0.950
std dev: 3.651747 ns, lb 2.430552 ns, ub 5.885120 ns, ci 0.950
found 6 outliers among 100 samples (6.0%)
  5 (5.0%) high severe
variance introduced by outliers: 12.329%
variance is moderately inflated by outliers
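
The C side of these two benchmarks is not shown above. A minimal sketch of what it might look like, assuming the signature implied by the foreign import (`Ptr CInt -> CUInt -> IO CFloat`) for `mean`, and a hypothetical `nothing` function standing in for whatever `c_nothing` is bound to:

```c
/* Hypothetical no-op: benchmarking it estimates the cost of the call itself. */
void nothing(void)
{
}

/* Arithmetic mean of n ints. The parameter and return types mirror the
 * foreign import; the body is an assumption, not the original code.
 * Returns 0 for an empty buffer to avoid dividing by zero. */
float mean(const int *buf, unsigned int n)
{
    if (n == 0)
        return 0.0f;

    long long sum = 0; /* wider accumulator to avoid int overflow */
    for (unsigned int i = 0; i < n; ++i)
        sum += buf[i];

    return (float)sum / (float)n;
}
```

Compiling this file separately and linking it into the Haskell benchmark keeps the C optimizer from seeing (and removing) the call to the no-op.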

Thanks for the detailed answer. I am not exactly sure whether this will work for me, since I return an array of doubles and will have to convert it to some Haskell container (probably an unboxed Vector) — Jan Stolarek, Oct 22 '12 at 15:13
`Data.Vector.Storable` contains the function [`unsafeToForeignPtr`](http://hackage.haskell.org/packages/archive/vector/0.10.0.1/doc/html/Data-Vector-Storable.html#v:unsafeToForeignPtr) that lets you modify the vector in-place from the C code. — Mikhail Glushenkov, Oct 22 '12 at 15:26
@MikhailGlushenkov, that documentation says: "The data may not be modified through the ForeignPtr." — huon, Oct 22 '12 at 16:26
@dbaupp I should have linked to [`Data.Vector.Storable.Mutable`](http://hackage.haskell.org/packages/archive/vector/0.10.0.1/doc/html/Data-Vector-Storable-Mutable.html#v:unsafeToForeignPtr0). — Mikhail Glushenkov, Oct 22 '12 at 16:37
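
On the C side, modifying a vector in place simply means writing through a pointer that the caller supplies, so no array has to be returned or converted afterwards. A minimal sketch, assuming a hypothetical `running_mean` function (not from the original code) that receives both the input and the output buffer from the Haskell side:

```c
/* Hypothetical example: writes the mean of in[0..i] to out[i].
 * The caller owns both buffers, so nothing is allocated or copied here. */
void running_mean(const double *in, double *out, unsigned int n)
{
    double sum = 0.0;
    for (unsigned int i = 0; i < n; ++i) {
        sum += in[i];
        out[i] = sum / (double)(i + 1);
    }
}
```

With this shape, the output buffer can be the storage of a mutable storable vector, and allocating it can happen before the measurement starts.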
