
I'm trying to verify an algorithm's complexity using benchmark data. The algorithm under test is binary search (stated complexity O(log n)), and I want to use the JMH library for benchmarking.

Here is the test example:

import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
// JUnit's @Test/assertTrue and the project's own helpers are imported as well (omitted here)

public class BinarySearchTest {

    private static SearchAlgorithm binaryIterative = new BinarySearchIterative();
    private static SearchAlgorithm binaryRecursive = new BinarySearchRecursive();

    @Test
    public void runBenchmarks() throws Exception {
        Options options = new OptionsBuilder()
                .include(this.getClass().getName() + ".*")
                .mode(Mode.Throughput)
                .forks(1)
                .threads(1)
                .warmupIterations(0)
                .measurementIterations(1)
                .shouldFailOnError(true)
                .shouldDoGC(true)
                .build();

        new Runner(options).run();
    }

    @Benchmark
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void binarySearchIterativeBenchmark(ExecutionPlan plan) {

        // given
        int size = randomPositiveIntLessThan(plan.arraySize);
        int[] array = generateUninterrupted(0, size);
        int target = randomPositiveIntLessThan(size);

        // when
        var result = binaryIterative.find(array, 0, array.length, target);

        // then
        assertTrue(result != -1);
    }
}
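The helper methods referenced above are roughly as follows (a simplified sketch: generateUninterrupted uses IntStream.rangeClosed as noted in the comments below, while the exact behavior of randomPositiveIntLessThan is assumed):

import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Continuous ascending values start..end, e.g. [0, 1, 2, ..., end].
private static int[] generateUninterrupted(int start, int end) {
    return IntStream.rangeClosed(start, end).toArray();
}

// Assumed behavior: a uniformly random value in [1, bound).
private static int randomPositiveIntLessThan(int bound) {
    return ThreadLocalRandom.current().nextInt(1, bound);
}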

This is the class with the algorithm implementation:

public class BinarySearchIterative implements SearchAlgorithm {

    @Override
    public int find(int[] array, int start, int end, int target) {

        if (end > array.length) {
            return -1;
        }

        int left = start;
        int right = end;

        while (left <= right) {
            int median = left + (right - left) / 2;
            if (array[median] == target) {
                return median;
            }
            if (array[median] > target) {
                right = median - 1;
            }
            if (array[median] < target) {
                left = median + 1;
            }
        }
        return -1;
    }
}
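For example, find(new int[]{1, 3, 5, 7, 9}, 0, 4, 7) returns 3, while a target that is absent from the array yields -1.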

I use a class annotated with @State to supply the array sizes:

@State(Scope.Benchmark)
public class ExecutionPlan {

    @Param({"100000", "200000", "300000", "400000", "500000",
            "1000000", "2000000", "3000000", "4000000", "5000000",
            "10000000", "20000000", "30000000", "40000000", "50000000"})
    public int arraySize;
}

So I have the following results:

Benchmark                                         (arraySize)   Mode   Score  Units
BinarySearchTest.binarySearchIterativeBenchmark        100000  thrpt  31.602  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark        200000  thrpt  14.520  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark        300000  thrpt   9.004  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark        400000  thrpt   6.896  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark        500000  thrpt   5.333  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark       1000000  thrpt   2.304  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark       2000000  thrpt   0.790  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark       3000000  thrpt   0.451  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark       4000000  thrpt   0.330  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark       5000000  thrpt   0.232  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark      10000000  thrpt   0.135  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark      20000000  thrpt   0.061  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark      30000000  thrpt   0.039  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark      40000000  thrpt   0.033  ops/ms
BinarySearchTest.binarySearchIterativeBenchmark      50000000  thrpt   0.025  ops/ms

But if I plot the score against the array size, I get not a log(n) graph but rather a 1/x graph. If I use Mode.AverageTime, the graph looks more like x^2.

Here is my graph for the data provided above (y: ms/op, x: array size):

[image: plot of ms/op versus array size]

How can I get meaningful operation units from JMH, or how should I tune my test?

Andrey
  • Can you kindly share the entire code for your iterative binary search and the graph you plotted? – Deepak Tatyaji Ahire Feb 08 '21 at 17:09
  • @Spektre Yes, `ops/ms == operations / ms`. In this case, an operation is the whole `binarySearchIterativeBenchmark` method. It is executed multiple times for each of the parameters, size of an array in this case. – Amongalen Feb 09 '21 at 07:22
  • @Deepak Tatyaji Ahire, I have added the algorithm implementation and graph to the original post – Andrey Feb 09 '21 at 07:38
  • @Spektre, the array is sorted and ascending; it is generated this way: `return IntStream.rangeClosed(start, end).toArray();` – Andrey Feb 09 '21 at 09:52
  • @Andrey see edit1 especially the end ... You need to generate the array outside benchmark measurements – Spektre Feb 09 '21 at 10:30

2 Answers


You are plotting and comparing the wrong quantities.

So you got operations per millisecond, ops_ms, which is more or less the number of operations m divided by the measured time t in [ms]. For a binary search of size n that is:

m ≈ log2(n)

To obtain the complexity and/or a correct plot you need to plot the measured time t versus the size n; however, you are plotting ops_ms versus n...

So first we need to obtain the measured time t (top is the time of a single operation):

t = m*top
m = log2(n)
ops_ms = 1/top
--------------
top=1/ops_ms
t = log2(n)*top
--------------
t = log2(n)/ops_ms
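For example, with n = 100000 and 31.602 ops/ms this gives t = log2(100000)/31.602 ≈ 16.61/31.602 ≈ 0.53 ms.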

So you need to plot t on the y axis and n on the x axis. However, as you can see, this way of measuring is nearly worthless, as you need to know what is being measured in order to get m, and even that is just an approximation... What is much better/more accurate is using the measured time directly, as your ops/ms is messing everything up.

When I run my complexity-estimation code on data like this:

const double l2=3.3219280948873623478703194294894; // log2(10), converts log10 to log2 (use 1/ln(2) ≈ 1.4427 if log() is the natural log)
double binsearch[]= // n[-],t[ms]
    {
      100000,l2*log(  100000.0)/31.602,
      200000,l2*log(  200000.0)/14.520,
      300000,l2*log(  300000.0)/ 9.004,
      400000,l2*log(  400000.0)/ 6.896,
      500000,l2*log(  500000.0)/ 5.333,
     1000000,l2*log( 1000000.0)/ 2.304,
     2000000,l2*log( 2000000.0)/ 0.790,
     3000000,l2*log( 3000000.0)/ 0.451,
     4000000,l2*log( 4000000.0)/ 0.330,
     5000000,l2*log( 5000000.0)/ 0.232,
    10000000,l2*log(10000000.0)/ 0.135,
    20000000,l2*log(20000000.0)/ 0.061,
    30000000,l2*log(30000000.0)/ 0.039,
    40000000,l2*log(40000000.0)/ 0.033,
    50000000,l2*log(50000000.0)/ 0.025,
      0,0.000
    };

it leads to this result:

binsearch O(n.log^4(n)) error = 0.398668

which is still too far off the expected log2(n), however much closer than the other options. That implies additional stuff is weighing down your ops/ms values, which I expected... You know, you have the JRE architecture and also the host architecture messing up measurements with things like the CACHE, prefetch, pipelining, etc., and on top of all that your JMH might do stuff too (like averaging or "enhancing" the ops/ms value for some purpose)...

In case ops_ms is actually binsearch/ms, as one of the comments suggests, then the time is computed as 1/ops_ms instead, which might be true as the result is slightly closer to O(log(n)) but still too far off:

//   time              O(n)          uncertainty
log2(n)/ops_ms    O(n.log^4(n))    error = 0.398668 // m ops / ms
    (n)/ops_ms    O(n^2.log^3(n))  error = 0.398668 // n ops / ms
    (1)/ops_ms    O(n.log^3(n))    error = 0.398668 // binsearch / ms

So my advice is to find a way to measure time directly instead of using ops/ms...

[edit1] The C++ implementation I tested on:

int find(int *array,int size,int start,int end,int target)
    {
    if (end >= size) return -1;
    int left = start;
    int right = end;
    while (left <= right)
        {
        int median = left + (right - left) / 2;
        if (array[median] == target) return median;
        if (array[median] > target)  right = median - 1;
        if (array[median] < target)  left = median + 1;
        }
    return -1;
    }

usage:

const int n=50000000;
double binsearch[]= // n[-],t[ms]
    {
      100000,1.0,
      200000,1.0,
      300000,1.0,
      400000,1.0,
      500000,1.0,
     1000000,1.0,
     2000000,1.0,
     3000000,1.0,
     4000000,1.0,
     5000000,1.0,
    10000000,1.0,
    20000000,1.0,
    30000000,1.0,
    40000000,1.0,
    50000000,1.0,
      0,0.000
    };
int *dat=new int[n],i,s;
Randomize();
for (s=0,i=0;i<n;i++)
    {
    s+=1+Random(10);
    dat[i]=s;
    }
for (i=0;binsearch[i];)
    {
    s=binsearch[i]; i++;
    tbeg(); // star measuring of time
    find(dat,s,0,s-1,dat[Random(s)]);
    tend(); // end measuring of time
    binsearch[i]=performance_tms; i++; // store measured time
    }
delete[] dat;

This will generate a PRNG-based ascending int array and test your `find` on it. My bet is your data is not random at all, as I described in my comment. When I apply this, the result is as expected:

binsearch O(log(n)) error = 0.528393

So either your array and/or target is not chosen correctly, or your time measurement even includes the array's generation, which messes things up.

If I see it right, your array generation is either O(n^2) or O(n.log(n)), as opposed to my O(n), so if it were included in the measurements it would dominate the O(log(n))... The result:

(1)/ops_ms    O(n.log^3(n))    error = 0.398668 // binsearch / ms

suggests this is the case and that the generation used is around O(n.log(n)); the log-power discrepancy is just due to the computation architecture used and imprecision in the time measurements...

Spektre
  • If comments on the question are right, an "op" is a whole binary search, not a single step. So `ops/ms` is basically just `1/t`, the inverse of ms/op. If their claim is right, ops/ms isn't also involving something else that scales with `log2(n)`. – Peter Cordes Feb 09 '21 at 09:19
  • Just realized that wasn't the OP's comment, but yeah it looks right given that there's no timing / instrumentation inside the search loop. The actual plot and text for it in the question also says they're plotting ms/op (time), not ops/ms, which should actually be right. – Peter Cordes Feb 09 '21 at 09:23
  • The actual graph is using a linear scale for y, and arbitrary step sizes for x. Like piecewise linear with slope increasing by a factor of 10 every 5 steps. You'd expect actual log(n) results to be closer to linear overall, but there are probably many confounding real-world performance effects (like cache and TLB sizes and branch hit rate). – Peter Cordes Feb 09 '21 at 09:29
  • @PeterCordes yes, I mentioned it in the answer, however for big values it should not matter as much... I rewrote the ending to accommodate the binsearch/ms; it looks like it's closer... but still the difference from expectation is huge... (unless I have some bug in my complexity estimation code) – Spektre Feb 09 '21 at 09:32
  • @PeterCordes hmm, I tested the same `find` implementation ported to C++ and the result is `binsearch O(log(n)) error = 0.528393`, so the complexity analysis code looks OK... So my bet is that the `ops_ms` is not what we think, or the measurement also includes the generation of the array... or the binsearch parameters are not chosen correctly for estimating the speed – Spektre Feb 09 '21 at 10:05
  • Note that binary search performance is *very* sensitive to load-use latency, which varies from 4 cycles for an L1d$ hit, up to ~300 cycles depending on clock speed for a cache miss all the way to DRAM. (But confounded by the fact that this is latency to detect branch mispredict, unless code-gen ends up using a data dependency. Speculative exec acts as a sort of prefetch for a branchy binary search. IDK how randomized the "key" values are, to not share the same early path) So it's very plausible that the simple log2(n) model is just plain too inaccurate, and that these measurements are fine. – Peter Cordes Feb 09 '21 at 10:07
  • @PeterCordes I just found that the OP's array might not be random at all... – Spektre Feb 09 '21 at 10:10
  • What do you mean, in what way is that problematic? The array itself has to be sorted for binary search to work. Even using `arr[i] = i` wouldn't actually help this code run faster than with random monotonic. If there were lots of duplicate elements, you'd tend to hit an exact match on average sooner. – Peter Cordes Feb 09 '21 at 10:13
  • @PeterCordes yes, it can be a problem, as I fear the benchmark is measuring the generation of the array, which is most likely `O(n^2)` or `O(n.log(n))` in this case of generation... and has nothing to do with the `find` itself – Spektre Feb 09 '21 at 10:18
  • Oh, yes I was wondering how `int[] array = generateUninterrupted(0, size);` inside the timed function could be cheap enough not to dominate the total time (I guessed it might be getting a reference to an existing array or something). But now that you mention it, yeah I should have realized they're *not* solving that problem, and that explains everything :P The problem isn't the array being non-random, though, it's that they re-generate it during the timed "operation". – Peter Cordes Feb 09 '21 at 10:21
  • @PeterCordes exactly ... didn't see it before too as I was thinking the `ops_ms` was not what OP expects but this explains all even the estimated complexity... – Spektre Feb 09 '21 at 10:22
  • thanks for the lead: I have moved the array generation out of the benchmark and into the class annotated with `@State`. It increased the number of ops/ms, but the trend of the graph is pretty much the same. I generate a continuous array of ints and mark a random int inside it as the target. – Andrey Feb 09 '21 at 15:45
  • @Andrey that is strange... your `find` is definitely `O(log(n))` in my C++ setup, so the problem is elsewhere. Either the measured value means something different, or you are still measuring something else along with your binary search, or you hit some entirely different problem I am not aware of. – Spektre Feb 09 '21 at 17:57

I think I've found the reasons for such behaviour. Here are my fixes:

  1. Changed the benchmark mode to Mode.AverageTime, so the benchmark now reports the average time per operation.
  2. Switched the units to nanoseconds: @OutputTimeUnit(TimeUnit.NANOSECONDS).
  3. Added 1 warmup iteration to the benchmark.
  4. Moved the array generation out of the tests and into the ExecutionPlan class, and changed the generation strategy: I now generate a sorted array of random integer values instead of a continuous run of integers (thanks to @Spektre for this; see the sketch after this list).
  5. Changed the @Setup level to Level.Trial, since per the docs the usage of Level.Invocation comes with reasonable warnings.
  6. Added more data points (now 30).
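A minimal sketch of the reworked setup (the parameter values and generation code here are illustrative, not the exact code I ran; imports as before, plus java.util.Random):

@State(Scope.Benchmark)
public class ExecutionPlan {

    @Param({"100000", "1000000", "10000000"}) // illustrative subset of the 30 sizes
    public int arraySize;

    public int[] array;
    public int target;

    @Setup(Level.Trial) // generate data once per trial, outside the measured code
    public void setUp() {
        Random rnd = new Random();
        array = new int[arraySize];
        int value = 0;
        for (int i = 0; i < arraySize; i++) {
            value += 1 + rnd.nextInt(10); // sorted array with random gaps
            array[i] = value;
        }
        target = array[rnd.nextInt(arraySize)]; // pick an element that is present
    }
}

@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public int binarySearchIterativeBenchmark(ExecutionPlan plan) {
    // only the search itself is timed; returning the result prevents dead-code elimination
    return binaryIterative.find(plan.array, 0, plan.array.length - 1, plan.target);
}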

Here is the obtained data for the iterative binary search:

[image: benchmark results (ns/op) for each array size]

and the plotted graph with a trend line:

[image: average time per operation versus array size, with trend line]

Some points have a big error, but the trend now tends to O(log n). I think the benchmark can be tuned for more precision by using more iterations, warmups, and forks.
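For instance, a tuned runner configuration might look like this (the iteration counts are a plausible starting point, not settings I verified):

Options options = new OptionsBuilder()
        .include(BinarySearchTest.class.getName() + ".*")
        .mode(Mode.AverageTime)
        .forks(3)                  // average out JIT/GC variation across JVM instances
        .warmupIterations(5)       // let the JIT fully compile the hot path first
        .measurementIterations(10)
        .shouldFailOnError(true)
        .build();

new Runner(options).run();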

Andrey
  • *now generate random integer values instead of continuous array of integers* - BinarySearch needs a sorted array to work correctly. With a random array, you're likely to just have it always run to completion and report "not found". (Although the performance is probably similar to a valid case). You don't want random, unless you just mean random increase since the previous element. `arr[i]=i;` is fine, Spektre was wrong about that, unless I misunderstood them. The only problem was generating it inside the timed part. – Peter Cordes Feb 10 '21 at 13:39
  • sorry, I meant sorted array of random ints, e.g. [1, 5, 18, 33, 42.. n]. Instead of ints incremented by 1: [1, 2, 3, 4, 5...n] – Andrey Feb 10 '21 at 13:42
  • Oh. Well, either way is fine. Binary search performance should be similar either way; an `a[i]=i` array was not a problem. – Peter Cordes Feb 10 '21 at 13:46