Why is a linear search through a heap data structure faster than tree traversal?

Question

In the following code, indexOfSlow is around 17 times slower than indexOf in my test cases, although the former does fewer comparison than the latter.

Why is that?

(My hunch is that the superior performance of indexOf is due to sequential array access which has e.g. caching benefits - but I'd like to know for sure)

Performance numbers from my machine:

| elementCount | indexOf msec | indexOfSlow msec | slow/fast |
|--------------+--------------+------------------+-----------|
|        10000 |           41 |              852 |        21 |
|        20000 |          232 |             3387 |        15 |
|        30000 |          375 |             6642 |        18 |
|        40000 |          710 |            12699 |        18 |
|        50000 |         1197 |            19182 |        16 |
|--------------+--------------+------------------+-----------|
|      average |              |                  |      17.6 |

N.B. Java's PriorityQueue.indexOf() is similar to my indexOf, so it does not try to walk the tree and stop early like my indexOfSlow does.

class Heap<T extends Comparable<T>> {

    public static void main(String[] args) {
        int elementCount = 50_000;

        final List<Integer> elements = new ArrayList<>(elementCount);
        for (int i = 0; i < elementCount; i++)
            elements.add(i);

        for (int j = 0; j < 3; j++) {
            final var heap = new Heap<Integer>();
            for (int i : elements)
                heap.add(i);
            assert heap.peek() == 0;
            final long nanoBefore = System.nanoTime();
            for (int i = elementCount; i > 0; i--)
                heap.indexOf(i - 1);
//                heap.indexOfSlow(i - 1);
            final long nanoAfter = System.nanoTime();
            if (j > 0) // discard first round as warmup
                System.out.println("Time taken: " + (nanoAfter - nanoBefore) / 1_000_000 + " msec");
        }
    }

    private final ArrayList<T> list = new ArrayList<>();

    public T peek() {
        return list.isEmpty() ? null : list.get(0);
    }

    private void siftUp(int i, T value) {
        while (i > 0) {
            int parentI = (i - 1) / 2;
            final T parentValue = list.get(parentI);
            if (parentValue.compareTo(value) <= 0)
                return;
            list.set(parentI, value);
            list.set(i, parentValue);
            i = parentI;
        }
    }

    public void add(T value) {
        list.add(value);
        siftUp(list.size() - 1, value);
    }

    public int indexOf(T value) {
        final int size = list.size();
        if (size > 0) {
            for (int i = 0; i < list.size(); i++) {
                if (value.compareTo(list.get(i)) == 0)
                    return i;
            }
        }
        return -1;
    }

    public int indexOfSlow(T value) {
        final int size = list.size();
        if (size > 0) {
            Queue<Integer> childrenToVisit = new LinkedList<>();
            childrenToVisit.add(0);
            while (!childrenToVisit.isEmpty()) {
                int i = childrenToVisit.poll();
                final int cmp = list.get(i).compareTo(value);
                if (cmp == 0)
                    return i;
                if (cmp > 0)
                    continue;
                int rightChildIdx = (i + 1) * 2;
                int leftChildIdx = rightChildIdx - 1;
                if (leftChildIdx < size)
                    childrenToVisit.add(leftChildIdx);
                if (rightChildIdx < size)
                    childrenToVisit.add(rightChildIdx);
            }
        }
        return -1;
    }

}

So this is java? Or some other language? Please add a language tag. Also, if there's not a single obvious way in which benchmarking is done with this language/platform, please add details of how the benchmarking has been done also. — Damien_The_Unbeliever, May 11 '23 at 06:24
@Damien_The_Unbeliever Yes, it's Java. I added the tag. The benchmark was done running the main method which is at the top of the source code. — Evgeniy Berezovsky, May 11 '23 at 06:40
Your benchmark is flawed. Rewrite it using JMH https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java/4480774#4480774 — Erik, May 11 '23 at 07:08
You benchmark the worst case scenario for the `indexOfSlow()` method: the test data in the heap is totally ordered and therefore `indexOfSlow()` has to search through the same number of elements as `indexOf()`, but it does so in a much more complicated way. Initialize your test data with 50_000 random values. — Thomas Kläger, May 11 '23 at 08:12
@EvgeniyBerezovsky, max/min heap is a data structure which guarantees O(1) TC for `peek` and O(log(N)) TC for add/remove, `indexOf` method does not make any sense. — Andrey B. Panfilov, May 11 '23 at 09:13
In addition to what Thomas Kläger said, read [When to use LinkedList over ArrayList in Java?](https://stackoverflow.com/q/322715/2711488). You’ve chosen the worst way to store a sequence of `int` values. An `ArrayDeque` will perform much better even though it will still have the auto-boxing overhead. — Holger, May 11 '23 at 15:21

score 1 · Answer 1 · answered May 12 '23 at 04:22

You are writing ~ more than reading, moreover, you are doing that in a very inefficient manner, it is possible to get some improvements if replace LinkedList with int[]:

final int size = list.size();
if (size <= 0) {
    return -1;
}

int[] queue = new int[size];
int last = 0;
for (int index = 0; index <= last; index++) {
    int cmp = list.get(queue[index]).compareTo(value);
    if (cmp == 0) {
        return index;
    }
    if (cmp > 0) {
        continue;
    }
    int right = (index + 1) * 2;
    int left = right - 1;
    if (left < size) {
        queue[++last] = left;
        if (right < size) {
            queue[++last] = right;
        }
    }
}
return -1;

since in the worst case LinkedList will contain N/2 elements, such replacement is reasonable, however, that still performs worse than DFS approach:

int indexOf(T value, int index, int size) {
    if (index >= size) {
        return -1;
    }

    int cmp = list.get(index).compareTo(value);

    if (cmp == 0) {
        return index;
    }

    int res = -1;

    if (cmp > 0) {
        return res;
    }

    int rIdx = (index + 1) * 2;
    int lIdx = rIdx - 1;

    if ((res = indexOf(value, rIdx, size)) >= 0) {
        return res;
    }

    return indexOf(value, lIdx, size);
}

the idea of optimizing out some paths via storing information about them somewhere is not so good for this particular case - you may store indexes in HashMap and achieve O(1) TC for search operation utilising comparable amount of memory.

score 0 · Accepted Answer · answered May 11 '23 at 08:09

First of all, the results obtained with

final long nanoBefore = System.nanoTime();
//...
final long nanoAfter = System.nanoTime();

are not reliable at all. To get reliable results you need to use JMH.

Second, in indexOfSlow() you use LinkedList which is highly ineffective.

Third, in indexOf() you don't allocate any memory, iterating over the same list again and again. In indexOfSlow() you allocate a new LinkedList for each method invocation. This obviously costs us time and GC.

Fourth, in spite of the fact that in theory linear search has complexity O(n) in practice in your example you are searching over ArrayList which is array-based and extremely fast when you are accessing elements by index (even if you need to iterate over the whole list). In the indexOfSlow() on the contrary you rely on continuous poll()/add() operations. Imagine you are looking for 49733, then for the 1, 2, 3 etc. values you execute both poll() and add()operation. As a result before returning the value you are looking for your childrenToVisit list is modified 99734 (!!!) times. So this algorithm is extremely ineffectively implemented and this is why you have such a dramatic difference in numbers.

Thanks. Your LinkedList argument makes a lot of sense. (In hindisght, I should have come up with the myself...) I don't quite follow your and some other commentators condemnation of all microbenchmarks that don't use JMH, even though I do like and use JMH, and for a serious test it should be used. Nevertheless, nanoTime is not not reliable per se. In fact, JMH will probably be using nanoTime under the hood. — Evgeniy Berezovsky, May 12 '23 at 05:32
@EvgeniyBerezovsky The problem is not about JMH using something else than `nanoTime`. JMH helps to avoid common benchmarking pitfalls, and I see at least several of them in your hand-crafted benchmark, e.g., not mitigating dead code elimination by not consuming computation results, not doing warmup properly, and measuring OSR stubs instead of the fully optimized JIT compilation. — apangin, May 12 '23 at 11:03

Why is a linear search through a heap data structure faster than tree traversal?

2 Answers2