Unexpected Java performance

Question

I have just thrown everything I know about Java optimisation out the window. I have the following task:

Given a 2D array representing a playing field and a position on the field, fill another array with the number of steps a player can make to get to every other position in the field. The player can move up, down, left and right. For example, the first neighbours will be all 1's, with the diagonals being all 2's.

For the first attempt, I tried a simple 4-way floodfill algorithm. It wad dreadfully slow.

Secondly, I decided to get rid of the recursion and use a simple Queue. It worked lovely and gave a huge speed-up (very roughly 20x). Here is the code:

private void fillCounterArray(int[] counters, int position) {
    Queue<Integer> queue = new ArrayDeque<Integer>(900);

    // Obtain the possible destinations from position, check the valid ones
    // and add it the stack.
    int[] destination = board.getPossibleDestinations(position);
    for (int i = 0; i < destination.length; i++) {
        if (board.getBoard()[destination[i]] == Board.CLEAR) {
            counters[destination[i]] = 1;
            queue.add(destination[i]);
        }
    }

    // Now fill up the space.
    while (!queue.isEmpty()) {
        int pos = queue.remove();
        int steps = counters[pos];            
        destination = board.getPossibleDestinations(pos);
        for (int i = 0; i < destination.length; i++) {
            int dest = destination[i];
            if (board.getBoard()[dest] == Board.CLEAR && (counters[dest] > steps + 1 || counters[dest] == 0)) {
                counters[dest] = steps + 1;
                queue.add(dest);
            }
        }
    }
}

Now, "common-sense" told me that the queue operations with a static array and an int-pointer would be faster. So I removed the Queue and use a standard int[] array. The code is identical, except for the queue-like operations. It now looks like this (As you can see, I use to live by the C-side :)):

private void fillCounterArray(int[] counters, int position) {

    // Array and its pointer.
    int[] queue = new int[900]; // max size of field
    int head = 0;

    // Obtain the possible destinations from position, check the valid ones
    // and add it the stack.
    int[] destination = board.getPossibleDestinations(position);
    for (int i = 0; i < destination.length; i++) {
        if (board.getBoard()[destination[i]] == Board.CLEAR) {
            counters[destination[i]] = 1;
            queue[head++] = dest[i];
        }
    }

    // Now fill up the space.
    while (head > 0) {
        int pos = queue[--head];
        int steps = counters[pos];

        destination = board.getPossibleDestinations(pos);
        for (int i = 0; i < destination.length; i++) {
            int dest = destination[i];
            if (board.getBoard()[dest] == Board.CLEAR && (counters[dest] > steps + 1 || counters[dest] == 0)) {
                counters[dest] = steps + 1;
                queue[head++] = dest;
            }
        }
    }
}

When I ran this "optimised code" it was significantly slower than using the Queue and only about twice as fast as the recursive technique. There is also hardly any difference when I declare the array as an instance variable. How is this possible?

Can you add `getPossibleDestinations` (and `getBoard()`) method implementation — , Sep 05 '12 at 06:08
There's no need... it's a simple array lookup and constant in both implementations. All it does is give the four neighbours for a given position. — Jaco Van Niekerk, Sep 05 '12 at 06:09
Take a look at http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java to eliminate benchmarking caveats. — Vadzim, Sep 05 '12 at 06:10
Vadzim, I am not trying to benchmark the above. When I run each version, the difference in performance is blatantly obvious. The one is about 10 times faster than the other and it is completely counter-intuitive. If I apply correct benchmarking techniques, all I'll end up with is more accurate estimate of the performance difference. — Jaco Van Niekerk, Sep 05 '12 at 06:14
Be aware that `Queue.offer()` can return false if the element is not added and that could explain the difference (see http://docs.oracle.com/javase/6/docs/api/java/util/ArrayDeque.html#offer(E)) — , Sep 05 '12 at 06:16
@RC - I've changed offer to add and get basically the same performance. It makes sense though, as the test cases for the algorithm pass in both instances. — Jaco Van Niekerk, Sep 05 '12 at 06:19
You sure you're getting through a warm-up phase? Most JVMs start out interpreting code. Could be that using the queue somehow triggers a compile to native sooner. — Tony K., Sep 05 '12 at 06:22
@Tony... thought occurred to me to. I continue to run the program for several minutes (it gets called periodically), yet... it still gives poor performance. It does not make sense, a simple array should be faster, should it not?? — Jaco Van Niekerk, Sep 05 '12 at 06:26
@Jaco Van Niekerk, the point about benchmarking caveats is that you may be measuring totally different things. Like interpreted code with JIT-compiled, that would easily give 20x difference. — Vadzim, Sep 05 '12 at 06:29
@Vadzim, does this imply that the ArrayDeque I am using may be natively/JIT copiled, while the array implementation is not? How can one ascertain that? — Jaco Van Niekerk, Sep 05 '12 at 06:32
@Jaco Van Niekerk, by running the same test in a loop multiple (thousands) times. See the link above. — Vadzim, Sep 05 '12 at 06:35
@Vlad, I am doing just that... that method is called 1000's of times as part of a alpha-beta search. Only the queue implementation changes... benchmarking is not the issue, really. If I ONLY change the implementation as above, I get a huge and very noticeable performance difference. I just cannot understand how a direct int-array implementation can be so much slower than a ArrayDeque. — Jaco Van Niekerk, Sep 05 '12 at 06:40

score 3 · Accepted Answer · answered Sep 05 '12 at 09:30

3

Your reversed the order while optimising I think;

The queue is fifo, first in first out
The array is lifo, last in first out, as you walk it downwards

That will usually give you different performance ;-)

answered Sep 05 '12 at 09:30

IvoTops

3,463
17
18

Absolutely spot-on. How could I have missed that... After changing to a head and tail pointer, the code outperforms the Dequeue as expected. Thank you! – Jaco Van Niekerk Sep 05 '12 at 11:26

score 1 · Answer 2 · edited Sep 05 '12 at 06:19

1

Insert two Counters in each Loop one in the for loop and one in the while loop in both versions, compare the numbers you get at the end, how many rounds do you make in each version, if you have another Loop in getPossibleDestination then log the pos variable too. I guess that would be a good starting point to figure it out.

Another way would be to print the time difference on different lines in your program, let say before the 1st loop, between both and at the end, once you compare results and know where it takes a long time in the second Version you can print timestamps in the loop on different lines.

edited Sep 05 '12 at 06:19

Joachim Sauer

302,674
57
556
614

answered Sep 05 '12 at 06:08

CloudyMarble

36,908
70
97
130

3

I'm not sure what that would accomplish. The counters will be the same. It is only the queue operations that have changed from a standard Queue to an int-array. The "logic" is the same. – Jaco Van Niekerk Sep 05 '12 at 06:11
I didnt read the whole code, i would log it anyway, it doesnt really coat that much , but its up to you, try the timelog if your sure about the number of rounds – CloudyMarble Sep 05 '12 at 06:20
Seeing the code i notices queue.offer(destination[i]); this returns a boolean value according to success or failure, consider this is failing to insert this will effect the rounds in your second loop (Theoretically) – CloudyMarble Sep 05 '12 at 06:25
I changed offer() to add(), test cases still pass, same results, same performance. – Jaco Van Niekerk Sep 05 '12 at 06:33
with the typing effort for all the comments you could have easly benchmarked it and found the problems. – CloudyMarble Sep 05 '12 at 06:35
I disagree with you... I already know the code is equivalent (no counters needed) and I already know EXACTLY where, namely where the ArrayDeque is substituted with an int-array (no timing needed). Thanks anwyay... – Jaco Van Niekerk Sep 05 '12 at 06:42
3

The code is not entirely equivalent. As far as I can see you use Deque in a FIFO fashion, while your int[] is used as a stack (LIFO). So this might make a difference in how many rounds are needed. – user1252434 Sep 05 '12 at 09:21
@OD, apologies, your counter method would have pointed out the above. Sorry, should have listed to you. – Jaco Van Niekerk Sep 05 '12 at 11:28

Unexpected Java performance

2 Answers2