1

I am currently trying to improve the performance of a Java code. After digging a bit to see where optimization was required I ended up with the following setup (simplified for clarity).

The Board constructor which is called a large number of times (~200k to 2M):

public Board(Board board) {
    long now = System.currentTimeMillis();

    this.macroBoard = new int[9];
    int [] boardToCopy = board.getMacroBoard();
    for (int i = 0; i < 9; i++){
        this.macroBoard[i] = boardToCopy[i];
    }

    long duration = System.currentTimeMillis() - now;
    if (duration > THRESHOLD){
        System.err.println(duration);
    }
}

And in another class:

long end = System.currentTimeMillis() + SIMULATION_DURATION;
while (System.currentTimeMillis() < end) {
    ...
    ...
    Board board = new Board(otherBoard);
    ... 
    ...
}

The results puzzled me. In fact I observed two things:

  1. The larger SIMULATION_DURATION, the larger max(duration);
  2. The value of max(duration) can reach 2s (yes seconds, no typo) when SIMULATION_DURATION = 10s. If SIMULATION_DURATION = 100ms, I observe a max(duration) of around 30ms.

My questions are the following:

  1. How can the copy of a 9 integers array take so long?
  2. Why is duration less than 0.1ms 99% of the time and really high the remaining 1%?
  3. Why does it depend on the value of SIMULATION_DURATION?
  4. Am I making a mistake using System.currentTimeMillis() for this kind of benchmark and the results is therefore totally inaccurate?
  5. Is the GC involved in this strange behavior as I am creating a large number of Board objects?
Goujon
  • 37
  • 1
  • 7
  • 2
    Most likely related to free memory getting low. Check memory usage. – Henry Apr 05 '18 at 14:42
  • Have you considered the use of `System.arrayCopy`? This won't solve the problem, but may make your code clearer and less error prone. You can also avoid hardcoding the size of 9 by using `board.getMacroBoard().length`. – Thomas Timbul Apr 05 '18 at 15:03
  • @ThomasTimbul I actually tried, and observed the same. Thanks for the tip anyway! – Goujon Apr 05 '18 at 15:14

2 Answers2

4

It sounds like your VM is running out of memory and is trying to GC so that it can allocate memory for the new arrays. You can find information at this link to enable GC logging and get more details about our VM's GCing behaviour: https://dzone.com/articles/enabling-and-analysing-the-garbage-collection-log

Also, I recommend using System.nanoTime() to measure performance. For more details: System.currentTimeMillis vs System.nanoTime

To answer the questions directly:

How can the copy of a 9 integers array take so long?

It most certainly should not. Check the GC logs to confirm that GC is slowing down the VM.

Why is duration less than 0.1ms 99% of the time and really high the remaining 1%?

For 99% of the time, you have not run out of memory and thus there is no problem allocating space for the new Board objects.

Why does it depend on the value of SIMULATION_DURATION?

The value of SIMULATION_DURATION directly controls the number of Board objects.

Am I making a mistake using System.currentTimeMillis() for this kind of benchmark and the results is therefore totally inaccurate?

Check link to other stack overflow question above.

Is the GC involved in this strange behavior as I am creating a large number of Board objects?

Check answer above.

  • Thank you for the answer, I will investigate this. Would increasing the heap size by a sufficient amount solve this issue? – Goujon Apr 05 '18 at 14:58
  • 1
    Yes it most certainly would! Just be wary that if you plan on holding on to all those Board references, increase the heap size such that you do not run out of memory at part of the program. – pranavmalhotra Apr 05 '18 at 15:00
  • 1
    @Goujon it will delay, not solve the issue. But if you have enough memory, it will delay it long enough to be able to finish the run (i.e. solve). – Thomas Timbul Apr 05 '18 at 15:00
  • Well, solving the issue is a matter of perspective. If the application does indeed require that many Board objects at the same time, the only feasible way to achieve that would be to increase the heap size. From the way the problem has been presented, it does not seem like this is a strict requirement. – pranavmalhotra Apr 05 '18 at 15:15
  • You are both right, the way I asked my question is a bit ambiguous. To give some context, the application is working fine, I am just trying to make it more efficient (it's a game AI). In the end there is no real solution to the problem, as I am basically trying to maximize the number of Board objects for a given heap size. – Goujon Apr 05 '18 at 15:25
  • @Goujon Add `-verbose:gc` to your JVM arguments, and check if garbage collection or memory allocations correlate with performance deficiencies. – lexicore Apr 05 '18 at 15:44
1

The answer of pranavmalhotra is more valuable than mine, but there is an improvement which should certainly be done.

For

this.macroBoard = new int[9];
int[] boardToCopy = board.getMacroBoard();
for (int i = 0; i < 9; i++){
    this.macroBoard[i] = boardToCopy[i];
}

the first optimization would be

this.macroBoard = new int[9];
int[] boardToCopy = board.getMacroBoard();
System.arraycopy(boardToCopy, 0, macroBoard, 0, 9);

Or even:

int[] boardToCopy = board.getMacroBoard();
this.macroBoard = Arrays.copyOf(boardToCopy, 9);

Optimisation can take many forms. If the board ints have a range 0 .. 127, one could put every 7 bits int into a long, as 7 * 9 = 63 < 64 bits of a long. A long is a primitive type.

Joop Eggen
  • 107,315
  • 7
  • 83
  • 138
  • Thanks for the recommendation. I did not mention it in my post, but I did tried with System.arraycopy() and observed the same behavior. The long optimization is a good idea, however in my case the ints are indeed bitboards :) – Goujon Apr 05 '18 at 15:11