First of all, on my system (Java 9.0.4 x64) the array version as shown is twice as fast as the object version. So your benchmark is likely wrong.
But OK in order to compare apples to apples, we first refactor the array version in order to stride along the first dimension, just like in the object version:
for (int i = 0; i < size; ++i) {
sum += array[i][0];
sum += array[i][1];
sum += array[i][2];
sum += array[i][3];
}
In this case it indeed runs slower, due to frequent bounds checking of the tiny second dimension.
Remember that there are no true multidimensional arrays in Java; new int[size][4]
is really a shorthand for
int[][] array = new int[size][];
for (int i = 0; i < size; ++i) {
array[i] = new int[4];
}
You can visualize the first "column" dimension as containing pointers to the rows, one array object per row. Consequently, the size of each row is not really fixed and needs to be checked at run time.
Indeed we see that the array variant executes almost twice as many instructions:

That's because of all the bounds checking. Here's a fragment of the generated JIT code for test2:
0x4c8847b add eax, dword ptr [r12+r8*8+0x14]
0x4c88480 add eax, dword ptr [r12+r8*8+0x18]
0x4c88485 add eax, dword ptr [r12+r8*8+0x1c]
0x4c8848a shl r11, 0x3
0x4c8848e mov edx, 0x1
0x4c88493 nop
0x4c8849c nop
0x4c884a0 mov r8d, dword ptr [r11+rdx*4+0x10]
0x4c884a5 mov ecx, dword ptr [r12+r8*8+0xc] # bounds checking #
0x4c884aa lea r10, ptr [r12+r8*8]
0x4c884ae test ecx, ecx # bounds checking #
0x4c884b0 jbe 0x4c88572
0x4c884b6 add eax, dword ptr [r12+r8*8+0x10]
0x4c884bb cmp ecx, 0x1 # bounds checking #
0x4c884be jbe 0x4c88589 # bounds checking #
0x4c884c4 add eax, dword ptr [r12+r8*8+0x14]
0x4c884c9 cmp ecx, 0x3 # bounds checking #
0x4c884cc jbe 0x4c885a1
0x4c884d2 mov r9d, dword ptr [r11+rdx*4+0x14]
0x4c884d7 mov ecx, dword ptr [r12+r9*8+0xc] # bounds checking #
0x4c884dc add eax, dword ptr [r12+r8*8+0x18]
0x4c884e1 add eax, dword ptr [r12+r8*8+0x1c]
0x4c884e6 mov ebx, edx
0x4c884e8 inc ebx
0x4c884ea lea r10, ptr [r12+r9*8]
0x4c884ee test ecx, ecx # bounds checking #
0x4c884f0 jbe 0x4c88574 # bounds checking #
0x4c884f6 add eax, dword ptr [r12+r9*8+0x10]
0x4c884fb cmp ecx, 0x1 # bounds checking #
0x4c884fe jbe 0x4c8858b
0x4c88504 add eax, dword ptr [r12+r9*8+0x14]
0x4c88509 cmp ecx, 0x3 # bounds checking #
0x4c8850c jbe 0x4c885a7 # bounds checking #
0x4c88512 add eax, dword ptr [r12+r9*8+0x18]
0x4c88517 add eax, dword ptr [r12+r9*8+0x1c]
0x4c8851c add edx, 0x2
0x4c8851f cmp edx, 0x53ec5f
0x4c88525 jl 0x4c884a0
0x4c8852b cmp edx, 0x53ec60
0x4c88531 jnl 0x4c88566
The JVM is constantly being improved, so chances are this will eventually be optimized, at least for the case new int[size][4]
. For now though keep this in mind when using multidimensional arrays.