If you need top speed, definitely use a single array (one-dimensional) and map your indices as appropriate. As I see in the thread linked to in a comment below your question, people seem to disregard the ill effects of 2d-arrays on CPU cache lines and emphasize only the number of memory lookups.
There is one consideration to take: if your inner arrays are large enough (1K or more, say), then the speed advantage starts fading away. If the inner arrays is smallish (like 10-50), then the difference should be noticeable.
EDIT
As rightfully demanded, here's my jmh
benchmark:
@OutputTimeUnit(TimeUnit.SECONDS)
public class ArrayAccess
{
static final int gapRowsize = 128, rowsize = 32, colsize = 10_000;
static final int[][] twod = new int[colsize][],
gap1 = new int[colsize][];
static final int[] oned = new int[colsize*rowsize];
static final Random r = new Random();
static {
for (int i = 0; i < colsize; i++) {
twod[i] = new int[rowsize];
gap1[i] = new int[gapRowsize];
}
for (int i = 0; i < rowsize*colsize; i++) oned[i] = r.nextInt();
for (int i = 0; i < colsize; i++)
for (int j = 0; j < rowsize; j++)
twod[i][j] = r.nextInt();
}
@GenerateMicroBenchmark
public int oned() {
int sum = 0;
for (int i = 0; i < rowsize*colsize; i++)
sum += oned[i];
return sum;
}
@GenerateMicroBenchmark
public int onedIndexed() {
int sum = 0;
for (int i = 0; i < colsize; i++)
for (int j = 0; j < rowsize; j++)
sum += oned[ind(i,j)];
return sum;
}
static int ind(int row, int col) { return rowsize*row+col; }
@GenerateMicroBenchmark
public int twod() {
int sum = 0;
for (int i = 0; i < colsize; i++)
for (int j = 0; j < rowsize; j++)
sum += twod[i][j];
return sum;
}
}
Note the gap array allocation: this simulates the worst-case scenario with fragmented heap.
I see more than 5-fold advantage at rowsize
= 32 and a still quite noticeable (25%) advantage at 1024. I also find the advantage to highly depend on the gap size, with the shown 128 being the worst case for rowsize
= 32 (both higher and lower values diminish the advantage), and 512 the worst case for rowsize
= 1024.
rowsize = 32, gapRowsize = 128
Benchmark Mean Units
oned 8857.400 ops/sec
twod 1697.694 ops/sec
rowsize = 1024, gapRowsize = 512
Benchmark Mean Units
oned 147.192 ops/sec
twod 118.275 ops/sec