This is my first benchmark ever, might be incorrect!
The research about difference in other languages (especially C/C++) made me curious so I've decided to try writing a benchmark (and learn how to do them at the same time).
Result summary:
Benchmark (n) Mode Cnt Score Error Units
MainBenchmark.columnFirst 10000 avgt 5 1921,752 ± 341,941 ms/op
MainBenchmark.rowFirst 10000 avgt 5 381,053 ± 44,640 ms/op
It seems that row-major order (I think that's the correct name) is about 5 times faster than column-major.
In day to day programming this won't really matter, you will almost never need to optimize something like this. It was done just for science.
Here is the benchmark I've wrote, it creates int[10000][10000] array and tries to iterate over all the elements:
@State(Scope.Benchmark)
@Fork(value = 1, warmups = 2)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Mode.AverageTime)
public class MainBenchmark {
@Param({"10000"})
private int n;
private int[][] testData;
public static void main(String[] args) throws Exception {
Options opt = new OptionsBuilder()
.include(MainBenchmark.class.getSimpleName())
.forks(1)
.build();
new Runner(opt).run();
}
@Setup
public void setup() {
testData = createData();
}
@Benchmark
public void rowFirst(Blackhole bh) {
for (int i = 0; i < testData.length; i++) {
for (int j = 0; j < testData[0].length; j++) {
int tmp = testData[i][j];
bh.consume(tmp);
}
}
}
@Benchmark
public void columnFirst(Blackhole bh) {
for (int j = 0; j < testData[0].length; j++) {
for (int i = 0; i < testData.length; i++) {
int tmp = testData[i][j];
bh.consume(tmp);
}
}
}
private int[][] createData() {
int[][] ints = new int[n][n];
for (int[] anInt : ints) {
Arrays.fill(anInt, 0);
}
return ints;
}
}
Here are the full results:
# JMH version: 1.23
# VM version: JDK 13.0.1, OpenJDK 64-Bit Server VM, 13.0.1+9
# VM invoker: C:\Program Files\Java\jdk-13.0.1\bin\java.exe
# VM options: -Dvisualvm.id=957546995472200 -javaagent:(...) -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: benchmarking.MainBenchmark.columnFirst
# Parameters: (n = 10000)
# Run progress: 0,00% complete, ETA 00:10:00
# Warmup Fork: 1 of 2
# Warmup Iteration 1: 1810,536 ms/op
# Warmup Iteration 2: 1883,026 ms/op
# Warmup Iteration 3: 1798,335 ms/op
# Warmup Iteration 4: 1806,877 ms/op
# Warmup Iteration 5: 1797,246 ms/op
Iteration 1: 1794,506 ms/op
Iteration 2: 1822,085 ms/op
Iteration 3: 1845,853 ms/op
Iteration 4: 2000,127 ms/op
Iteration 5: 2045,922 ms/op
# Run progress: 16,67% complete, ETA 00:09:15
# Warmup Fork: 2 of 2
# Warmup Iteration 1: 1780,858 ms/op
# Warmup Iteration 2: 1771,650 ms/op
# Warmup Iteration 3: 1786,517 ms/op
# Warmup Iteration 4: 2198,348 ms/op
# Warmup Iteration 5: 1742,218 ms/op
Iteration 1: 2124,944 ms/op
Iteration 2: 2187,857 ms/op
Iteration 3: 1905,843 ms/op
Iteration 4: 1925,476 ms/op
Iteration 5: 1785,446 ms/op
# Run progress: 33,33% complete, ETA 00:07:22
# Fork: 1 of 1
# Warmup Iteration 1: 2082,695 ms/op
# Warmup Iteration 2: 1783,062 ms/op
# Warmup Iteration 3: 1799,518 ms/op
# Warmup Iteration 4: 1800,832 ms/op
# Warmup Iteration 5: 1974,720 ms/op
Iteration 1: 1934,673 ms/op
Iteration 2: 2013,677 ms/op
Iteration 3: 1784,654 ms/op
Iteration 4: 1895,396 ms/op
Iteration 5: 1980,359 ms/op
Result "benchmarking.MainBenchmark.columnFirst":
1921,752 ±(99.9%) 341,941 ms/op [Average]
(min, avg, max) = (1784,654, 1921,752, 2013,677), stdev = 88,801
CI (99.9%): [1579,811, 2263,693] (assumes normal distribution)
# JMH version: 1.23
# VM version: JDK 13.0.1, OpenJDK 64-Bit Server VM, 13.0.1+9
# VM invoker: C:\Program Files\Java\jdk-13.0.1\bin\java.exe
# VM options: -Dvisualvm.id=957546995472200 -javaagent:C:\Program Files\JetBrains\IntelliJ IDEA 2019.2.4\lib\idea_rt.jar=51059:C:\Program Files\JetBrains\IntelliJ IDEA 2019.2.4\bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: benchmarking.MainBenchmark.rowFirst
# Parameters: (n = 10000)
# Run progress: 50,00% complete, ETA 00:05:32
# Warmup Fork: 1 of 2
# Warmup Iteration 1: 381,809 ms/op
# Warmup Iteration 2: 394,792 ms/op
# Warmup Iteration 3: 384,524 ms/op
# Warmup Iteration 4: 389,858 ms/op
# Warmup Iteration 5: 378,686 ms/op
Iteration 1: 373,117 ms/op
Iteration 2: 371,832 ms/op
Iteration 3: 373,667 ms/op
Iteration 4: 384,930 ms/op
Iteration 5: 377,080 ms/op
# Run progress: 66,67% complete, ETA 00:03:37
# Warmup Fork: 2 of 2
# Warmup Iteration 1: 381,334 ms/op
# Warmup Iteration 2: 383,445 ms/op
# Warmup Iteration 3: 387,772 ms/op
# Warmup Iteration 4: 410,992 ms/op
# Warmup Iteration 5: 374,811 ms/op
Iteration 1: 383,491 ms/op
Iteration 2: 389,619 ms/op
Iteration 3: 388,545 ms/op
Iteration 4: 369,743 ms/op
Iteration 5: 372,389 ms/op
# Run progress: 83,33% complete, ETA 00:01:47
# Fork: 1 of 1
# Warmup Iteration 1: 368,873 ms/op
# Warmup Iteration 2: 383,034 ms/op
# Warmup Iteration 3: 394,592 ms/op
# Warmup Iteration 4: 373,512 ms/op
# Warmup Iteration 5: 373,946 ms/op
Iteration 1: 394,551 ms/op
Iteration 2: 392,298 ms/op
Iteration 3: 376,282 ms/op
Iteration 4: 372,903 ms/op
Iteration 5: 369,230 ms/op
Result "benchmarking.MainBenchmark.rowFirst":
381,053 ±(99.9%) 44,640 ms/op [Average]
(min, avg, max) = (369,230, 381,053, 394,551), stdev = 11,593
CI (99.9%): [336,412, 425,693] (assumes normal distribution)
# Run complete. Total time: 00:10:42
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
Benchmark (n) Mode Cnt Score Error Units
MainBenchmark.columnFirst 10000 avgt 5 1921,752 ± 341,941 ms/op
MainBenchmark.rowFirst 10000 avgt 5 381,053 ± 44,640 ms/op
Process finished with exit code 0