I'm running a very "simple" JMH test with the following setup:
@Fork(value = 1, jvmArgs = {
        "--illegal-access=permit", "-Xms10G",
        "-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints", "-XX:ActiveProcessorCount=7",
        "-XX:+UseNUMA",
        "-XX:+UnlockDiagnosticVMOptions", "-XX:DisableIntrinsic=_currentTimeMillis,_nanoTime",
        "-Xmx10G", "-XX:+UnlockExperimentalVMOptions", "-XX:ConcGCThreads=5", "-XX:ParallelGCThreads=10",
        "-XX:+UseZGC", "-XX:+UsePerfData", "-XX:MaxMetaspaceSize=10G", "-XX:MetaspaceSize=256M"
})
@Benchmark
public String generateRandom() {
    return UUID.randomUUID().toString();
}
Maybe it's not that simple, because it uses randomness, but I see the same issue with any other Java test.
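For anyone who wants to cross-check the numbers outside JMH, here is a rough plain-Java sketch of the same workload. It is not my actual benchmark: the class name `UuidThroughput`, the `measure` helper, and the run durations are made up for illustration, and it has none of JMH's warm-up or fork handling, so treat its output as a sanity check only.

```java
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of the original benchmark.
public class UuidThroughput {

    // Written once per worker so the JIT cannot discard the generated strings.
    static volatile long sink;

    /** Generates UUID strings on `threads` threads for about `millis` ms; returns total operations. */
    static long measure(int threads, long millis) throws InterruptedException {
        LongAdder ops = new LongAdder();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                long local = 0;
                long lengths = 0;
                while (System.nanoTime() < deadline) {
                    lengths += UUID.randomUUID().toString().length();
                    local++;
                }
                sink = lengths;
                ops.add(local);
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 10_000, TimeUnit.MILLISECONDS);
        return ops.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        measure(2, 500); // crude warm-up, far weaker than JMH's
        for (int threads : new int[] {1, 7}) {
            long ops = measure(threads, 2_000);
            System.out.printf("%d thread(s): %.0f ops/s%n", threads, ops / 2.0);
        }
    }
}
```

The thread counts 1 and 7 match the runs reported in this question. The timing call inside the loop adds its own overhead, so only the ratio between the two lines is meaningful.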
On my home desktop:
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz 12 Threads (hyperthreading enabled ), 64 GB Ram, "Ubuntu" VERSION="20.04.2 LTS (Focal Fossa)"
Linux homepc 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Performance with 7 threads:
Benchmark Mode Cnt Score Error Units
RulesBenchmark.generateRandom thrpt 5 1312295.357 ± 27853.707 ops/s
Flame graph (async-profiler), 7 threads, home desktop
I have an issue on Oracle Linux (NAME="Oracle Linux Server" VERSION="8.4"):
Linux 5.4.17-2102.201.3.el8uek.x86_64 #2 SMP Fri Apr 23 09:05:57 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz, 56 threads (hyper-threading disabled; the result is the same with it enabled, which gives 112 CPU threads), 1 TB RAM
On this machine I get about half the performance, even when I increase the number of threads.
With 1 thread, performance is very good:
Benchmark Mode Cnt Score Error Units
RulesBenchmark.generateRandom thrpt 5 2377471.113 ± 8049.532 ops/s
Flame graph (async-profiler), 1 thread
But with 7 threads:
Benchmark Mode Cnt Score Error Units
RulesBenchmark.generateRandom thrpt 5 688612.296 ± 70895.058 ops/s
Flame graph (async-profiler), 7 threads
Maybe it's a NUMA issue, because there are 2 sockets but the system is configured with only 1 NUMA node. Output of numactl --hardware:
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
node 0 size: 1030835 MB
node 0 free: 1011029 MB
node distances:
node 0
0: 10
But after disabling some CPU threads using:
for i in {12..55}
do
    # take CPU $i offline
    echo '0' | sudo tee /sys/devices/system/cpu/cpu$i/online
done
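To confirm which CPUs are still online after a loop like this, and what the JVM sees, a small check along these lines could help. It is only a sketch: the class name `OnlineCpus` and the `parseCpuList` helper are made up for illustration, and it assumes a Linux system where `/sys/devices/system/cpu/online` holds a range list such as `0-11`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original benchmark.
public class OnlineCpus {

    /** Expands a kernel CPU list such as "0-3,8,10-11" into individual CPU ids. */
    static List<Integer> parseCpuList(String text) {
        List<Integer> cpus = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return cpus;
        }
        for (String part : trimmed.split(",")) {
            // A part is either a single id ("8") or an inclusive range ("10-11").
            String[] bounds = part.trim().split("-");
            int first = Integer.parseInt(bounds[0]);
            int last = Integer.parseInt(bounds[bounds.length - 1]);
            for (int cpu = first; cpu <= last; cpu++) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    public static void main(String[] args) throws IOException {
        // Linux-only path; requires Java 11+ for Path.of and Files.readString.
        Path online = Path.of("/sys/devices/system/cpu/online");
        List<Integer> cpus = parseCpuList(Files.readString(online));
        System.out.println("Online CPUs (" + cpus.size() + "): " + cpus);
        System.out.println("JVM availableProcessors: " + Runtime.getRuntime().availableProcessors());
    }
}
```

When run with the same `-XX:ActiveProcessorCount=7` flag as the benchmark, the second line should report the overridden count rather than the number of online CPUs, so the two values are expected to differ.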
Performance improved a little, but not much.
This is just a very "simple" test. On a complex test with real code, it's even worse:
it spends a lot of time in .annobin___pthread_cond_signal.start
I also deployed a Vagrant image with the same Oracle Linux version and kernel version on my home desktop and ran it with 10 CPU threads. Performance was nearly the same (~1M ops/sec) as on the desktop itself. So it's not about the OS or kernel, but about some configuration.
I tested with several JDK versions and vendors (JDK 11 and above). There is a very small performance difference when using OpenJDK 11 from the YUM distribution, but it is not significant.
Can you suggest anything? Thanks in advance.