64-bit POSIX system.
POSIX is an operating-system interface standard; it has nothing to do with the CPU features used for chunked data copying.
Let's assume the processor cache contains my array
That would save the trip to main memory while the copy instructions execute, but it does not change the order of complexity in Big O notation.
the registers are 64+ bits long.
Even if your architecture supports AVX-512, with its 512-bit-wide zmm registers, and you run JDK 9+ (AFAIK the runtime is aware of AVX-512 starting with JDK 9), that lets you copy 8 packed 64-bit integers per instruction, but it does not affect the order of complexity.
So to copy, say, 1024 64-bit integers you would need to execute at least 128 vector instructions, which again yields O(n) complexity, only with a lower constant.
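The arithmetic above can be sketched in Java. This is only an illustration: the class CopyCost and the method chunks are hypothetical names, and the count is a lower bound on register-width chunks (in practice each chunk needs a load and a store). Whether System.arraycopy actually uses vector moves depends on the JVM and the CPU.

```java
public final class CopyCost {
    /**
     * Lower bound on the number of register-width chunks needed to move
     * elementCount 64-bit values through registers of registerBits bits.
     * Hypothetical helper, for illustration only.
     */
    static long chunks(long elementCount, int registerBits) {
        int lanes = registerBits / 64;              // 64-bit values per register
        return (elementCount + lanes - 1) / lanes;  // ceiling division
    }

    public static void main(String[] args) {
        long[] src = new long[1024];
        long[] dst = new long[src.length];
        for (int i = 0; i < src.length; i++) {
            src[i] = i;
        }
        // The copy itself; the JVM picks the stub that was generated at startup.
        System.arraycopy(src, 0, dst, 0, src.length);

        // The chunk count grows linearly with n for any fixed register width.
        System.out.println("64-bit registers:  " + chunks(src.length, 64));
        System.out.println("256-bit registers: " + chunks(src.length, 256));
        System.out.println("512-bit registers: " + chunks(src.length, 512));
    }
}
```

Doubling the array length doubles every one of these counts, which is why wider registers change only the constant factor.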
HotSpot implementation note:
The architecture-dependent code for the arraycopy implementation is generated during the JVM's global bootstrapping "phase", in StubRoutines::initialize2.
In particular, the code for the chunked copy routine is generated in the platform-dependent section of the HotSpot code by the copy_bytes_forward function (this is done with HotSpot's own macro assembler implementation).
The crucial parts of it are the CPU feature checks, such as:
if (UseAVX > 2) {
    // AVX-512: one 64-byte load and one 64-byte store
    __ evmovdqul(xmm0, Address(end_from, qword_count, Address::times_8, -56), Assembler::AVX_512bit);
    __ evmovdqul(Address(end_to, qword_count, Address::times_8, -56), xmm0, Assembler::AVX_512bit);
} else if (UseAVX == 2) {
    // AVX2: two 32-byte loads and stores cover the same 64 bytes
    __ vmovdqu(xmm0, Address(end_from, qword_count, Address::times_8, -56));
    __ vmovdqu(Address(end_to, qword_count, Address::times_8, -56), xmm0);
    __ vmovdqu(xmm1, Address(end_from, qword_count, Address::times_8, -24));
    __ vmovdqu(Address(end_to, qword_count, Address::times_8, -24), xmm1);
} else {
    //...
}
which emit code according to the available CPU features. The feature detector itself is generated and called earlier, in the architecture-dependent generator generate_get_cpu_info, and is based on the cpuid instruction.
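To see which level the running JVM settled on, the UseAVX flag checked in the excerpt above can be read back at runtime. The following is a minimal sketch, assuming a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class AvxLevel and the method useAvx are hypothetical names, and on JVMs or architectures without a UseAVX flag the lookup simply yields an empty result.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public final class AvxLevel {
    /**
     * Returns the value of the HotSpot UseAVX flag as a string, or empty
     * if the flag or the diagnostic bean is not available on this JVM.
     */
    static Optional<String> useAvx() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            // getVMOption throws IllegalArgumentException for unknown flags,
            // e.g. on non-x86 builds where UseAVX does not exist.
            return Optional.ofNullable(bean.getVMOption("UseAVX").getValue());
        } catch (RuntimeException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("UseAVX = " + useAvx().orElse("not available"));
    }
}
```

The same value should also be visible from the command line with java -XX:+PrintFlagsFinal -version, filtered for UseAVX.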