How many ALU operations can run in parallel on modern x86(_64) CPUs when no instruction depends on any other? It obviously depends on the specific model, but then, where can I find such information? This Wikipedia page on Sandy Bridge says it has 3 integer ALUs per core. Does this mean it can run 3 integer arithmetic instructions at the same time?
My question is about code like this, for example:
add r10, r11
mov [...], r10
add r12, r13
mov [...], r12
add r14, r15
mov [...], r14
add rax, r11
mov [...], rax
add rax, r13
mov [...], rax
add rax, r15
mov [...], rax
I may be wrong, but my understanding is that the first 3 adds can run in parallel, while the following adds have to wait until the previous operations have finished.
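To make the dependency reasoning concrete, here is a toy scheduling model in Python. It is a sketch and not a model of any real core. It makes these hypothetical assumptions:

- every add has a 1-cycle latency;
- up to num_alus adds can start per cycle (3, following the Sandy Bridge figure);
- register renaming removes write-after-read and write-after-write hazards, so only read-after-write dependencies count;
- the stores are handled elsewhere and can be ignored.

The names Add, schedule and code are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Add:
    """Hypothetical model of `add dst, src` (reads dst and src, writes dst)."""
    dst: str
    src: str


def schedule(instrs, num_alus=3, latency=1):
    """Return the cycle in which each add starts, oldest-ready-first.

    Toy model: only read-after-write dependencies matter (renaming is
    assumed), every add takes `latency` cycles, and at most `num_alus`
    adds can start in the same cycle.
    """
    # For each instruction, record the earlier instructions whose results it reads.
    deps = []
    last_writer = {}
    for i, ins in enumerate(instrs):
        deps.append([last_writer[r] for r in (ins.dst, ins.src) if r in last_writer])
        last_writer[ins.dst] = i

    issue = [None] * len(instrs)
    cycle = 0
    while None in issue:
        free = num_alus
        for i in range(len(instrs)):
            if free == 0:
                break
            if issue[i] is not None:
                continue
            # Ready only if every producer started at least `latency` cycles ago.
            if all(issue[d] is not None and issue[d] + latency <= cycle for d in deps[i]):
                issue[i] = cycle
                free -= 1
        cycle += 1
    return issue


# The adds from the question, with the stores omitted.
code = [
    Add("r10", "r11"), Add("r12", "r13"), Add("r14", "r15"),
    Add("rax", "r11"), Add("rax", "r13"), Add("rax", "r15"),
]
print(schedule(code))
```

Under these assumptions, schedule(code) returns [0, 0, 0, 1, 2, 3]. The first three adds start together. add rax, r11 reads rax and r11 rather than the results of the first three adds, so in this model it waits only for a free ALU. The last two adds wait on the chain through rax. Passing num_alus=4 gives [0, 0, 0, 0, 1, 2].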