As per my current understanding of the ARM Cortex-A57 and Cortex-A78 TRMs, micro-ops can be issued out of order to one of several execution pipelines.
As far as I understand, this is instruction reordering of independent instructions.
Memory access reordering means that observers and slaves in a system may observe memory accesses in a different sequence from program order. This could mean one of the following -
1 - The CPU reordered the memory access micro-ops when issuing them to the load and store pipelines. The interconnect (ACE/CHI) did not do any reordering.
2 - The CPU issued the micro-ops in program order, but the interconnect (ACE/CHI) reordered them.
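To make the two cases concrete, here is a minimal message-passing sketch in C++ that I put together myself (the function name `count_stale_reads` and the value 42 are just illustrative, not from any ARM document). It only covers CPU threads, not DMA or GPU observers. With relaxed atomics the consumer is allowed to see the flag set while the payload is still stale, and from software I don't think I can tell whether case 1 or case 2 caused it:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus test: counts how often the consumer observes the flag
// as set while the payload still holds its old value. Relaxed ordering
// permits this on a weakly ordered machine, but it may never show up on a
// particular core or run.
int count_stale_reads(int iterations) {
    assert(iterations > 0);
    int stale = 0;  // written only by the consumer, read after join()
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> data{0};
        std::atomic<int> flag{0};

        std::thread producer([&] {
            data.store(42, std::memory_order_relaxed);  // payload store
            flag.store(1, std::memory_order_relaxed);   // flag store, no ordering vs payload
        });
        std::thread consumer([&] {
            while (flag.load(std::memory_order_relaxed) == 0) {
                // spin until the flag store becomes visible
            }
            if (data.load(std::memory_order_relaxed) != 42) {
                ++stale;  // flag seen as set, payload seen as old
            }
        });

        producer.join();
        consumer.join();
    }
    return stale;
}
```

Calling `count_stale_reads(100000)` on a multi-core ARM board may or may not return a non-zero count; a zero result does not prove that reordering cannot happen.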
Is my understanding correct? If yes, does a barrier instruction halt the CPU pipeline by stopping further instruction issue, or does the interconnect throttle the CPU master interface until the barrier response is received?
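For the barrier part, this is the software-level picture I am assuming, as a rough sketch rather than a definitive example. The helper name `count_stale_reads_with_fences` is made up, and the mapping of the fences to `DMB` instructions is my assumption about what GCC/Clang typically emit on AArch64, not something I have confirmed for every toolchain. It only involves CPU threads, so it says nothing about other ACE masters:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: message passing with explicit fences. Returns how many
// times the consumer saw the flag set but a stale payload. The C++ memory
// model guarantees this is 0, because the release fence and the acquire
// fence synchronize through the flag.
int count_stale_reads_with_fences(int iterations) {
    assert(iterations > 0);
    int stale = 0;  // written only by the consumer, read after join()
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> data{0};
        std::atomic<int> flag{0};

        std::thread producer([&] {
            data.store(42, std::memory_order_relaxed);
            // Assumption: compiled to a DMB (e.g. DMB ISH) on AArch64.
            std::atomic_thread_fence(std::memory_order_release);
            flag.store(1, std::memory_order_relaxed);
        });
        std::thread consumer([&] {
            while (flag.load(std::memory_order_relaxed) == 0) {
                // spin until the flag store becomes visible
            }
            // Assumption: compiled to a load barrier (e.g. DMB ISHLD) on AArch64.
            std::atomic_thread_fence(std::memory_order_acquire);
            if (data.load(std::memory_order_relaxed) != 42) {
                ++stale;  // must not happen when both fences are present
            }
        });

        producer.join();
        consumer.join();
    }
    return stale;
}
```

What I would like to understand is what the core and the interconnect actually do while the barrier between the two stores is outstanding.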
I asked on the ARM blog but have had no response so far.
EDIT 1
As suggested by Peter, I want to state the following preconditions for my question -
1 - Multi-cluster ARM SoC along with other ACE masters such as DMA engines, iGPU, etc.
2 - The question is for inner-shareable as well as outer-shareable memory (e.g. memory accessed by threads running in different CPU clusters)
3 - The question is for Cacheable (this has been clarified by Peter to a great extent) and Non-Cacheable normal memory, as I want to understand how the order in which other observers see memory accesses relates to ordering inside the CPU pipeline of an out-of-order core such as the ARM Cortex-A78