
How do the costs of atomic operations compare on common architectures (x86, ARM, and PowerPC, stating the specific version of each)?

Bonus points if you include a sourced estimate of cycle counts and explain the costs in terms of C11 memory orderings, or list the instructions used on each architecture.

Extra bonus points if you can include uncommon or proposed architectures such as RISC-V or the Mill architecture.
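To make the comparison concrete, here is a minimal C11 sketch of the kinds of operations in question: an atomic read-modify-write at two different memory orderings, plus a release store and an acquire load. The names `counter`, `add_relaxed`, `add_seq_cst`, `store_release`, and `load_acquire` are hypothetical and exist only for illustration. The instruction mappings in the comments describe typical compiler output and are assumptions, not guarantees.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared counter, used only for illustration. */
static _Atomic uint64_t counter = 0;

/* Atomic RMW with the weakest ordering. Typically a `lock`-prefixed
 * instruction on x86 (already a full barrier there), and an LL/SC loop
 * or a single LSE atomic without barriers on AArch64 (assumption about
 * typical code generation). Returns the value before the addition. */
uint64_t add_relaxed(uint64_t n) {
    return atomic_fetch_add_explicit(&counter, n, memory_order_relaxed);
}

/* Same RMW with the strongest ordering. On x86 this usually compiles to
 * the same instruction as the relaxed version; on weakly ordered ISAs
 * (ARM, PowerPC, RISC-V) it usually needs extra ordering. */
uint64_t add_seq_cst(uint64_t n) {
    return atomic_fetch_add_explicit(&counter, n, memory_order_seq_cst);
}

/* Release store: typically a plain store on x86. */
void store_release(uint64_t v) {
    atomic_store_explicit(&counter, v, memory_order_release);
}

/* Acquire load: typically a plain load on x86. */
uint64_t load_acquire(void) {
    return atomic_load_explicit(&counter, memory_order_acquire);
}
```

Compiling these functions with optimization for each target and inspecting the generated assembly shows which instructions each ordering maps to on that architecture.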

mcarton
Avi
  • I know I am asking a lot. Even if you can only answer for a subset of architectures, please do. – Avi Jul 27 '17 at 03:20
  • Looks like homework to me. Sorry, we can't do your homework for you. – Isuru H Jul 27 '17 at 07:43
  • x86 should really have "(version)" as well. There are huge differences: for example, locked instructions (which are not the only atomic instructions) are about twice as slow on Bulldozer as they are on Ryzen. – harold Jul 27 '17 at 13:03
  • @IsuruH I really don't think this looks like homework. The question is relatively specific (at least, I have not encountered questions like this about specific architectures in school or university so far). The fact that he asks about uncommon architectures is another indicator that this question is not about solving homework. If you downvoted the question because of this, please consider removing that downvote. – Manuel Jacob Aug 16 '17 at 17:24
  • Within each architecture, it depends on contention vs. no-contention. The no-contention case doesn't even have to go off-core for naturally-aligned shared variables. But for the contention case, it's potentially a lot slower to get a cache line from another socket on a dual-socket many-core Xeon than on a single-socket quad-core desktop. Or potentially even lower overhead with threads running on different logical cores of the same physical core (SMT / hyperthreading). But see https://stackoverflow.com/q/45602699/224132 for x86 atomic stores -> atomic loads, rather than RMW like `lock inc`. – Peter Cordes Aug 18 '17 at 23:15

0 Answers