
Suppose I'm interested in writing, or even just reading and understanding, some assembly code and its execution performance, from the perspective of a particular mainstream x86_64 processor architecture, e.g. Intel Nehalem, AMD K10, Intel Haswell, etc. Today's processors appear to be really complex, with flag stalls, out-of-order execution, dependency chain issues, different execution ports able to handle different subsets of opcodes in parallel, etc., and no two architectures quite run the code the same way.

What simulators/tools can I use to simulate executing some assembly code and see, for some target architecture, which instructions execute at which clock ticks, on which execution ports, and with what latency, ideally with explanations for why certain things were delayed or reordered? Extra nice but not required would be being able to see the effects of branch mispredictions, L1/L2/L3 cache state over time, and instruction dependency chains. If there's a way to make the CPU itself run slowly in some sort of profiling mode and report on this sort of thing in real time, that would also work. I'm especially interested in Intel and AMD platforms, though if there's nothing for those I'd also be interested in other architectures.

jdowdell
  • Although this helps only minimally, the answer at http://stackoverflow.com/a/11227902/1504882 explains some aspects of processor execution. – Elias Dec 16 '13 at 12:52
  • Agner Fog's [optimization manuals](http://www.agner.org/optimize/) are a good source of information and benchmarking code. – Brett Hale Dec 16 '13 at 15:42
  • Fairly sure this is available, just not to mere mortals like us. Micro-architecture implementation details are a heavily guarded trade secret. – Hans Passant Dec 16 '13 at 16:18

1 Answer


What you're looking for is a cycle-accurate micro-architectural simulator. There are quite a few, but most of them offer only a generic implementation of modern uarch concepts (out-of-order execution, cache systems, memory units, branch predictors, prefetchers, etc.). There are many other architectural simulators, but some of them don't implement the uarch at all or are not cycle-accurate (e.g. functional simulators, system emulators, etc.).

The reason you won't see simulators that faithfully model a specific commercial CPU is that, even after publishing most of the uarch features and characteristics in various docs and optimization guides, both Intel and AMD keep the bulk of the micro-architectural implementation a trade secret, for obvious reasons.

One small exception could be MARSS, based on PTLsim, which I think relates to AMD and was shown here to be reasonably in sync with the actual CPU. However, to the best of my knowledge, AMD hasn't acknowledged that it's accurate. AMD has also released a newer simulator called "SimNow" that I'm less familiar with.

Adding Intel's code analyzer (IACA), recommended by Bahbar: it may be useful, although it's not really a simulator that runs the code (let alone one that provides runtime tracing and statistics collection). It's a static analysis tool that tries to estimate the dependencies and runtime of a given code snippet.
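To give a feel for what this kind of static estimate involves, here is a minimal Python sketch. It is not how Intel's tool works internally (that isn't public); it's a toy model that computes the longest register dependency chain and a greedy per-port load for a short snippet. The `Instr` type, the helper names, and the latency and port numbers are all assumptions made up for illustration; real values would have to come from something like Agner Fog's instruction tables for the specific uarch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str      # assembly text, for display only
    dest: str      # register written
    srcs: tuple    # registers read
    latency: int   # ASSUMED latency in cycles (illustrative)
    ports: tuple   # ASSUMED execution ports that can run it (illustrative)


# Hypothetical snippet; the latency/port values are placeholders, not
# measurements for any particular Intel or AMD core.
SNIPPET = [
    Instr("imul rax, rbx", "rax", ("rax", "rbx"), 3, (1,)),
    Instr("add  rax, rcx", "rax", ("rax", "rcx"), 1, (0, 1, 5)),
    Instr("add  rdx, rsi", "rdx", ("rdx", "rsi"), 1, (0, 1, 5)),
    Instr("imul rdx, rax", "rdx", ("rdx", "rax"), 3, (1,)),
]


def critical_path(instrs):
    """Length in cycles of the longest register dependency chain."""
    ready = {}   # register -> cycle at which its newest value is available
    longest = 0
    for ins in instrs:
        # An instruction can start once all of its inputs are ready.
        start = max((ready.get(r, 0) for r in ins.srcs), default=0)
        done = start + ins.latency
        ready[ins.dest] = done
        longest = max(longest, done)
    return longest


def port_pressure(instrs):
    """Greedy estimate: send each instruction to its least-loaded port."""
    load = {}
    for ins in instrs:
        for p in ins.ports:
            load.setdefault(p, 0)
        port = min(ins.ports, key=lambda p: load[p])
        load[port] += 1
    return load


if __name__ == "__main__":
    print("dependency chain (cycles):", critical_path(SNIPPET))
    print("instructions per port:", port_pressure(SNIPPET))
```

Under these made-up numbers the snippet is bound by its dependency chain rather than by port contention. The model ignores almost everything a real core does (register renaming limits, the front end, memory, flags, micro-op fusion), and the greedy port assignment is not guaranteed to be optimal, which is roughly why a static estimate like this is "a piece of the puzzle" rather than a substitute for a simulator.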

See also this related question: Trace of CPU Instruction Reordering

Leeor
  • IACA http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/ can help on the execution side of things for Intel. – Bahbar Dec 18 '13 at 12:34
  • @Bahbar, it's hardly a simulator, but I've added it to the answer (see my reservations there). It still may be useful. Thanks. – Leeor Dec 18 '13 at 14:21
  • Yeah, it's certainly not a simulator. Still, if you want to understand pipeline scheduling, it's a piece of the puzzle :) – Bahbar Dec 18 '13 at 14:26