I'd like to get a count of all of the vfmadd231ps ops (AVX512F) executed in an executable run in QEMU. Is this possible? Are there other ways of doing something like this that don't involve running with QEMU? (other tools? Would there be a way to do this in gdb, for example?)

I'm trying to compare two executables that perform essentially the same function to see why one runs about 2x as fast as the other. Both do a lot of multiply-add-accumulates, so I want to see whether one performs fewer of these operations to get the same job done. If so, that would indicate a more efficient algorithm; if not, I'll need to look elsewhere.

aneccodeal
  • Intel's SDE has an instruction-mix option, including a histogram of all mnemonics, I think. See [Scan binary for CPU feature usage](https://stackoverflow.com/a/72469653) for an example of using it. If you're worried about back-end port pressure, though, you might want to run on real hardware with `perf stat --all-user -e cycles,instructions,uops_issued.any,branches,fp_arith_inst_retired.256b_packed_single`. But those FP counters count each FMA as two FP operations, so they're more like FLOP counters. `uops_dispatched_port.port_0` / 5 might help. – Peter Cordes Jun 06 '22 at 18:17
  • Which mode are you running QEMU in? KVM or TCG? Alternatively, can you use `perf` to retrieve execution statistics? – Arnabjyoti Kalita Jun 06 '22 at 18:17
  • @ArnabjyotiKalita KVM. I don't see any way to get perf to tell me any stats on specific opcode types that have been executed - do you know of a way to do that? (or alternatively to get an instruction trace - then I could just count up all of the ops that I'm interested in) – aneccodeal Jun 06 '22 at 18:24
  • If you want to compare just two functions (with no complicated branching), I would suggest starting with static code analysis (e.g., using "Intel Architecture Code Analyzer"; uops.info also recently added that feature, but unfortunately it is down at the moment). Just counting the number of `vfmadd231ps` instructions does not tell you much about latencies, which could be crucial. – chtz Jun 06 '22 at 21:05
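
If an instruction trace or disassembly listing can be obtained as text, the counting step itself is simple. The following is a minimal sketch, not a tested workflow for any particular tool: it assumes a hypothetical trace format with one instruction per line, an optional hexadecimal `address:` prefix, and the mnemonic as the first token. The function name `count_mnemonics` and the sample lines are illustrative only; real tool output would likely need its own parsing.

```python
import re
from collections import Counter

# Assumed (hypothetical) line format: "0x401000: vfmadd231ps zmm0, zmm1, zmm2".
# The address prefix is optional; the mnemonic is the first remaining token.
ADDRESS_PREFIX = re.compile(r"^\s*(?:0x)?[0-9a-fA-F]+:\s*")

def count_mnemonics(trace_lines):
    """Return a Counter mapping lowercase mnemonic -> number of occurrences."""
    counts = Counter()
    for line in trace_lines:
        text = ADDRESS_PREFIX.sub("", line)  # strip the address prefix if present
        tokens = text.split()
        if tokens:  # skip blank lines
            counts[tokens[0].lower()] += 1
    return counts

# Made-up sample data, purely for illustration.
sample_trace = [
    "0x401000: vfmadd231ps zmm0, zmm1, zmm2",
    "0x401006: vaddps zmm3, zmm3, zmm0",
    "0x40100c: vfmadd231ps zmm4, zmm5, zmm6",
    "0x401012: jne 0x401000",
]

counts = count_mnemonics(sample_trace)
print(counts["vfmadd231ps"])  # prints 2
```

Applied to a dynamic trace, this counts executed instructions; applied to a static disassembly, it only counts how many times the mnemonic appears in the binary, which says nothing about how often each one runs.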
