I'd like to get a count of all the vfmadd231ps instructions (AVX512F) executed by an executable running under QEMU. Is this possible? Are there other ways of doing something like this that don't involve QEMU (other tools, or perhaps gdb, for example)?
I'm trying to compare two executables that perform essentially the same function, to see why one might be running about 2x as fast as the other. Both will be doing a lot of multiply-add-accumulates, so I want to see whether one executes fewer of these operations to get the same job done. If so, that would indicate a more efficient algorithm; if not, I'll need to look elsewhere.