I am analyzing the difference between two designs that process millions of messages. One design uses polymorphism and the other does not: in the polymorphic design, each message is represented by a polymorphic subtype.
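For reference, the two designs are shaped roughly like this (a minimal sketch; the message and handler names are illustrative, not my actual code):

    #include <cstdint>

    // Polymorphic design: one subtype per message kind,
    // dispatched through a virtual call.
    struct Message {
        virtual ~Message() = default;
        virtual void process() = 0;
    };

    struct QuoteMessage : Message {
        void process() override { /* handle a quote */ }
    };

    struct TradeMessage : Message {
        void process() override { /* handle a trade */ }
    };

    // Non-polymorphic design: a type tag tested with if statements.
    enum class MessageKind : std::uint8_t { Quote, Trade };

    struct FlatMessage {
        MessageKind kind;
        // ... payload ...
    };

    void processFlat(const FlatMessage& m) {
        if (m.kind == MessageKind::Quote) {
            // handle a quote
        } else if (m.kind == MessageKind::Trade) {
            // handle a trade
        }
    }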
I have profiled both designs using VTune. The high-level summary data seems to make sense: the polymorphic design has a higher branch-mispredict rate, a higher CPI, and a higher ICache-miss rate than the non-polymorphic version implemented with if statements.
The polymorphic design has a line of source code like this:
    object->virtualFunction();
and this line is called millions of times, with the subtype changing from call to call. I expected the polymorphic design to be slower because of branch target mispredictions and instruction cache misses, and as noted above, the VTune Summary tab seems to confirm this. However, when I look at the metrics next to that line of source code, there are absolutely no metrics except for:
- Filled pipeline slots total -> Retiring -> General retirement
- Filled pipeline slots self -> Retiring -> General retirement
- Unfilled pipeline slots total -> Front end bound -> Front end bandwidth -> Front end bandwidth MITE
- Unfilled pipeline slots self -> Front end bound -> Front end bandwidth -> Front end bandwidth MITE
None of the branch prediction columns have any data, and neither do the instruction cache miss columns.
Could somebody please comment on whether this seems sensible? To me it doesn't: how can there be no branch misprediction or instruction cache miss statistics for a line of polymorphic code where the branch target changes constantly from message to message?
This cannot be due to compiler optimizations or inlining, because the compiler cannot know the dynamic subtype of the object at the call site, so it cannot devirtualize the call.
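To make that concrete: building on the sketch above, the dynamic type is chosen from a tag read off the wire at run time, roughly like this (the factory is illustrative):

    #include <memory>

    // The concrete subtype depends on data that only exists at run
    // time, so the compiler cannot prove the dynamic type at the
    // call site and cannot devirtualize or inline the virtual call.
    std::unique_ptr<Message> makeMessage(std::uint8_t wireTag) {
        if (wireTag == 0)
            return std::make_unique<QuoteMessage>();
        return std::make_unique<TradeMessage>();
    }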
How should I profile the overhead of the polymorphism using VTune?