I have some inline assembly that I am trying to profile. Interestingly, two very similar instructions, `maxss` and `minss`, placed right after each other show a very different performance impact. Does anybody have experience with this? Could it be a caching effect, or is the CPU Usage tool in Visual Studio simply not accurate at this level?
-
This is from HW performance counters? With superscalar OoO exec, it's not easy to attribute a cycle to any specific instruction. Hardware often "blames" an instruction that's waiting for a slow input, not the one that's slow to produce the result. Also, for an event like `cycles`, there's granularity of issue and retire groups: when more than 1 instruction is getting executed per cycle, HW might always blame the first one in a group. Also "skew" is a real effect, blaming later instructions. – Peter Cordes Aug 27 '20 at 07:37
-
Related example: [Why is 'add' taking so long in my application?](https://stackoverflow.com/q/60232283) shows a lot of blame on an instruction waiting for a load, rather than the load. – Peter Cordes Aug 27 '20 at 07:54
-
Sounds like a tool bug. Any tool that claims it can tell you how long an individual instruction takes is not one to be trusted (the CPU doesn't have to be superscalar; pipelined will do). Profiling functions is one thing; profiling instructions is another. – old_timer Aug 27 '20 at 14:50
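To illustrate the point about profiling at function granularity, here is a minimal sketch that times a whole clamp loop as one unit instead of trying to attribute cycles to individual instructions. Everything in it is an assumption made for illustration: `clamp_sum` and `time_clamp_sum` are hypothetical names, not the asker's code, and whether the compiler actually emits `maxss`/`minss` for `std::max`/`std::min` depends on the target and optimization settings.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Hypothetical stand-in for the asker's routine: clamp each value to [lo, hi]
// and accumulate. On x86 with SSE this often compiles to maxss/minss, but
// that is an assumption about the compiler, not a guarantee.
float clamp_sum(const std::vector<float>& v, float lo, float hi) {
    float sum = 0.0f;
    for (float x : v) {
        float t = std::max(x, lo);  // candidate for maxss
        t = std::min(t, hi);        // candidate for minss
        sum += t;
    }
    return sum;
}

// Time `reps` calls of clamp_sum as a single unit and return the average
// nanoseconds per call. Results are accumulated into `sink` so the work
// has an observable effect and is harder for the optimizer to discard.
double time_clamp_sum(const std::vector<float>& v, int reps, float& sink) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        sink += clamp_sum(v, 0.0f, 1.0f);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / reps;
}
```

Calling `time_clamp_sum(data, 100000, sink)` on a representative input and comparing two variants of the routine gives a number for the function as a whole, which is usually more trustworthy than a per-instruction percentage. The absolute figure still depends on the machine, the compiler flags, and the timer resolution.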
-
What are the few instructions before the first `maxss xmm2,xmm1` instruction? It is likely waiting for the `shufps xmm1,xmm1,2` instruction to have a result available. – 1201ProgramAlarm Aug 27 '20 at 16:40
-
Related: [Unexpectedly slow performance on moving from register to frequently accessed variable](https://stackoverflow.com/q/76773814) is a simpler case, just a store waiting for an L3-miss load+add, with the store getting all the "blame" since it's waiting for the result of a slow instruction. Also [Inconsistent `perf annotate` memory load/store time reporting](https://stackoverflow.com/q/65906312) is a more complex example but with full details on surrounding code. – Peter Cordes Jul 26 '23 at 18:36