I read the following:

One way to understand how the processor used its time is to look at the hardware counters. To help with performance tuning, modern processors track various counters as they execute code: the number of instructions executed, the number of various types of memory accesses, the number of branches encountered, and so forth. To read the counters, you’ll need a tool such as the profiler in Visual Studio 2010 Premium or Ultimate, AMD Code Analyst or Intel VTune.

So how can I do this in code, for example with PerformanceCounter, and get the number of instructions executed?

And is there any way to count instruction hits the way dotTrace, ANTS, and the VS profiler do?

xanatos
Omega
  • Quoted from http://igoro.com/archive/fast-and-slow-if-statements-branch-prediction-in-modern-processors/ – xanatos Apr 22 '15 at 09:10
  • I have to ask: Why? In my experience the number of instructions isn't a very helpful metric. – Skizz Apr 23 '15 at 10:19

2 Answers

For the VS Profiler:

To view a list of all CPU counters that are supported on the current platform:

In Performance Explorer, right-click the performance session and then click Properties.

Do one of the following:

  1. Click Sampling, and then select Performance counter from the Sample event list. The CPU counters are listed in Available performance counters. Note: Click Cancel to return to the previous sampling configuration.

-or-

  1. Select CPU Counters, and then select Collect CPU Counters. The CPU counters are listed in Available counters. Note: Click Cancel to return to the previous counter collection configuration.

You can't access the full CPU counters from your program, because: https://stackoverflow.com/a/8800266/613130

You can use the RDPMC instruction or the __readpmc MSVC compiler intrinsic, which is the same thing.

However, Windows prohibits user-mode applications from executing this instruction by setting CR4.PCE to 0. Presumably, this is done because the meaning of each counter is determined by MSR registers, which are only accessible in kernel mode. In other words, unless you're a kernel-mode module (e.g. a device driver), you are going to get a "privileged instruction" trap if you attempt to execute this instruction.

(RDPMC is the instruction that reads the CPU performance-monitoring counters.)

I'll add that the number of instructions executed is normally not very useful on its own. What matters is the CPU time used to execute the code. Each instruction has a different CPU time, so even if you knew how many were executed, you wouldn't know the number of CPU cycles or the time used.

If you want to know the CPU cycles used by some instructions, you can use the assembly instruction RDTSC/RDTSCP. Using it from C# is complex, and calling it is slow enough that it often distorts the measurement you are trying to take. If you are interested, I wrote an answer about it a few days ago: https://stackoverflow.com/a/29646856/613130
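To make the difference between CPU time and wall-clock time concrete, here is a minimal sketch. It is written in Python rather than C#, purely as an illustration, and it does not read any hardware counter. The `work`, `measure` and `summarize` names are hypothetical helpers invented for this example. It times the same function repeatedly with a wall clock and with the process CPU clock, then reports the spread instead of a single number.

```python
import statistics
import time


def work(n: int = 50_000) -> int:
    """Hypothetical CPU-bound workload, used only for illustration."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(func, repeats: int = 15):
    """Call func `repeats` times; return per-run (wall_ns, cpu_ns) lists."""
    wall, cpu = [], []
    for _ in range(repeats):
        w0 = time.perf_counter_ns()   # high-resolution wall clock
        c0 = time.process_time_ns()   # CPU time consumed by this process
        func()
        c1 = time.process_time_ns()
        w1 = time.perf_counter_ns()
        wall.append(w1 - w0)
        cpu.append(c1 - c0)
    return wall, cpu


def summarize(samples):
    """Return (minimum, median, maximum) of a list of timings."""
    return min(samples), statistics.median(samples), max(samples)


if __name__ == "__main__":
    wall, cpu = measure(work)
    print("wall ns (min, median, max):", summarize(wall))
    print("cpu  ns (min, median, max):", summarize(cpu))
```

The individual runs will usually differ from each other even though the code is identical, which is why a minimum or median over several runs is generally more informative than one reading. The resolution of the process CPU clock depends on the operating system and can be much coarser than the wall clock.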

xanatos
  • Just a question about what you said here, "instruction has a different CPU time": why does the same instruction have a different CPU time, and would we see that if we checked it with a profiler app? – Omega Apr 22 '15 at 09:38
  • @Omega No, *different* instructions have *different* CPU times, as in `ADD` and `IMUL`. If you do `IMUL EAX, EAX; IMUL EAX, EAX` the instruction count is 2, but it is clearly slower than `ADD EAX, EAX; ADD EAX, EAX`, because multiplication is slower than addition. There is a reason why you normally measure the execution time of a piece of code, not the number of assembly instructions in it. – xanatos Apr 22 '15 at 09:44
  • Yes, I'm talking about the same instruction. Why does the same instruction, like Console.WriteLine((end - begin));, give a different CPU time each time? – Omega Apr 22 '15 at 09:46
  • @Omega A simple example could be cache differences, or (depending on whether you measure "wall" time, i.e. the time on an external clock, or thread time) the app could have been interrupted by Windows to do other things. Measuring times is always a statistical thing, not an exact science – xanatos Apr 22 '15 at 09:48
  • especially with instructions like `Console.Write` that have to "communicate" with an "external" system (the GDI) – xanatos Apr 22 '15 at 09:49
  • I measure CPU time. With wall time it is normal to get different times, for the reasons you gave such as cache or threading. Thank you :) – Omega Apr 22 '15 at 10:00

Your question has already been answered, so I think this is a duplicate question.

These are most likely native WinAPI methods, which you can invoke from C# via DllImport. For simplicity, you could try the third-party wrapper from here.

But you should clearly understand what you are doing. The first call of your function will take longer than the second because of JIT compilation time. And if your method allocates memory, a GC can occur at any point during the call and will be reflected in the measurement.
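The two effects above (a slow first call and garbage collection landing inside the timed region) can be handled with the same pattern in most managed runtimes: run the code once before measuring, collect garbage up front, then time several calls. The sketch below shows that pattern in Python as an analog only. `time_steady_state` and `allocate` are hypothetical names made up for this example, and Python's first-call costs come from different sources than .NET JIT compilation. In .NET the corresponding steps would be an untimed first call and a `GC.Collect()` before the timed loop.

```python
import gc
import time


def time_steady_state(func, warmup: int = 1, repeats: int = 10):
    """Return (warmup_ns, steady_ns): timings of warm-up calls and of measured calls."""
    warmup_ns = []
    for _ in range(warmup):
        t0 = time.perf_counter_ns()
        func()                      # first calls pay one-time costs
        warmup_ns.append(time.perf_counter_ns() - t0)

    gc.collect()                    # collect up front, outside the timed region
    was_enabled = gc.isenabled()
    gc.disable()                    # pauses only the cyclic collector; refcounting still runs
    steady_ns = []
    try:
        for _ in range(repeats):
            t0 = time.perf_counter_ns()
            func()
            steady_ns.append(time.perf_counter_ns() - t0)
    finally:
        if was_enabled:
            gc.enable()             # restore the previous collector state
    return warmup_ns, steady_ns


def allocate():
    """Hypothetical allocating workload, used only for illustration."""
    return [str(i) for i in range(10_000)]


if __name__ == "__main__":
    warm, steady = time_steady_state(allocate)
    print("warm-up ns:", warm)
    print("steady  ns (min):", min(steady))
```

Keeping the warm-up timings separate rather than throwing them away makes it possible to see how large the one-time cost actually is on a given machine.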

Alexey Korovin
  • I believe the OP wants the number of retired uOps per clock, and you cannot get that with the functions provided. You need to write a device driver like the one provided with VTune. Ideally you would also want to do instrumentation and get an idea of where in your code cache misses cause the clocks per instruction to be unreasonably high – Jens Munk Aug 09 '23 at 20:40