I only have a rough idea about this, so I would like some more practical ideas. Ideas for Linux, Unix, and Windows are all welcome.
The rough idea in my head is:
The profiler sets up some type of timer and a timer interrupt handler in the target process. When the handler takes control, it reads and saves the value of the instruction pointer register. When the sampling is done, it counts the occurrences of every IP register value, and then we know the 'top hitters' among all sampled program addresses.
But I do not actually know how to do it. Can someone give me some basic but practical ideas? For example, what kind of timer (or equivalent) is typically used? How is the IP register value read? And so on. (I think that when execution enters the profiler's handler routine, the IP should be pointing at the entrance of the handler, not somewhere in the target program, so we cannot simply read the current IP value.)
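To make the rough idea concrete, here is a minimal sketch of it in Python rather than native code, assuming a Unix-like system where `signal.setitimer` and `SIGPROF` are available (so not Windows). Python's signal handler is passed the interrupted frame, which stands in here for the saved instruction pointer; a native profiler would have to get the equivalent from the saved register context. The names `on_tick`, `busy_work`, and `profile` are made up for illustration.

```python
import collections
import signal
import time

# Histogram of sampled locations: (function name, line number) -> hit count.
samples = collections.Counter()

def on_tick(signum, frame):
    # `frame` describes the code that was interrupted, not this handler,
    # so it plays the role of the saved instruction pointer.
    if frame is not None:
        samples[(frame.f_code.co_name, frame.f_lineno)] += 1

def busy_work(cpu_seconds):
    # Hypothetical hot spot: spin until this much CPU time has been consumed.
    deadline = time.process_time() + cpu_seconds
    total = 0
    while time.process_time() < deadline:
        total += 1
    return total

def profile(func, *args, interval=0.005):
    # ITIMER_PROF counts CPU time used by the process and delivers SIGPROF
    # each time `interval` seconds of it have elapsed.
    samples.clear()
    signal.signal(signal.SIGPROF, on_tick)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func(*args)
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop sampling
    return samples.most_common(5)  # the 'top hitters'

if __name__ == "__main__":
    for (name, line), hits in profile(busy_work, 0.5):
        print(f"{hits:5d}  {name}:{line}")
```

The timer here measures CPU time rather than wall-clock time, so a process that is mostly sleeping produces few samples; whether that is what real profilers do is part of what I am asking.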
Thank you for your answer!
Thanks for the answers from Peter Cordes and Mike Dunlavey.
Peter's answer explains how to read the registers and memory of another process. I now realize that the profiler does not have to execute 'inside' the target process; instead, it can just read the target's registers and memory from outside using ptrace(2). It does not even have to suspend the target explicitly, as ptrace does that anyway.
Mike's answer suggests that, for performance profiling, counting occurrences of stack traces makes more sense than counting against IP register values, as the latter may give too much noise when execution is in a system module at the moment of sampling.
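As I understand the difference, it can be sketched with a few made-up samples. This is only an illustration under my own assumptions: the stacks and function names below are hypothetical, and `leaf_counts` and `inclusive_counts` are names I invented, not anything from the answers.

```python
from collections import Counter

# Hypothetical samples, outermost caller first, innermost frame last.
samples = [
    ("main", "load_config", "libc:read"),
    ("main", "render", "libc:memcpy"),
    ("main", "render", "libc:memcpy"),
    ("main", "render", "draw_line"),
    ("main", "render", "libc:malloc"),
]

def leaf_counts(stacks):
    # What IP-only sampling sees: just the innermost frame of each sample.
    return Counter(stack[-1] for stack in stacks)

def inclusive_counts(stacks):
    # What stack sampling adds: every function that was on the stack,
    # counted once per sample even if it appears more than once.
    counts = Counter()
    for stack in stacks:
        counts.update(set(stack))
    return counts

if __name__ == "__main__":
    print(leaf_counts(samples).most_common())
    print(inclusive_counts(samples).most_common())
```

With these samples, the leaf view spreads the hits across library routines (`libc:memcpy` gets 2 of 5, the others 1 each), while the stack view shows that `render` was on the stack in 4 of the 5 samples, which is the information I would actually act on.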
Thank you guys so much!