Anyone knows how to do it or can give me good book references, links etc...
For 80x86 hardware (regardless of operating system); there are only 3 ways to do this (that I know of):
a) send an inter-processor interrupt to the other CPU, and have an interrupt handler that stores the "return EIP" (from its stack) at a known address in memory so that your CPU can read "value of EIP immediately before interrupt" (with synchronization so that your CPU doesn't read before the value is written, etc).
b) put the other CPU into some kind of "debug mode" (single-stepping, last branch recording, ...) so that (either code in a debug exception handler or the CPU's hardware itself) is constantly writing EIP values to memory that you can read.
Of course both of these options will ruin performance, and the value you get will probably be useless (because EIP would've changed after you obtain it but before you can use the obtained value). To ensure the value is still useful; you'd need the other CPU to wait until after you've consumed the obtained value (and are ready for the next value); and to do that you'd have to resort to single-step debugging facilities (with the waiting in the debug exception handler), where you'll be lucky if you can get performance better than a thousand times slower (and can probably improve performance by simply disabling other CPUs completely).
Also note that they still won't accurately tell you EIP in all cases (e.g. if the CPU is in SMM/System Management Mode and is beyond the control of the OS); and I doubt Windows kernel supports any of it (e.g. kernel should support single-stepping of user-space processes/threads to allow debuggers to work, but won't support single-stepping of kernel and will probably lock up the computer due to various "waiting for lock to be released for 6 days" problems).
The last of the 3 options is:
c) Run the OS inside an emulator/simulator instead of running it on real hardware. In that case you can probably modify the emulator/simulator's code to inject EIP values somewhere (maybe some kind of virtual "EIP reporting device"?). This will ruin performance of the emulator/simulator, but you may be able to hide that (e.g. "virtual time inside the emulator passes at a rate of one second per 1000 seconds of real time outside the emulator").