3

I rewrote the entire question because people clearly weren't understanding it.

RDTSC used to count CPU cycles, and it varied with the CPU throttling.

Currently, RDTSC doesn't vary with CPU throttling.

Some old applications expect RDTSC to vary with CPU throttling.

How can I make RDTSC give them what they expect?

I don't want to profile code, I don't want to rewrite massive amounts of code, and I don't want to force users to mess with the BIOS or kernel permissions. I just want to make legacy apps work as they should.

speeder
  • See: http://stackoverflow.com/questions/23251795/how-to-calculate-the-frequency-of-cpu-cores and http://stackoverflow.com/questions/8351944/finding-out-the-cpu-clock-frequency-per-core-per-processor# – Mysticial Feb 01 '16 at 18:40
  • I saw both of these before. They helped me understand the issue, but they don't fix it. I am not writing a new application, I am trying to make legacy applications work without hacking them too much... I cannot just go around replacing large parts of their code. – speeder Feb 01 '16 at 18:42
  • People must hate me. Every time I ask a question, I get immediately downvoted and no one explains why. – speeder Feb 01 '16 at 18:43
  • If I am understanding you correctly, you have legacy apps that use RDTSC already and you are really looking for a way to run existing binaries with minimal to no modifications so they work as they did on old hardware. I can only hope that you don't hire the guys who wrote the original code on new projects. – Michael Petch Feb 01 '16 at 19:13
  • I didn't downvote, but there is no code given, and the question is ambiguous. – Michael Petch Feb 01 '16 at 19:14
  • There is no code to give. RDTSC used to do X, now it does Y, and I want it to do X again. How hard is that to understand? Also, I asked if there is a way; if there isn't, that is also an answer. – speeder Feb 01 '16 at 19:16
  • Given your question as stated, it looks like the answer is no. You just need to fix your legacy applications. – Ross Ridge Feb 01 '16 at 19:16
  • There is a fix, rewrite the bad legacy code. But you said you don't want that fix, so I'd say "No" is your answer, unless you are just willing to go out and buy old hardware. – Michael Petch Feb 01 '16 at 19:21
  • So your real problem is finding legacy hardware which is broken in the bad old ways. Used computer stores abound; suggest you seek your solution there. – Ira Baxter Feb 01 '16 at 19:33
  • I can't find hardware for everyone! I want people to be able to use old software on new PCs... :) Also, I was hoping to write a wrapper. There is a lot of software that breaks, and I want to write something (like the Wine DLLs on Windows, for example) to fix this particular issue. – speeder Feb 01 '16 at 20:05
  • Can you give an example or two of programs that are broken by non-varying `RDTSC`? How bad can it be? I mean, before `RDTSC` evolved from a perf-measuring tool into a very-low-overhead time-source, it wasn't useful for anything that you can't do better with perf-counters. (BTW, @Ira: old hardware wasn't technically broken, Intel just hadn't realized how much more useful a constant TSC would be. Or did they introduce `rdtsc` before SpeedStep? There was a while where some CPUs had constant-rate TSCs but stopped them while halted, which is what I'd call broken: unusable as a time source). – Peter Cordes Feb 03 '16 at 16:03
  • @PeterCordes: I've built parallel programming tools for SMP x86 since 1995. We've *always* used TSC as a source of timing. And yes, in the bad old days sometimes a thread switch from one CPU to another gave us inconsistent TSC counts. It wasn't unusable; you just had to run timings several times and throw out nonsense answers. Yes, it's much better now, even on 16-core systems. – Ira Baxter Feb 04 '16 at 06:59
  • @PeterCordes the downvotes back then made me give up on reading the question. I came back today just out of curiosity... Answering your question, the software that annoys me most with this is basically games. Many gamedevs used techniques that are valid on consoles while making PC games, for example counting CPU cycles to sync several things, or to slow down or speed up physics, and so on. The games that ALSO use multi-threading just outright crash. SimCity 4, for example, relies heavily on RDTSC and is crazy crashy except on the narrow generation of multi-core CPUs with "old" RDTSC behaviour. – speeder May 30 '20 at 18:48

3 Answers

3

Simply put, you can't do it with the flick of a switch.

Intel Developer Manual 3B, Chapter 17, explicitly says:

The invariant TSC will run at a constant rate in all ACPI P-, C-, and T-states. This is the architectural behavior moving forward.

This is another way of telling you that there is no way to switch back to the previous behavior.


However, if you really feel like it, you can try something.

rdtsc takes its value from the IA32_TIME_STAMP_COUNTER, which is writable.
So you can "fake" the read of rdtsc without changing any program, but you need a driver.
Changing IA32_TIME_STAMP_COUNTER to adjust for internal clock count may not be so easy.

I don't remember if there is a performance event that counts internal clocks since reset. If there is, then in theory you just have to read that value and write it to IA32_TIME_STAMP_COUNTER.
Newer CPUs also support IA32_TSC_ADJUST, which can be used to adjust the TSC in a relative way: whatever you add to or subtract from IA32_TSC_ADJUST is also added to or subtracted from IA32_TIME_STAMP_COUNTER. So you can slow down or speed up the counter.

Either way you need:

  • To create a driver to deliver to your users, who may not have the privileges to install it.
  • To know the exact throttling of the CPU. Contrary to what the vote count on gudok's answer suggests, performance counter registers are the only way to go, unless you want to hook the OS power manager functions/events and go with educated guesses.
  • To map that throttling into a TSC value.
  • To choose how often to update the TSC (non-trivial).
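
As a rough illustration of the "map that throttling into a TSC value" step, here is a minimal sketch in C of the arithmetic only. It assumes a driver has already sampled, over the same interval, how many core clock cycles elapsed (from a performance counter) and how many TSC ticks elapsed. The function name `tsc_adjust_delta` is hypothetical, and the privileged write to IA32_TSC_ADJUST is not shown.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of any real API): given the number of core
 * clock cycles and invariant-TSC ticks observed over the same interval,
 * return the signed correction a driver would add to IA32_TSC_ADJUST so
 * that the TSC appears to have advanced by core cycles instead.
 * The ring-0 MSR write itself is deliberately left out of this sketch.
 */
int64_t tsc_adjust_delta(uint64_t core_cycles_elapsed, uint64_t tsc_ticks_elapsed)
{
    if (core_cycles_elapsed >= tsc_ticks_elapsed) {
        /* Core ran faster than the TSC rate (e.g. turbo): push the TSC forward. */
        return (int64_t)(core_cycles_elapsed - tsc_ticks_elapsed);
    }
    /* Core was throttled below the TSC rate: pull the TSC back. */
    return -(int64_t)(tsc_ticks_elapsed - core_cycles_elapsed);
}
```

For example, if the core ran 800 cycles while the TSC advanced 1000 ticks, the correction is -200. A negative correction can make two consecutive RDTSC reads appear to go backwards, which is one reason the update interval is hard to choose.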
Margaret Bloom
  • If you're writing a driver, perf counters are probably the easiest way to go. You can get the number of core clocks since you last adjusted the TSC, which is exactly the number you need. Even if you're in kernel mode where you can access the OS's idea of the current clock speed with low overhead, that doesn't average over past history of turbo up/down events. And Skylake works best when the OS hands off CPU frequency decision-making to the core anyway, so it can adjust frequency more often than the OS checks it. – Peter Cordes Feb 03 '16 at 15:43
  • Keep in mind that your Windows driver or Linux kernel module should tell the OS that it can no longer use the TSC as a time source for time-of-day stuff. There's only one TSC per core, not per process or per thread, so messing with it will break `gettimeofday()` on Linux, for example. The kernel maps a page of code with the `gettimeofday` implementation into every process's address space. On systems with the `constant_tsc` feature, it never uses `syscall` or anything, and stays in user-space with `rdtsc`. So post-boot-time changing from TSC to non-TSC may be problematic on Linux. IDK. – Peter Cordes Feb 03 '16 at 15:49
  • @PeterCordes: She did say contrary to the vote count on his answer. I think she's basically saying that gudok had it right but it isn't reflected in the vote. – Michael Petch Feb 03 '16 at 16:12
  • @MichaelPetch: Ah I see. I think I figured that out at one point but then forgot by the time I finished writing my other comments. – Peter Cordes Feb 03 '16 at 16:52
0

I stumbled across this recently for unrelated reasons:

AMD Bulldozer-family (15h) CPUs have a new MSR: Timestamp Counter Ratio (TscRateMsr), as mentioned in AMD's optimization manual. They suggest VMMs "Use the Timestamp Counter Ratio feature to adjust the TSC frequency for guest VMs" (Section 12.16), but you could also use it to change the ratio depending on the current frequency-scaling setting.

For more information on the Timestamp Counter Ratio MSR, please refer to section 3.12, "MSRs - MSRC000_0xxx" in the BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors.

IDK if Intel has anything similar; I haven't looked.

Peter Cordes
  • Whoa, great info! I will research it more, since the information I have found so far is quite vague, but it may be the right track. Maybe you will end up with the accepted answer :) – speeder Apr 24 '16 at 20:28
  • Yes, Intel should [have something similar](https://www-ssl.intel.com/content/www/us/en/processors/timestamp-counter-scaling-virtualization-white-paper.html). However, if I'm not wrong after a very quick read, it only affects VMM guests. So I'm not really sure it will solve the OP's problems. – Margaret Bloom Apr 25 '16 at 12:49
  • I didn't check, AMD's MSR might also only affect guest VMs. This idea already required hooking into the OS's frequency-scaling governor, and now it may require running your code in a guest VM... Thanks for pointing that out, @Margaret. Almost certainly easier to just disable turbo and force the CPU to normal max speed. – Peter Cordes Apr 25 '16 at 12:55
  • Yep, it is for guest VMs only :( As for disabling turbo: that doesn't fix it, because the modern RDTSC specification doesn't oblige it to return values remotely related to the core clock. A CPU can have a TSC that runs at 100 MHz if it wants to, or 10000000 GHz, and so on, so any software that counted clock cycles with RDTSC will get timer ticks, not CPU cycles, even with turbo disabled. – speeder May 04 '16 at 21:59
  • @speeder: Does that happen in practice, though? As I understand it, Intel CPUs run their TSC at their "rated" clock speed. IDK about AMD. Anyway, as long as the TSC has a constant ratio to real core clock cycles, you can convert easily. – Peter Cordes May 05 '16 at 02:01
  • But how do you get the ratio? – speeder May 05 '16 at 17:52
  • @speeder: Measure something of known performance with perf counters *and* RDTSC. e.g. a trivial loop like `.l: dec eax / jnz .l`. If needed, other experiments can tell you whether that loop runs at one iteration per 1c or per 2c or whatever. Also note that `CPUID` tells you the TSC clock speed in one of its outputs. – Peter Cordes May 05 '16 at 17:58
  • I didn't know that CPUID reports the TSC clock speed, good to know! Still, is there any trivial loop that is GUARANTEED to run at the same clock as the CPU? I saw some people trying to fix the problem in my question above by resorting to old pre-TSC code, for example counting ADDs over time, and it is clear that caches, out-of-order execution, hyperthreading, and some other recent tech make that very inaccurate, especially with the designers' focus on improving "IPC" (instructions per clock). – speeder May 05 '16 at 18:48
  • `.l: dec eax / jnz .l` runs at one iteration per clock on all recent Intel CPUs. Keeping the loop tiny is essential, so you're just limited by taken-branch throughput. (see http://agner.org/optimize/, esp. the microarch pdf). You only need to predict the loop speed if you want to avoid perf counters in your setup program. You can assume the loop will take an integer number of actual core cycles per iteration, though, so if you fix the CPU clock speed then you can look at the ratio of iterations to TSC counts. If it's really close to an integer ratio (usually 1:1), then round it. – Peter Cordes May 05 '16 at 19:02
  • If you don't want to make any assumptions about the TSC, you can use perf counters *and* RDTSC to time any loop. Then `TSC_ratio = core_cycles / TSC_counts`. **The loop cycles per iteration cancels out.** Again, though, it's safe to assume that the loop runs at a whole number of cycles per iteration. Taken-branch throughput is going to be one per 1 or 2 cycles. – Peter Cordes May 05 '16 at 19:05
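
The ratio described in the comments above can be sketched as plain arithmetic in C. This is only an illustration under stated assumptions: the two inputs are taken to be a core-cycle count (from a performance counter) and a TSC delta measured over the same loop, the function names `tsc_ratio` and `tsc_to_core_cycles` are hypothetical, and the conversion is only meaningful while the core frequency stays fixed.

```c
#include <stdint.h>

/*
 * Ratio of core clock cycles to TSC ticks, both measured over the same
 * loop. The loop's cycles-per-iteration cancels out, so any loop works.
 * Returns 0.0 if no TSC ticks were observed.
 */
double tsc_ratio(uint64_t core_cycles, uint64_t tsc_counts)
{
    if (tsc_counts == 0) {
        return 0.0;
    }
    return (double)core_cycles / (double)tsc_counts;
}

/*
 * Convert a TSC delta into an estimated core-cycle count, rounded to the
 * nearest integer. Assumes the ratio was measured at the same (fixed)
 * core frequency that applies to this delta.
 */
uint64_t tsc_to_core_cycles(uint64_t tsc_delta, double ratio)
{
    return (uint64_t)((double)tsc_delta * ratio + 0.5);
}
```

With 3000 core cycles measured against 4000 TSC ticks, the ratio is 0.75, so a later TSC delta of 1000 ticks corresponds to roughly 750 core cycles.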
-1

Use CPU performance counters. They are available on Linux via the perf_event_open syscall. Alternatively, you can measure globally how many CPU cycles your program takes by running the perf utility.
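
As a minimal sketch of the first option, the following C code counts user-space core cycles around a busy loop with `perf_event_open`. It is Linux-only and follows the pattern from the `perf_event_open(2)` man page. The helper name `count_cycles_busy_loop` is hypothetical, and the function returns -1 when the counter cannot be opened, which can happen because of permissions or in a VM without a PMU.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so go through syscall(2). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Hypothetical helper: count user-space core cycles spent by the calling
 * thread in a simple busy loop of `iterations` iterations.
 * Returns -1 if the hardware counter is unavailable or cannot be read.
 */
long long count_cycles_busy_loop(unsigned long iterations)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user-space cycles only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: measure the calling thread on any CPU. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd < 0) {
        return -1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* Code under measurement: volatile keeps the loop from being optimized away. */
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < iterations; i++) {
        sink += i;
    }

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t cycles = 0;
    ssize_t n = read(fd, &cycles, sizeof(cycles));
    close(fd);
    if (n != (ssize_t)sizeof(cycles)) {
        return -1;
    }
    return (long long)cycles;
}
```

This measures code you control. It does not by itself feed the result back into an unmodified legacy binary.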

gudok
  • After I measure that, how do I give the info back to the program? EDIT: also, on Windows. – speeder Feb 01 '16 at 18:45
  • `perf_event_open` is designed to measure fine-grained pieces of code. Unfortunately, I can't suggest any similar tool for Windows (most likely you will need to use compiler intrinsics to set up CPU counters). – gudok Feb 01 '16 at 18:53