0

Many programs created in the era just before multi-core processors use the `rdtsc` instruction to get precise timing data.

This is a serious problem in multi-threaded programs, as different threads might end up with conflicting values, and many outright crash because of this (some single-threaded programs can also crash, depending on how they use `rdtsc`).

On Windows at least, the common recommendation is to just set "processor affinity"; unfortunately, this also heavily cripples programs that were designed (improperly, obviously) to use parallelism.

So I was thinking: how hard is it, even without any sort of source code, to hunt down `rdtsc` calls in crashy programs and replace them with something else? (And I don't know what that something else would be...)

speeder
  • 6,197
  • 5
  • 34
  • 51
  • What's the problem exactly? Multiple threads getting identical timestamps, but the program logic assuming that it can use timestamps as unique IDs? Or are you talking about problems when comparing timestamps after migrating from one core to another? The second problem could be fixed by pinning each thread to their own core, which wouldn't hurt parallelism (much, depending on the design). `rdtsc` is only a 2-byte instruction, so there's not much scope for binary-patching it to anything useful. It can be patched *out* easily enough, with a NOP. – Peter Cordes Jan 27 '16 at 07:19
  • If the program was created in the era before multi-core processors then it probably wasn't designed to use multi-core processors effectively. It's a lot harder than just creating a bunch of threads to do random things, which is what most programmers thought back then. If any of these buggy programs had actually been tested at any point during their development on a multi-CPU (one per socket) system then the RDTSC bugs would've been revealed. It's unlikely such a program is going to suffer all that much from being limited to a single CPU, since that is in fact what they were designed for. – Ross Ridge Jan 27 '16 at 08:11
  • So software that was written in an era where 2 processor cores was a luxury is limited by an OS feature that supports only 32 cores. Hmm. – Hans Passant Jan 27 '16 at 09:45
  • It's fun to think about this, but this is hardly a "serious problem". I'm sure it happens, but not that much in programs that still matter today. – harold Jan 27 '16 at 11:40
  • By the way, I am talking mostly about games (although some old server-related software has issues too; that is how I found out the bug existed: while trying to fix a game I was developing, I found some IBM talk about the issue in their servers). Many old games used RDTSC to track time, for example to see how much time passed since the last physics calculation, so they can use physics formulas like Position = oldposition + velocity*time. The problem with RDTSC is that this can result in time being zero or negative, sometimes leading to divide by zero, physics moving backwards, and so on... – speeder Jan 27 '16 at 16:35
  • Why I am getting downvoted? – speeder Jan 30 '16 at 22:25
  • It shouldn't actually happen though. In the days before Invariant TSC, sure. Today, not really. The TSC runs at the same rate on all cores, the only way to get it out of sync is deliberately by writing to it, which no one does. – harold Jan 31 '16 at 20:25
  • harold, bugs may still happen. also as cores get desynced due to throttle, their response time may get desynced too no? and even if the TSC is only one for the entire processor, they might query it at a different time. – speeder Feb 01 '16 at 01:14
  • @speeder well that's the thing, throttle doesn't desync them - unless they're old and don't have invariant TSC. Of course a program can query at different times, but not into the past - if it gets moved from one core to an other, it's no different than if it had been paused for a bit. – harold Feb 01 '16 at 10:22

1 Answer

3

As a general rule, if somebody hands you a machine code binary, you can have an extremely (Turing!) hard time determining which bytes are instructions and which are data. If you can't get that right, you can't even find the RDTSC instructions to patch out. (Worse: some programs generate code, so at runtime, data areas might ephemerally contain RDTSC.) In really peculiar programs, some instructions might literally overlap others, leading to some JMPs landing in the middle of what is identified as a long instruction. (x86 instructions can be up to 15 bytes long!)

The binary reverse engineering guys have this problem. In general, I don't know how they succeed. I suspect it is because most program object code is generated by compilers that aren't trying to hide anything (watch out when you meet a compiler that does).

If you could find them, I assume you'd replace them with a function call to a routine that loaded a known constant into the registers, to avoid your suggested inconsistency problem. Patching their locations might be pretty awkward; RDTSC is (I think) 2 bytes, and it might be sandwiched between two other instructions that can't be moved for some reason. So you might be forced to use just a breakpoint (1 byte) on each RDTSC to trap out to an RDTSC simulator; this possibly creates a performance problem if somebody is using RDTSC to read nanosecond clock ticks in a timing loop.

All in all, this seems like a hard road to take. How badly do you want to run really old programs, and why?

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
  • If you can modify the OS, `rdtsc` could be replaced with an instruction that traps, like `int 0x81` (2 bytes, same as `rdtsc`). You can pick an interrupt vector that's not used for anything else, and just put the address of your `rdtsc` emulation routine into that entry of the interrupt descriptor table (or something like that). That's probably as low overhead as you can get; somewhat less than using the debug trap and handling it from a debugger-like user-mode program. (@speeder: A debug trap is just a one-byte encoding for `int 0x3` with some special-case handling.) – Peter Cordes Jan 27 '16 at 07:32
  • 1
    @PeterCordes If you can modify the OS you can disable the RDTSC instruction (TSD bit of CR4) and then emulate it when it faults. – Ross Ridge Jan 27 '16 at 08:14
  • @RossRidge: Would that disable it for use by the OS, and by non-broken programs, too though? We only want this emulation for crappy programs that make wrong assumptions about what they get from `rdtsc`. Also, this `rdtsc` emulation would become a special case of the illegal-instruction trap handler which can only be detected by examining the instruction stream. (Again, if my understanding is correct.) The `int 0x81` vector could do something as simple as loading `edx` and `eax` from a global, without saving/restoring all the regs in prep for running a lot of OS code. – Peter Cordes Jan 27 '16 at 08:46
  • I don't have a number in mind for the cost in cycles of just the `int 0x81` / `iret`. `iret` is also serializing, so it's certainly not cheap for the pipeline anyway. I think an illegal-instruction handler would have to load the instruction that faulted as data. This means a cache-line of program code ends up in the L1 D-cache, as well as L1 I-cache, which is a waste. – Peter Cordes Jan 27 '16 at 08:50
  • 1
    @PeterCordes Setting the TSD only disables the instruction outside of ring 0, and the modification can simply set it only for applications that need it. Or all of them, it doesn't really make that much of difference. Most VMs virtualize RDTSC so it's not really all that unusual. Using RDTSC with TSD set outside of ring 0 results in a general protection fault, and the cost of examining the code is pretty small compared to the cost of the fault or an INT instruction. And more importantly, doing it this way means that you don't need to patch the executable. – Ross Ridge Jan 27 '16 at 13:02
  • I've read that modern OSes were supposed to emulate RDTSC anyway, but most of them don't. Windows 8.1 (which I am using right now) seemingly doesn't; games that abuse RDTSC and threading crash a lot if you don't shove them onto a single core. – speeder Jan 27 '16 at 16:38
  • I built an application that heavily uses assembler and RDTSC to read both machine features and clock time stamps. This code dates back to 1998, and seems to run fine under all the Windows variants, and even Linux and Wine. Can you be more specific about the threading crashes? How does RDTSC *cause* this? – Ira Baxter Jan 27 '16 at 16:59
  • @IraBaxter crashes can be caused by badly designed math or unchecked math. Example: I discovered the RDTSC bug when my computer crashed with a game I made, it was a physics heavy game, and used RDTSC to calculate how much time passed since last physics update, to use that as "delta T" in formulas, the cores lack of sync resulted into some formulas dividing by zero (among other aberrant behaviour). Servers crashed because of RDTSC when they used it to store precise transaction dates (for safe databases or safe file storage), and could conclude a transaction was made in the "future" – speeder Jan 28 '16 at 14:38
  • Others seem to think RDTSC is synced across multiple cores, at least for recent microprocessors: http://stackoverflow.com/a/10922036/120163 Is your data really different? – Ira Baxter Jan 29 '16 at 03:30
  • This isn't always reliable. MSDN docs used to say (they have since removed it) that when desync happens, it is the HAL's or BIOS's fault... but that felt more like shifting the blame to other people if something went wrong. Now MS states that QPC, even when using the TSC, should get synced data, but that is not a guarantee, and that the QPC function will try to find a synced source if it detects the current source is not synced, but it can't guarantee the resolution of the new source either (i.e. the ultimate fallback for QPC is the system clock, which has a resolution of 15ms). – speeder Jan 30 '16 at 22:22
  • So the clock/TSC sync problem is one that needs help from the chip vendors, and they seem to claim a solution (you saw my link to a purported Intel quote). (I have not looked carefully into this, and have never seen a specific hardware spec from the vendors on the topic; I've always been happy with the assertion that it was a "solved problem" modulo bad software from the OS provider.) Do you have specific evidence that the TSCs are not synched? Do you know that it isn't bad HALs or BIOSes? How? – Ira Baxter Jan 30 '16 at 22:26
  • If the problem is bad HAL or BIOS, it helps me how? It does not matter if the culprit is the CPU, HAL or BIOS, the end result is the same: TSC desync and game crashing. – speeder Feb 01 '16 at 01:15
  • Depends. If your hardware is broken, and other hardware that isn't is easily available, why go to extremes for the broken hardware? – Ira Baxter Feb 01 '16 at 02:25