
I'm programming in 16-bit TASM with DOSBox and here's today's issue: using DOS INT 21h/2Ch I can get the system's current hundredths of a second. That's good and all... until it's not.

See, I'm looking for an at least semi-accurate time measurement in milliseconds, and I'm positive it's possible.

Why, you ask? Have a look at INT 15h/86h. Using this interrupt I can delay the program in microseconds. If such precision exists, I'm sure getting milliseconds would be a walk in the park.

Some ideas I had: Using INT 70h which occurs every 1/1024 of a second, but I don't know how to listen to interrupts, nor do I want a timing system that can't be divided by 10.

This question has gotten the better of me by now, and I've failed to find an already existing solution online.

Cheers in advance.

Peter Cordes
  • Wow, this takes me back 30 years... :-) – T.J. Crowder May 10 '21 at 10:52
  • You can reprogram the PIT that provides the IRQ #0 (defaults to interrupt 8) to a higher frequency. IIRC it fires at circa 18.2 Hz (nearly 65_536 times per hour) by default and this is with a divisor of 65_536, the maximum. Your interrupt 8 handler should call the prior handler only some of the time to preserve the expected ROM-BIOS's timer tick frequency, the counter as stored in the doubleword 40h:6Ch. – ecm May 10 '21 at 12:07
  • Probably the easiest way is to use `out 80h, al`. This should do 1us delay. I can't find an authoritative source about the precise delay. [Linux uses it](https://elixir.bootlin.com/linux/latest/source/arch/x86/boot/boot.h#L72) but I don't think it assumes a particular delay. You could couple `out 80h, al` with a less precise source of interrupts to achieve better precision. Furthermore, depending on the specific CPU model you target and the specific hardware you have configured, there are probably a lot of synchronization sources. – Margaret Bloom May 10 '21 at 12:52
  • Are you trying to find the current time-of-day (absolute time), time an interval during which something else runs (measure CPU time used), or intentionally *delay* for some interval without doing any useful work during the interval? Your title talks about measurement, and I think you're only mentioning the delay call as evidence that some kind of precision timing should be possible. – Peter Cordes May 10 '21 at 14:23
  • If you're going to run this on a modern x86 in real mode, `rdtsc` works as a time source, if you calibrate offset and scale factors. (64-bit counter in EDX:EAX, counts fixed-frequency "reference cycles" (on new-enough CPUs) since last reset.) [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/a/51907627) has more details about https://www.felixcloutier.com/x86/RDTSC.html – Peter Cordes May 10 '21 at 14:25
  • @PeterCordes well, I'm programming in 16 bits so that's off the table. I do want the program to keep running, I just want some relative measurement (doesn't have to be the current time) of when a millisecond has passed so I can do my junk. – Coder's Crux May 11 '21 at 04:10
  • @ecm so if I change my question a bit according to your answer, how do I listen to an interrupt? – Coder's Crux May 11 '21 at 04:13
  • Like most instructions, `rdtsc` works normally in 16-bit real mode, still writing the result to edx:eax. (https://www.felixcloutier.com/x86/rdtsc#real-address-mode-exceptions is normal). Targeting 16-bit mode doesn't rule out modern x86 features. You might want to write code that will also run on retro hardware, but merely targeting 16-bit real mode doesn't automatically imply that. I wouldn't have suggested `rdtsc` if it couldn't work. – Peter Cordes May 11 '21 at 04:31
  • I'm using pure assembly x86, and this instruction did not exist in TASM days.. – Coder's Crux May 11 '21 at 04:35
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/232196/discussion-between-coders-crux-and-peter-cordes). – Coder's Crux May 11 '21 at 04:40
  • 1. Get current interrupt 8 handler. 2. Store it where your handler can access it (usually in a variable in your code segment). 3. Set handler to yours. (4. If you want to terminate your process then restore the handler to the prior one at that time.) The handler has to preserve all registers and return with `iret`, or jump with a far jump to the prior handler. Instead you can do `pushf` and a far *call* to the prior handler too. In your handler you can increment a variable of your own (in your code segment) to count how many of your (more frequent) ticks have occurred. – ecm May 11 '21 at 08:43
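As a rough sketch, the steps ecm describes might look like this in TASM real-mode assembly (a hypothetical .COM skeleton; the labels and the `ticks` variable are my own, and if the PIT is reprogrammed to a higher frequency, the handler should chain to the old vector only 1 out of N times):

```
; Sketch: hook INT 08h, count ticks, chain to the old handler.
        .model  tiny
        .code
        org     100h
start:  mov     ax, 3508h           ; DOS: get current INT 08h vector
        int     21h                 ;  -> ES:BX
        mov     word ptr old08, bx  ; store it where our handler can see it
        mov     word ptr old08+2, es
        mov     dx, offset my08     ; DS:DX -> our handler (DS=CS in a .COM)
        mov     ax, 2508h           ; DOS: set INT 08h vector
        int     21h
        ; ... main program runs here; read 'ticks' whenever needed ...
        lds     dx, old08           ; restore the previous handler
        mov     ax, 2508h           ;  before terminating
        int     21h
        mov     ax, 4C00h           ; terminate
        int     21h

my08:   inc     word ptr cs:ticks   ; count our (more frequent) ticks;
        pushf                       ;  flags are restored by the final IRET
        call    dword ptr cs:old08  ; chain to the old handler (sends EOI,
        iret                        ;  keeps 40h:6Ch updating)

old08   dd      ?                   ; saved INT 08h vector
ticks   dw      0                   ; our tick counter
        end     start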

3 Answers


In 16-bit PC-compatible x86 systems, the PIT (programmable interval timer) uses a clock input of 1.19318 MHz to decrement a 16-bit counter. An interrupt is generated whenever the counter wraps around, i.e. every 2^16 = 65536 input clock cycles. The BIOS-provided ISR (interrupt service routine) handling it then increments a software counter, giving a tick frequency of 1.19318 MHz / 65536 ≈ 18.2 Hz.

Under DOS and other real-mode operating systems, the 16-bit PIT counter can be read directly from the relevant port in two 8-bit chunks, and this data can be combined with the software-maintained tick counter to achieve millisecond resolution. Basically, one winds up using a 48-bit tick counter, where the 32-bit software counter maintained by the BIOS constitutes the most significant bits, and the 16-bit PIT counter constitutes the least significant bits.
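In higher-level terms, the 48-bit combination described above amounts to something like this (a Python sketch; 1193182 is the nominal PIT input clock in Hz, and the PIT counter must be inverted because it counts down):

```python
# Sketch: combine the BIOS tick count (high 32 bits) with the PIT
# counter (low 16 bits) into a 48-bit up-counter, then convert to ms.
PIT_HZ = 1193182  # standard PIT input clock, ~1.19318 MHz

def to_milliseconds(bios_ticks, pit_counter):
    # The PIT counts *down*, so invert it to act as the low 16 bits.
    ticks48 = (bios_ticks << 16) | (0xFFFF - pit_counter)
    return ticks48 * 1000 // PIT_HZ

# One full BIOS tick (65536 PIT counts) is ~54.9 ms:
print(to_milliseconds(1, 0xFFFF))  # -> 54
```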

Since the data is not all read out in one fell swoop, there is a risk of race conditions which have to be handled appropriately. Also, some BIOSes used to program the PIT as a square-wave generator rather than a simple rate counter. While this does not interfere with the task of incrementing the software tick, it does interfere with a straightforward combination of the PIT counter register with the software tick. This necessitates a one-time initialization of the PIT to make sure it is operating in rate-counting mode.

Below is 16-bit assembly code, wrapped up as a Turbo Pascal unit, that I used for many years for robust timing with millisecond accuracy. The conversion from tick counts to milliseconds is a bit of a black box: I have lost my design documentation for it and cannot quickly reconstruct it now. As I recall, this fixed-point computation had jitter small enough that milliseconds could be measured reliably. Turbo Pascal's calling convention requires a 32-bit integer result to be returned in the DX:AX register pair.

UNIT Time;   { Copyright (c) 1989-1993 Norbert Juffa }

INTERFACE

FUNCTION Clock: LONGINT;             { same as VMS; time in milliseconds }


IMPLEMENTATION

FUNCTION Clock: LONGINT; ASSEMBLER;
ASM
             PUSH    DS              { save caller's data segment }
             MOV     DS, Seg0040     {  access ticker counter }
             MOV     BX, 6Ch         { offset of ticker counter in segm.}
             MOV     DX, 43h         { timer chip control port }
             MOV     AL, 4           { freeze timer 0 }
             PUSHF                   { save caller's int flag setting }
             CLI                     { make reading counter an atomic operation}
             MOV     DI, DS:[BX]     { read BIOS ticker counter }
             MOV     CX, DS:[BX+2]
             STI                     { enable update of ticker counter }
             OUT     DX, AL          { latch timer 0 }
             CLI                     { make reading counter an atomic operation}
             MOV     SI, DS:[BX]     { read BIOS ticker counter }
             MOV     BX, DS:[BX+2]
             IN      AL, 40h         { read latched timer 0 lo-byte }
             MOV     AH, AL          { save lo-byte }
             IN      AL, 40h         { read latched timer 0 hi-byte }
             POPF                    { restore caller's int flag }
             XCHG    AL, AH          { correct order of hi and lo }
             CMP     DI, SI          { ticker counter updated ? }
             JE      @no_update      { no }
             OR      AX, AX          { update before timer freeze ? }
             JNS     @no_update      { no }
             MOV     DI, SI          { use second }
             MOV     CX, BX          {  ticker counter }
@no_update:  NOT     AX              { counter counts down }
             MOV     BX, 36EDh       { load multiplier }
             MUL     BX              { W1 * M }
             MOV     SI, DX          { save W1 * M (hi) }
             MOV     AX, BX          { get M }
             MUL     DI              { W2 * M }
             XCHG    BX, AX          { AX = M, BX = W2 * M (lo) }
             MOV     DI, DX          { DI = W2 * M (hi) }
             ADD     BX, SI          { accumulate }
             ADC     DI, 0           {  result }
             XOR     SI, SI          { load zero }
             MUL     CX              { W3 * M }
             ADD     AX, DI          { accumulate }
             ADC     DX, SI          {  result in DX:AX:BX }
             MOV     DH, DL          { move result }
             MOV     DL, AH          {  from DL:AX:BX }
             MOV     AH, AL          {   to }
             MOV     AL, BH          {    DX:AX:BH }
             MOV     DI, DX          { save result }
             MOV     CX, AX          {  in DI:CX }
             MOV     AX, 25110       { calculate correction }
             MUL     DX              {  factor }
             SUB     CX, DX          { subtract correction }
             SBB     DI, SI          {  factor }
             XCHG    AX, CX          { result back }
             MOV     DX, DI          {  to DX:AX }
             POP     DS              { restore caller's data segment }
END;


BEGIN
   Port [$43] := $34;                { need rate generator, not square wave }
   Port [$40] := 0;                  { generator as programmed by some BIOSes }
   Port [$40] := 0;                  { for timer 0 }
END. { Time }
njuffa

A big thank you to Peter Cordes in the comments for answering. I'll now post the answer for anyone else planning on using an old-fashioned compiler from 30 years ago.

Roughly speaking, the best clock you can get in 16-bit TASM is still not accurate enough. Luckily, in TASM you can "unlock" 32-bit instructions by using the .386 directive (as mentioned here).

Then you can use the RDTSC instruction (Read Time-Stamp Counter), but there's one problem: it does not exist in TASM. That doesn't actually stop us, because all commands in TASM (often called mnemonics) are just stand-ins for opcodes, and the opcodes are what define every instruction the CPU can run.

When the Intel Pentium CPU was released, an opcode for RDTSC was included, so if you have a Pentium or anything newer... you're good.

Now, how do we run the RDTSC instruction if it doesn't exist in TASM (but does exist on our CPU)?

In TASM, there's a directive called db, and with it we can emit an opcode's bytes directly.

As seen here, what we'll need to do to run RDTSC is: db 0Fh, 31h.
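Putting the pieces together, a minimal sketch of how this might look in TASM (the `.model`/`.386` setup and the variable names are my own):

```
; Sketch: emit RDTSC via db and save the 64-bit stamp (Pentium+ only).
        .model  tiny
        .386                        ; unlock 32-bit operands in 16-bit code
        .code
        org     100h
start:  db      0Fh, 31h            ; RDTSC: timestamp -> EDX:EAX
        mov     tsc_lo, eax         ; save the 64-bit stamp
        mov     tsc_hi, edx
        ; ... do work, read again, subtract to get elapsed cycles ...
        mov     ax, 4C00h           ; terminate
        int     21h
tsc_lo  dd      ?
tsc_hi  dd      ?
        end     start
```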

And that's it! You can now run this instruction easily, and your program will still stay a mess, but a timed mess at that!

  • Note that using 32-bit operand-size in real mode is different from *32-bit mode*. The latter means "protected mode". But assembling `mov eax, ecx` in 16-bit mode just needs an operand-size prefix byte in the machine code. `.386` *after* a `.model` directive unlocks TASM's willingness to do that. Of course if you did just want to write fully 32-bit code, you could do that instead, but then it couldn't be a DOS `.com` program. – Peter Cordes May 12 '21 at 19:00
  • @Peter Cordes: Technically, you *can* include 32-bit code segment parts in a DOS .COM executable, because you can include handling which [switches into DPMI and sets up a 32-bit code descriptor](https://hg.ulukai.org/ecm/dpmitest/file/5b579605db3c/dpmimini.asm#l88). Of course this requires a DPMI host. Same thing is true if you use VCPI, or when you start in Real 86 Mode and set up Protected Mode on your own. All possible from a .COM executable. – ecm May 12 '21 at 21:15

NOTE: This is NOT a qualified answer, but supplementary notes to @njuffa's answer. I hope it helps others understand the code. Google led me here, and I couldn't help reading the code before using it.

The formula to calculate milliseconds is (BIOS_counter*65536 + PIT_counter) / 1193.18.

The asm code drops the low 8 bits during shifting, which is like (counter*multiplier)>>8, i.e. counter*multiplier/256.

Setting 65536/1193.18 = multiplier/256 gives the multiplier 0x36ED used in the assembly. The code uses a multiply and a shift (a power-of-two divide) to accomplish a non-integer calculation.
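That relationship is easy to check numerically (a quick Python sketch, using the 1193.18 kHz PIT clock figure from the formula above):

```python
# The fixed-point multiplier: 65536 / 1193.18 expressed as M / 256.
M = round(65536 * 256 / 1193.18)
print(hex(M))  # -> 0x36ed, matching the constant in the assembly
```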

Another point worth noting: (BIOS_counter*65536 + PIT_counter) / 1193.18 equals

(BIOS_counter*65536 + PIT_counter*65536/65536) / 1193.18, which equals

(BIOS_counter*65536 + HIWORD(PIT_counter*65536)) / 1193.18, which equals

(BIOS_counter*multiplier + HIWORD(PIT_counter*multiplier)) >> 8
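The rearrangement can also be checked numerically (a Python sketch; `M` is the multiplier 0x36ED from the code, and the small residual error is what the later correction step presumably addresses):

```python
# Numerical check: the fixed-point form (BIOS*M + HIWORD(PIT*M)) >> 8
# tracks the exact division to within about a millisecond.
M = 0x36ED  # multiplier from the assembly

def exact_ms(bios, pit):
    return (bios * 65536 + pit) / 1193.18

def fixed_ms(bios, pit):
    return (bios * M + ((pit * M) >> 16)) >> 8

for bios, pit in [(0, 1200), (1, 0), (1234, 54321)]:
    assert abs(exact_ms(bios, pit) - fixed_ms(bios, pit)) < 1.5
```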

What I don't understand is the final correction using 25110.

Another thing worth noting: RDTSC works only on Pentium and later, if you care about that. (ref: https://www.felixcloutier.com/x86/rdtsc)

crazii
  • `rdtsc` alone is only useful as a time-source on more recent CPUs than that, like maybe Core2 era or later is when it got the feature of the TSC not halting when the CPU stopped its clock (for a power-saving sleep state). On CPUs without variable frequency (like Pentium), RDTSC did tick at a fixed rate when it ticked at all, but on some early CPUs that could clock down at idle, RDTSC was still tied to the CPU clock. See [How to get the CPU cycle count in x86\_64 from C++?](https://stackoverflow.com/a/51907627) for notes on the `constant_tsc` and `nonstop_tsc` CPU features. – Peter Cordes Aug 23 '22 at 15:32