Convert Pentium II timing code into inline assembly?

Question

I am trying to use the following code in GCC, but it throws errors (I guess because of `__asm`). Why does this simple format not work in GCC? The syntax of extended assembly is provided here. I get confused when several variables are used in the inline assembly. Can someone convert the following program to the appropriate form and explain what is needed wherever variables are used?

    int time, subtime;
    float x = 5.0f;
    __asm {
            cpuid
            rdtsc
            mov     subtime, eax
            cpuid
            rdtsc
            sub     eax, subtime
            mov     subtime, eax    // Only the last value of subtime is kept
            // subtime should now represent the overhead cost of the
            // MOV and CPUID instructions
            fld     x
            fld     x
            cpuid                   // Serialize execution
            rdtsc                   // Read time stamp to EAX
            mov     time, eax
            fdiv                    // Perform division
            cpuid                   // Serialize again for time-stamp read
            rdtsc                           
            sub     eax, time       // Find the difference
            mov     time, eax
    }


Don't use curly braces in your code; I think you are meant to use parentheses. Then put your assembly instructions in quotation marks: `__asm ("mov subtime, eax \n");` — Frodo, May 19 '16 at 07:03
@Frodo That's not going to work because you need to do much more than that. GCC's inline assembly requires that you specify exactly what the statement uses as input and output operands and what effect it has on registers and other state. — Ross Ridge, May 19 '16 at 07:11
An even better gcc asm tutorial can be found [here](http://locklessinc.com/articles/gcc_asm/). — Brett Hale, May 19 '16 at 09:09
A collection of guides / links to how to write GNU C inline asm that doesn't suck can be found [at the bottom of this answer](http://stackoverflow.com/questions/34520013/using-base-pointer-register-in-c-inline-asm/34522750#34522750). See also the [x86 tag wiki](http://stackoverflow.com/tags/x86/info). — Peter Cordes, May 19 '16 at 20:25

Michael Petch · Accepted Answer · 2016-05-21T22:27:44.573

Your question is effectively a code conversion question, which is generally off-topic for Stack Overflow. An answer, however, may be beneficial to other readers.

This code is a conversion of the original source material and is not meant as an enhancement. The two FLD instructions and the FDIV/FDIVP can be reduced to a single FLD followed by an FDIV, since you are dividing a float value by itself. As Peter Cordes points out, though, you can simply load 1.0 onto the top of the FPU stack with FLD1. This works because dividing any number by itself (besides 0.0) will take the same time as dividing 5.0 by itself, and it removes the need to pass the variable x into the assembler template.
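A minimal sketch of that simplification, assuming GCC or Clang on an x86 target (the function name `divide_one_by_itself` is hypothetical and not part of the original code). `fld1` pushes 1.0, the divide works on the top of the stack in place, and the `"=t"` output constraint tells the compiler that the result is left in `st(0)`, so the compiler takes care of popping it. On other targets the sketch falls back to plain C.

```c
/* Hypothetical helper: divides 1.0 by itself on the x87 FPU. */
static float divide_one_by_itself(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    float result;
    __asm__ __volatile__ (
        "fld1\n\t"                 /* push 1.0 onto the x87 stack */
        "fdiv %%st(0), %%st\n\t"   /* st(0) = st(0) / st(0) */
        : "=t"(result));           /* result is left in st(0); the compiler pops it */
    return result;
#else
    return 1.0f / 1.0f;            /* portable fallback for non-x86 targets */
#endif
}
```

Calling `divide_one_by_itself()` should return 1.0f; no input operand is needed because the value comes from `fld1`.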

The code you are using is a variation of what Intel documented 20 years ago for the Pentium II, and that document discusses what is going on for that processor. The variation is that your code doesn't do the warm-up described in that document. I do not believe this mechanism will work very well on modern processors and OSes (be warned).
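The warm-up idea can be sketched in plain C. This is only an illustration, assuming that warming up means running the measured code a few times and discarding those results before the run that counts; the exact procedure in the Intel document is not reproduced here. The names `measure_fn`, `demo_measure` and `measure_after_warmup` are hypothetical, and `demo_measure` is a stand-in for a real timing routine.

```c
/* Hypothetical signature of a routine that returns a cycle count. */
typedef unsigned (*measure_fn)(void);

/* Stand-in for a real measurement: returns how often it has been called,
 * which makes it easy to see which run's result is kept. */
static unsigned demo_measure(void)
{
    static unsigned calls = 0;
    return ++calls;
}

/* Run the measurement warmup_runs times, ignoring the results, then
 * return the result of one final run. */
static unsigned measure_after_warmup(measure_fn measure, int warmup_runs)
{
    for (int i = 0; i < warmup_runs; i++)
        (void)measure();
    return measure();
}
```

With the stand-in, the first call to `measure_after_warmup(demo_measure, 3)` returns 4, i.e. the result of the fourth run. A real timing routine would take the place of `demo_measure`.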

The code in question is intended to measure the time it takes for a single FDIV instruction to complete. Assuming you actually want to convert this specific code, you will have to use GCC extended assembler templates. Extended assembler templates are not easy to use for a first-time GCC developer. You might even consider putting the assembler code into a separate assembly file, assembling it separately, and calling it from C.

Assembler templates use input constraints and output constraints to pass data into and out of the template (unlike MSVC). They also use a clobber list to specify registers that may have been altered but don't appear as an input or output. By default GCC inline assembly uses AT&T syntax instead of Intel syntax.
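As a small illustration of those three parts, here is a sketch that adds two integers, assuming GCC or Clang on an x86 target. The function name `add_with_asm` is hypothetical; on other targets it falls back to plain C.

```c
/* Hypothetical helper illustrating outputs, inputs and clobbers. */
static int add_with_asm(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int sum;
    __asm__ (
        "movl %[a], %[sum]\n\t"    /* AT&T order: source first, destination last */
        "addl %[b], %[sum]\n\t"
        : [sum] "=&r"(sum)         /* output: a register written before b is read */
        : [a] "r"(a), [b] "r"(b)   /* inputs: any general-purpose registers */
        : "cc");                   /* clobber: ADD modifies the flags */
    return sum;
#else
    return a + b;                  /* portable fallback for non-x86 targets */
#endif
}
```

`add_with_asm(2, 3)` returns 5. The `&` (early clobber) modifier matters here because `sum` is written by the first instruction while `b` has not been read yet, so the compiler must not place both in the same register.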

The equivalent code using extended assembler with AT&T syntax could look like this:

#include <stdio.h>
int main()
{
    int time, subtime;
    float x = 5.0f;
    int temptime;
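    __asm__ __volatile__ (             /* volatile: the timing code must not be moved or removed */
            "cpuid\n\t"                /* Serialize before the first time-stamp read */
            "rdtsc\n\t"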
            "mov %%eax, %[subtime]\n\t"
            "cpuid\n\t"
            "rdtsc\n\t"
            "sub %[subtime], %%eax\n\t"
            "mov %%eax, %[subtime]\n\t" 
            /* Only the last value of subtime is kept 
             * subtime should now represent the overhead cost of the
             * MOV and CPUID instructions */
            "flds %[x]\n\t"
            "flds %[x]\n\t"            /* Alternatively use fst to make copy */
            "cpuid\n\t"                /* Serialize execution */
            "rdtsc\n\t"                /* Read time stamp to EAX */
            "mov %%eax, %[temptime]\n\t"
            "fdivp\n\t"                /* Perform division */
            "cpuid\n\t"                /* Serialize again for time-stamp read */
            "rdtsc\n\t"
            "sub %[temptime], %%eax\n\t"
            "fstp %%st(0)\n\t"         /* Need to clear FPU stack before returning */
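            /* '&' marks early-clobber outputs: these registers are written
             * before the input operand x has been read */
            : [time]"=&a"(time),       /* 'time' is returned via the EAX register */
              [subtime]"=&r"(subtime), /* return reg for subtime */
              [temptime]"=&r"(temptime) /* Temporary reg for computation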
                                          This allows compiler to choose
                                          a register for temporary use. Register 
                                          only for BOTH so subtime and temptime 
                                          calc are based on a mov reg, reg */

            : [x]"m"(x)                /* X is a MEMORY reference (required by FLD) */
            : "ebx", "ecx", "edx");    /* Registers clobbered by CPUID
                                          but not listed as input/output
                                          operands */

    time = time - subtime; /* Subtract the overhead */
    printf ("%d\n", time); /* Print total time of divide to screen */
    return 0;
}

Stian Skjelstad · Answer 2 · 2016-05-20T11:56:42.837

GCC, ICC and Visual C all have very different syntax for inline assembler (it is not part of the C standard). GCC's is a bit more complex, but also more efficient, since you tell the compiler which registers are used for what and which registers are clobbered (overwritten).

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html

https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

http://asm.sourceforge.net/articles/rmiyagi-inline-asm.txt

My GCC assembler is a bit rusty (it has been a couple of years since I played with it), so there might be some mistakes here:

int main(int argc, char *argv[])
{
  int time = 0, subtime = 100, temptime = 0;
  const float x = 5.0f;
  asm volatile (
    "xorl    %%eax, %%eax        \n" /* make sure eax is a known value before cpuid */
    "cpuid                       \n"
    "rdtsc                       \n"
    "movl    %%eax, %[aSubtime]  \n"
    "cpuid                       \n"
    "rdtsc                       \n"
    "subl    %[aSubtime], %%eax  \n"
    "movl    %%eax, %[aSubtime]  \n" /* store the overhead back into subtime */
   // subtime should now represent the overhead cost of the
   // MOV and CPUID instructions
    "flds    %[ax]               \n"
    "flds    %[ax]               \n"
    "cpuid                       \n"   // Serialize execution
    "rdtsc                       \n"   // Read time stamp to EAX
    "movl    %%eax, %[atime]     \n"
    "fdivp                       \n"   // Perform division
    "cpuid                       \n"   // Serialize again for time-stamp read
    "rdtsc                       \n"
    "subl    %[atime], %%eax     \n"
//  "movl    %%eax, %2    \n"   Not needed, since we tell the compiler that asm exits with time in eax
      : "=&a" (time),                /* time is output in eax (written early) */
        [aSubtime] "=m" (subtime),   /* written by the asm, so it must be an output */
        [atime]    "=m" (temptime)   /* scratch slot for the starting time stamp */
      : [ax]       "m" (x)
      : "ebx", "ecx", "edx"
    );
 /* FPU is currently left in a pushed state here */

  return 0;
}

Thanks for the links. But can you provide the proper assembly (for GCC) for the above code? — ANTHONY, May 19 '16 at 15:38
Regarding %0 instead of names: I think the naming came after I started playing with it, or the guides back then did not mention them. — Stian Skjelstad, May 19 '16 at 19:25
`xorl %%eax, %%eax` isn't really necessary. _CPUID_ isn't being used to retrieve a particular value, it is being used for the side effect of serializing the instructions. It makes sure that RDTSC is executed when the previous instructions have actually completed. — Michael Petch, May 19 '16 at 20:03
Declare a dummy FP output operand, so gcc will know it needs to pop the stack. Also, instead of two loads and an `fdivp`, just do one load and `fdiv %st0, %st0`. Instead of loading a `5.0f`, you could also just `fld1` to load a 1.0. — Peter Cordes, May 19 '16 at 20:22
