How to make a cross-platform c++ inline assembly language?

Question

I hacked together the following code:

unsigned long long get_cc_time () {
  unsigned long long ret;
  __asm__ __volatile__("rdtsc" : "=A" (ret) : :);
  return ret;
}

It works on g++ but not on Visual Studio. How can I port it? What are the right macros to detect VS / g++?

score 5 · Answer 1 · edited May 23 '17 at 12:14

The specific problem OP had aside: I found a way to define a macro that works for both syntax versions:

#if defined(_MSC_VER)
#   define ASM(asm_literal) \
        __asm { \
            asm_literal \
        };
#elif defined(__GNUC__) || defined(__clang__)
    // #asm_literal turns the macro argument into the string that __asm__ expects
#   define ASM(asm_literal) \
        __asm__(#asm_literal : : );
#endif

Unfortunately, because the preprocessor strips newlines before macro expansion, you have to surround each assembly statement with this macro.

float abs(float x) {
    ASM( fld     dword ptr[x] );
    ASM( fabs                 );
    ASM( fstp    dword ptr[x] );

    return x;
}

Be aware, though, that GCC and clang use AT&T/UNIX assembly syntax by default, whereas MSVC uses Intel assembly syntax (I couldn't find an official source for this, though). Fortunately, GCC/clang can be configured to use Intel syntax too. One option is to wrap the code in __asm__(".intel_syntax noprefix"); and __asm__(".att_syntax prefix"); (be sure to switch back afterwards, because the directive affects all assembly emitted from that point on, including what the compiler generates from the C source). This would leave us with a macro like this:

#if defined(_MSC_VER)
#   define ASM(asm_literal) \
        __asm { \
            asm_literal \
        };
#elif defined(__GNUC__) || defined(__clang__)
    // One asm statement, so no compiler-generated code can land
    // between the syntax switch and the switch back.
#   define ASM(asm_literal) \
        __asm__(".intel_syntax noprefix\n\t" \
                #asm_literal "\n\t" \
                ".att_syntax prefix" : : );
#endif

Alternatively, compile with GCC/clang using the -masm=intel flag, which switches the syntax globally.
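One caveat worth illustrating: a GCC/clang asm string is handed to the assembler as text, so it cannot name a local C++ variable such as x the way an MSVC __asm block can. Below is a minimal sketch of the abs example with the variable passed through an operand constraint instead. The name abs_x87 and the plain C++ fallback branch are my own assumptions, not part of the answer above.

```cpp
// abs_x87 is a hypothetical name chosen to avoid clashing with std::abs.
inline float abs_x87(float x) {
#if defined(_MSC_VER) && defined(_M_IX86) && !defined(__clang__)
    // MSVC, 32-bit x86: inline asm can name the local variable directly.
    __asm {
        fld  dword ptr [x]
        fabs
        fstp dword ptr [x]
    }
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // GCC/clang: "+t" asks the compiler to place x on top of the x87 stack
    // and read it back, so the template needs no operands of its own.
    __asm__("fabs" : "+t"(x));
#else
    // Any other target (e.g. MSVC x64, which has no inline asm): plain C++.
    if (x < 0.0f) x = -x;
#endif
    return x;
}
```

Because fabs takes no explicit operands, the GCC branch reads the same in AT&T and Intel syntax, so it does not depend on -masm=intel.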

score 5 · Accepted Answer · answered Apr 21 '09 at 10:42

#if defined(_MSC_VER)
// visual c
#elif defined(__GNUC__)
// g++ (clang defines this macro too)
#else
// unknown
#endif

My inline assembler skills are rusty, but in Visual C++ it works like this:

__asm
{
// some assembler code
}

But if you only need rdtsc, you can use an intrinsic instead:

#include <intrin.h>  // declares __rdtsc() in Visual C++

unsigned __int64 counter;
counter = __rdtsc();

http://msdn.microsoft.com/en-us/library/twchhe95.aspx
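GCC and clang also expose __rdtsc() on x86 targets, through <x86intrin.h>, so the intrinsic can be used on both sides. A minimal sketch, assuming an x86 target; read_tsc is a hypothetical helper name of mine, not part of any library:

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#   include <intrin.h>      // __rdtsc() for Visual C++
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#   include <x86intrin.h>   // __rdtsc() for GCC and clang on x86
#else
#   error "read_tsc() is only sketched for x86 with MSVC, GCC or clang"
#endif

// Hypothetical wrapper: returns the raw time-stamp counter value.
inline std::uint64_t read_tsc() {
    return __rdtsc();
}
```

Typical use is to take two readings around the code being measured and subtract them; the result is in TSC ticks, not in seconds.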

Virne

Thanks! Is there a Linux variant of the rdtsc intrinsic? – Łukasz Lew Apr 21 '09 at 11:12

sharptooth · Answer 3 · 2009-04-21T10:39:21.170

There's a _MSC_VER macro in VC++ that is described as "Microsoft specific" in MSDN and presumably is not defined when code is compiled on other compilers. You can use #ifdef to determine what compiler it is and compile different code for gcc and VC++.

#ifdef _MSC_VER
    //VC++ version
#else
    //gcc version
#endif
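Filled in for the function from the question, the two branches might look like the sketch below. It assumes an x86 target. The GCC branch uses separate "=a" and "=d" outputs rather than "=A", because on x86-64 "=A" does not mean the EDX:EAX pair; the MSVC branch uses the intrinsic so that it also builds for x64.

```cpp
#include <cstdint>

#ifdef _MSC_VER
#   include <intrin.h>  // __rdtsc()
#endif

// get_cc_time mirrors the name used in the question.
inline std::uint64_t get_cc_time() {
#ifdef _MSC_VER
    // VC++ version: intrinsic instead of an __asm block
    return __rdtsc();
#else
    // gcc version: rdtsc puts the low 32 bits in EAX and the high 32 in EDX
    std::uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
#endif
}
```

Since rdtsc has no explicit operands, the asm string is the same in AT&T and Intel syntax.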

answered Apr 21 '09 at 10:27

score 2 · Answer 4 · answered Apr 21 '09 at 13:14

Using the RDTSC instruction directly has some severe drawbacks:

The TSC isn't guaranteed to be synchronized on all CPUs, so if your thread/process migrates from one CPU core to another the TSC may appear to "warp" forward or backward in time unless you use thread/process affinity to prevent migration.
The TSC isn't guaranteed to advance at a constant rate, particularly on PCs that have power management or "C1 clock ramping" enabled. With multiple CPUs, this may increase the skew (for example, if you have one thread that is spinning and another that is sleeping, one TSC may advance faster than the other).
Accessing the TSC directly doesn't allow you to take advantage of HPET.

Using an OS timer interface is better, but still may have some of the same drawbacks depending on the implementation:

Linux: clock_gettime()
Windows: QueryPerformanceCounter()
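As a rough sketch of how the two OS interfaces could sit behind one function: monotonic_ns is a hypothetical name of mine, and the choice of CLOCK_MONOTONIC on the Linux side is my assumption, not something stated above.

```cpp
#include <cstdint>

#if defined(_WIN32)
#   include <windows.h>
#else
#   include <time.h>
#endif

// Hypothetical helper: nanoseconds elapsed since an arbitrary fixed start point.
inline std::uint64_t monotonic_ns() {
#if defined(_WIN32)
    LARGE_INTEGER freq, now;
    QueryPerformanceFrequency(&freq);  // counter ticks per second
    QueryPerformanceCounter(&now);
    const std::uint64_t ticks = static_cast<std::uint64_t>(now.QuadPart);
    const std::uint64_t f = static_cast<std::uint64_t>(freq.QuadPart);
    // Convert whole seconds and the remainder separately to avoid overflow.
    return (ticks / f) * 1000000000ull + (ticks % f) * 1000000000ull / f;
#else
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull
         + static_cast<std::uint64_t>(ts.tv_nsec);
#endif
}
```

Only the difference between two readings is meaningful. On older glibc versions clock_gettime may need -lrt at link time.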

Also note that Microsoft Visual C++ doesn't support inline assembly when targeting 64-bit processors, hence the __rdtsc() intrinsic that Virne pointed out.

bk1e

Or even better use a platform independent library component like http://www.dre.vanderbilt.edu/Doxygen/Stable/ace/classACE__High__Res__Timer.html – lothar Apr 21 '09 at 15:23
TSC has its drawbacks, as described, but it has its advantages too. It is extremely fast (20-30 clock ticks), whereas all other mechanisms such as HPET involve travelling into ring 0 and therefore cost 1000 clock ticks or more. It is precise, whereas standard OS tools often offer a granularity of only 10 ms. HPET is not available on many systems, and when it is, it may only be accessible to the superuser. Don't ask me why - just go find the nearest Linux box and check the privileges on /dev/hpet. – Eugene Smith Sep 01 '10 at 09:23
As to synchronization, it's typically synchronized across cores on desktop Intels (not sure about mobile Intels), and, on AMDs, you can restrict migration across cores by modifying the processor affinity of your thread. – Eugene Smith Sep 01 '10 at 09:24
