In his CppCon 2015 talk, Chandler Carruth introduces two magical functions that defeat the optimizer without any extra performance penalty.

For reference, here are the functions (using GNU-style inline assembly):

void escape(void* p)
{
    asm volatile("" : : "g"(p) : "memory");
}

void clobber()
{
    asm volatile("" : : : "memory");
}
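
To make the intent concrete, here is a minimal sketch of how the two functions are typically combined inside a benchmark body. The name bench_push_back and its loop are hypothetical, chosen only for illustration, and the block assumes GCC or Clang.

```cpp
#include <vector>

// Same definitions as above (GNU-style inline asm, so GCC/Clang only).
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// Hypothetical benchmark body: the push_back must survive optimization.
int bench_push_back(int iterations) {
    int completed = 0;
    for (int i = 0; i < iterations; ++i) {
        std::vector<int> v;
        v.reserve(1);
        escape(v.data());  // the compiler must assume the buffer pointer is known elsewhere
        v.push_back(42);
        clobber();         // the compiler must assume all memory is read here, so the store of 42 is kept
        ++completed;
    }
    return completed;
}
```

Neither call emits any instructions by itself; they only constrain what the optimizer is allowed to assume.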

They work on any compiler that supports GNU-style inline assembly (GCC, Clang, Intel's compiler, possibly others). However, he mentions that they don't work in MSVC.

Examining Google Benchmark's implementation, it seems that on compilers other than GCC/Clang they reinterpret_cast the value to a char const volatile& and pass its address to a function hidden in a different translation unit.

template <class Tp>
inline BENCHMARK_ALWAYS_INLINE void DoNotOptimize(Tp const& value) {
    internal::UseCharPointer(&reinterpret_cast<char const volatile&>(value));
}

// some other translation unit
void UseCharPointer(char const volatile*) {}

However, there are two concerns I have with this:

  1. I'm potentially incurring a function call
  2. There is the possibility a "clever" link-time optimizer might recognize UseCharPointer is small, inline it, then discard all the code I wanted kept around, or a "clever" optimizer might be allowed to perform other re-orderings I don't want it to.

Is there any lower-level equivalent in MSVC to the GNU-style assembly functions? Or is this the best it gets on MSVC?

helloworld922
  • Related: [Preventing compiler optimizations while benchmarking](https://stackoverflow.com/q/40122141) asks what the original GNU C inline asm versions really do. – Peter Cordes Aug 05 '20 at 18:19

2 Answers

While I don't know of an equivalent assembly trick for MSVC, Facebook uses the following in their Folly benchmark library:

/**
 * Call doNotOptimizeAway(var) against variables that you use for
 * benchmarking but otherwise are useless. The compiler tends to do a
 * good job at eliminating unused variables, and this function fools
 * it into thinking var is in fact needed.
 */
#ifdef _MSC_VER

#pragma optimize("", off)

template <class T>
void doNotOptimizeAway(T&& datum) {
  datum = datum;
}

#pragma optimize("", on)

#elif defined(__clang__)

template <class T>
__attribute__((__optnone__)) void doNotOptimizeAway(T&& /* datum */) {}

#else

template <class T>
void doNotOptimizeAway(T&& datum) {
  asm volatile("" : "+r" (datum));
}

#endif

Here is a link to code on GitHub.
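
A minimal usage sketch, assuming a GCC-compatible compiler, so only the GNU-style branch of the snippet above is reproduced (copied here for illustration, not included from Folly). The function sum_to is hypothetical.

```cpp
#include <cstdint>

// GNU-style branch from the snippet above: the empty asm claims to
// read and write datum, so its value has to be materialized in a register.
template <class T>
void doNotOptimizeAway(T&& datum) {
    asm volatile("" : "+r"(datum));
}

// Hypothetical workload: without the call, the compiler would be free to
// replace the whole loop with a closed-form expression.
std::uint64_t sum_to(std::uint64_t n) {
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
        doNotOptimizeAway(total);  // total must hold its real value on every iteration
    }
    return total;
}
```

Because the asm body is empty, the value itself is left unchanged; only the optimizer's knowledge about it is removed.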

Tim Severeijns

I was looking for a way to achieve the exact same thing in my own little benchmark lib. The frustrating thing about MSVC is that targeting x64 disallows the __asm trick while x86 allows it!

After a few attempts I reused Google's solution without incurring an additional call! The nice thing is that the solution works with both MSVC (/Ox) and GCC (-O3).

template <class T>
inline auto doNotOptimizeAway(T const& datum) {
    return reinterpret_cast<char const volatile&>(datum);
}

At the call site I simply do not use the returned value!

int main()
{
    int a{10};
    doNotOptimizeAway(a);
    return 0;
}

Generated ASM (Compiler Explorer)

a$ = 8
main    PROC
        mov     DWORD PTR a$[rsp], 10
        movzx   eax, BYTE PTR a$[rsp]
        xor     eax, eax
        ret     0
main    ENDP
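
In a timing loop this might be used as in the sketch below. The function timedSum and the use of std::chrono::steady_clock are assumptions made for illustration; the doNotOptimizeAway definition is the one from this answer and needs C++14 for the deduced return type.

```cpp
#include <chrono>
#include <cstdint>

// Definition from this answer: the return statement performs a volatile
// read of the first byte of datum, which the compiler cannot drop.
template <class T>
inline auto doNotOptimizeAway(T const& datum) {
    return reinterpret_cast<char const volatile&>(datum);
}

// Hypothetical timed workload.
std::uint64_t timedSum(std::uint64_t n, std::chrono::nanoseconds& elapsed) {
    auto const start = std::chrono::steady_clock::now();
    std::uint64_t total = 0;
    for (std::uint64_t i = 1; i <= n; ++i) {
        total += i;
        doNotOptimizeAway(total);  // returned value deliberately ignored
    }
    elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    return total;
}
```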
Xenonamous
  • MSVC-style `__asm{}` wouldn't be particularly useful anyway; it doesn't have anything like GNU C inline asm's input / output constraints that let you tell the compiler that an empty block of asm actually reads or read-writes a C var to force the compiler to materialize that variable in a register. – Peter Cordes Mar 24 '19 at 01:21
  • A volatile load forces the compiler to materialize it *in memory*, not just `mov edx, 10` or something. But at least the compiler still knows that the var is only being read, so `return a;` can still compile to `mov eax, 10` in between the store/reload. But still, this looks like it will typically introduce 2 instructions into places where you use this, unless the var already exists in memory then you just get a load. So maybe useful without disturbing the compiler-generated code too much, but definitely not free. – Peter Cordes Mar 24 '19 at 01:27
  • For narrow types (`sizeof(T)<=sizeof(void*)`), assigning to `volatile T dummy = datum` might be cheaper in most cases: the store but not the reload. Only for data that was already spilled by the compiler is a `volatile` load better. – Peter Cordes Mar 24 '19 at 01:29
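
The store-only variant suggested in the last comment might look like the following sketch. The names keepAlive and sumOfSquares are hypothetical, and the static_assert limiting it to small, trivially copyable types is an assumption based on the `sizeof(T)<=sizeof(void*)` remark.

```cpp
#include <type_traits>

// Hypothetical helper: copies datum into a volatile local, which forces a
// store of the value but no reload.
template <class T>
inline void keepAlive(T const& datum) {
    static_assert(std::is_trivially_copyable<T>::value &&
                      sizeof(T) <= sizeof(void*),
                  "sketch is intended for narrow, trivially copyable types");
    volatile T dummy = datum;  // volatile store; dummy is never read back
}

// Hypothetical workload using the helper.
int sumOfSquares(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i * i;
        keepAlive(total);  // each intermediate value has to be computed and stored
    }
    return total;
}
```

Whether this is actually cheaper than the volatile load depends on whether the value was already in memory, as the comment notes, so it is worth checking the generated assembly.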