
I looked at some C code from
http://www.mcs.anl.gov/~kazutomo/rdtsc.html
It uses constructs like __inline__ and __asm__, as in the following:

code1:

typedef unsigned long long tick; /* assumed: the snippet does not show how tick is defined */

static __inline__ tick gettick (void) {
    unsigned a, d;
    __asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) );
    return (((tick)a) | (((tick)d) << 32));
}

code2:

volatile int  __attribute__((noinline)) foo2 (int a0, int a1) {
    __asm__ __volatile__ ("");
}

I was wondering: what do code1 and code2 do?

(Editor's note: for this specific RDTSC use case, intrinsics are preferred: How to get the CPU cycle count in x86_64 from C++? See also https://gcc.gnu.org/wiki/DontUseInlineAsm)
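Following the editor's note, here is a minimal sketch of code1 rewritten with the `__rdtsc()` intrinsic. It assumes x86 or x86-64 with GCC or Clang, where `<x86intrin.h>` declares `__rdtsc()`; the name `gettick_intrinsic` is hypothetical.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang header that declares __rdtsc() */

/* Same job as gettick() in code1, but through the compiler intrinsic:
   the compiler knows what the instruction does, so no hand-written
   asm constraints are needed. */
static inline uint64_t gettick_intrinsic(void) {
    return __rdtsc();
}
```

Typical use is to take two readings and subtract them. The difference is in TSC reference ticks, which are not necessarily core clock cycles.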

Peter Cordes
user3692521
  • https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html (otherwise, ignore the `__` everywhere: `__inline__` is just plain `inline`.) – Marc Glisse Oct 20 '14 at 11:44

3 Answers


The __volatile__ modifier on an __asm__ block disables optimizations that assume the block has no side effects beyond producing its outputs. Without it, the optimizer may remove the block outright (for example, if its outputs are unused), or lift it out of a loop and reuse a cached result.

This is useful for the rdtsc instruction like so:

__asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) );

This statement has no inputs, so without volatile the compiler may assume it produces the same values every time and reuse an earlier result. Volatile forces it to read a fresh timestamp each time it executes.
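As a sketch of the same idea with fixed-width types (assuming x86 or x86-64 and GCC or Clang; the name `read_tsc` is hypothetical), code1 can be written so each call is guaranteed to execute its own rdtsc:

```c
#include <stdint.h>

/* code1 with fixed-width types. The asm has no inputs, so without
   __volatile__ the compiler could merge two back-to-back calls into one
   read, assuming both would return the same value. */
static inline uint64_t read_tsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}
```

Using `uint32_t` for the two halves makes the combining shift well defined regardless of how wide `unsigned` is on the target.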

When used alone, like this:

__asm__ __volatile__ ("");

It emits no instructions at all. (Because it has no output operands, GCC treats it as volatile even without the keyword.) You can extend it, though, to get a compile-time memory barrier that stops the compiler from reordering memory accesses across it:

__asm__ __volatile__ ("":::"memory");
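A minimal sketch of that barrier in use, assuming GCC or Clang; `COMPILER_BARRIER`, `publish`, `data` and `ready` are hypothetical names. It only constrains the compiler. x86 happens not to reorder stores with other stores, but other architectures need a real fence or C11 atomics for cross-thread publication.

```c
/* Compiler-only barrier: emits no instruction, but the "memory" clobber
   forbids the compiler from caching memory values in registers across it
   or moving loads/stores past it. It does not stop CPU reordering. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

static int data;
static int ready;

/* Hypothetical publish step: the barrier keeps the compiler from emitting
   the store to ready before the store to data. */
static void publish(int value) {
    data = value;
    COMPILER_BARRIER();
    ready = 1;
}
```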

The rdtsc instruction is a good example for volatile. rdtsc is usually used to time how long some instructions take to execute. Imagine code like this, where you want to time the computation of r1 and r2:

__asm__ ("rdtsc": "=a" (a0), "=d" (d0));
r1 = x1 + y1;
__asm__ ("rdtsc": "=a" (a1), "=d" (d1));
r2 = x2 + y2;
__asm__ ("rdtsc": "=a" (a2), "=d" (d2));

Here the compiler is allowed to cache the timestamp (or merge the identical asm statements, or reorder them), so a perfectly valid compilation might report that each line took exactly 0 clocks to execute. Obviously this isn't what you want, so you add __volatile__ to prevent it:

__asm__ __volatile__("rdtsc": "=a" (a0), "=d" (d0));
r1 = x1 + y1;
__asm__ __volatile__("rdtsc": "=a" (a1), "=d" (d1));
r2 = x2 + y2;
__asm__ __volatile__("rdtsc": "=a" (a2), "=d" (d2));

Now you'll get a new timestamp each time, but there is still a problem. The compiler must keep the volatile asm statements in order relative to each other, but it may move the additions across them, and the CPU may execute rdtsc out of order with the surrounding instructions. The timestamps could end up being read after r1 and r2 have already been calculated. To work around this, add barriers that force serialization:

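__asm__ __volatile__("lfence;rdtsc": "=a" (a0), "=d" (d0) :: "memory");
r1 = x1 + y1;
__asm__ __volatile__("lfence;rdtsc": "=a" (a1), "=d" (d1) :: "memory");
r2 = x2 + y2;
__asm__ __volatile__("lfence;rdtsc": "=a" (a2), "=d" (d2) :: "memory");

Note the lfence instruction here, which enforces a CPU-side barrier. Intel documents that an lfence before rdtsc keeps rdtsc from executing until all earlier instructions have completed; mfence alone is not documented to do this on Intel CPUs. The "memory" clobber enforces a compile-time barrier for memory accesses, although values kept only in registers can still be moved. On CPUs that support it, rdtscp, which waits for earlier instructions to execute before reading the counter, can replace lfence;rdtsc. It also writes ECX, so that register must be declared as an output or clobber.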
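A sketch of start/stop helpers along these lines, assuming x86-64 with GCC or Clang and a CPU that supports rdtscp. The names `tsc_start` and `tsc_stop` are hypothetical. The trailing lfence after rdtscp is one common way to keep later instructions from starting before the final read. It is not the only possible pattern.

```c
#include <stdint.h>

/* Start timestamp: lfence keeps rdtsc from executing until earlier
   instructions have completed (Intel's documented ordering for rdtsc). */
static inline uint64_t tsc_start(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Stop timestamp: rdtscp waits for earlier instructions to execute, and
   the trailing lfence keeps later instructions from starting before the
   read. rdtscp also writes ECX, so it is declared as an output. */
static inline uint64_t tsc_stop(void) {
    uint32_t lo, hi, aux;
    __asm__ __volatile__("rdtscp\n\tlfence"
                         : "=a"(lo), "=d"(hi), "=c"(aux) :: "memory");
    (void)aux; /* IA32_TSC_AUX value, unused here */
    return ((uint64_t)hi << 32) | lo;
}
```

Usage would be `uint64_t t0 = tsc_start(); /* work */ uint64_t t1 = tsc_stop();`, keeping in mind that the work itself should be something the compiler cannot move or delete, such as accesses through a volatile variable.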

Cory Nelson
  • So with an empty block, it is a kind of instruction barrier? – Bryan Chen Oct 20 '14 at 00:20
  • Note that the compiler can only control the static code order it generates, and avoid moving stuff past this barrier at compilation time, but it can't control the actual execution order within the CPU, which may still change it (the CPU doesn't know about the volatile attribute, or the empty code block). With `rdtsc` this can potentially cause some inaccuracies. – Leeor Oct 20 '14 at 00:28
  • @Leeor Indeed, hence "compile-time barrier". – Cory Nelson Oct 20 '14 at 00:31
  • I had issues with fences and rdtsc on Haswells (i3, i5, i7 of various generations). Not only did the fences do no good, they added to the inaccuracy: if I remember correctly, rdtscp on its own had +/- 4 ticks from the average, and (various) fences only increased it. Most online resources about it seem to stop at the Pentium 4 era, and apparently some things have changed. – PTwr Oct 20 '14 at 08:22
  • Mostly the code in the question just sucks. It should use the `__rdtsc` intrinsic. `volatile` is useless in `asm volatile("")`. And your explanation of volatile isn't good: with `asm("rdtsc":...` the compiler can even reorder the asm blocks (or remove them if a0 and d0 are unused), while with `volatile` it has to keep them in this order, but it can still move the additions and stores across. – Marc Glisse Oct 20 '14 at 11:41
  • Note: Although not particularly related, `rdtsc` should be avoided for performance monitoring since lots of factors can alter the result. – edmz Oct 20 '14 at 16:29

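asm is for including native assembly code in C source code. E.g., with GCC on x86 (extended asm, AT&T syntax):

int a = 2;
asm("movl $3, %0" : "=r" (a)); // %0 is the register GCC assigns to the output a
printf("%i", a); // will print 3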

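Compilers have different variants of it. In GCC, __asm__ is an alternate spelling of asm that still works when asm is not available as a keyword (e.g. with -ansi or -std=c99), which is why library headers tend to use it.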
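To show how extended asm passes inputs and outputs through operands, here is a minimal sketch assuming GCC or Clang on x86 (AT&T syntax); `asm_add` is a hypothetical helper. It has no side effects, so volatile is deliberately omitted and the compiler is free to merge or drop calls whose results are unused.

```c
/* Add two ints with one x86 instruction via GCC extended asm.
   "=r": write-only output in any register (operand %0).
   "0":  input x is placed in the same register as operand 0.
   "r":  input y in any register (operand %2). */
static int asm_add(int x, int y) {
    int result;
    __asm__("addl %2, %0"   /* %0 (holding x) += %2 (y) */
            : "=r"(result)
            : "0"(x), "r"(y)
            : "cc");        /* addl modifies the flags */
    return result;
}
```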

volatile means the variable can be modified from outside the C program's own control flow. For instance, when programming a microcontroller where the memory address 0x00001234 is mapped to some device-specific interface (e.g. when coding for the Game Boy, the buttons, screen, etc. are accessed this way):

volatile uint8_t *const button1 = (volatile uint8_t *)0x00001111; // uint8_t from <stdint.h>

This disables compiler optimizations that rely on *button1 changing only when the program itself writes to it, so every read of *button1 actually goes to memory.
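Memory-mapped device registers can't be demonstrated portably. Here is a sketch of the other standard case, a flag written by a signal handler outside the normal flow of the program. The names `got_signal`, `on_signal` and `signal_demo` are hypothetical.

```c
#include <signal.h>

/* Written from a signal handler, outside the normal control flow, so it
   is volatile; sig_atomic_t is the type the C standard guarantees can be
   safely written from a handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig) {
    (void)sig;
    got_signal = 1;
}

/* Install the handler, raise the signal, and report whether it ran. */
static int signal_demo(void) {
    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;
    raise(SIGINT); /* raise() does not return until the handler has */
    return got_signal;
}
```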

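It is also used in multi-threaded programming, where a variable might be modified by another thread. However, volatile neither makes accesses atomic nor orders them with respect to other threads; C11/C++11 atomics are the correct tool for that.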
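A minimal sketch of the atomics alternative, assuming a C11 compiler with `<stdatomic.h>` and POSIX threads (older toolchains may need `-pthread` when linking). The names `counter`, `worker` and `run_counter_demo` are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_THREADS 4
#define N_INCREMENTS 100000

/* Each increment is an indivisible read-modify-write, which a plain
   volatile int does not guarantee. */
static atomic_int counter = 0;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < N_INCREMENTS; i++)
        atomic_fetch_add(&counter, 1);
    return NULL;
}

/* Run N_THREADS workers and return the final count (-1 if a thread
   could not be created). */
static int run_counter_demo(void) {
    pthread_t threads[N_THREADS];
    int started = 0;
    atomic_store(&counter, 0);
    for (; started < N_THREADS; started++)
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == N_THREADS ? atomic_load(&counter) : -1;
}
```

With a `volatile int` and `counter++` instead, the final count could come out lower than expected, because concurrent increments can overwrite each other.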

inline is a hint to the compiler to "inline" calls to a function.

static inline int f(int a) { // static avoids C99 "inline definition" link errors
    return a + 1;
}

int a = 41;
int b = f(a);

This should not be compiled into a function call to f but into int b = a + 1, as if f were a macro. Compilers mostly make this decision on their own, based on the function's size and how it is used. __inline__ means the same as inline; it is GCC's alternate spelling that still works when inline is not a keyword (e.g. with -ansi).

Similarly, __attribute__((noinline)) (GCC-specific syntax) prevents a function from being inlined.
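To contrast the two attributes, here is a minimal sketch assuming GCC (Clang accepts the same attributes). The function names are hypothetical. always_inline asks for inlining even without optimization, and noinline forbids it.

```c
/* Expanded in place at every call, even at -O0. */
static inline __attribute__((always_inline)) int add_one_inlined(int a) {
    return a + 1;
}

/* Never inlined: callers should keep a real call instruction. */
static __attribute__((noinline)) int add_one_called(int a) {
    return a + 1;
}

int add_both(int a) {
    /* In the generated assembly, the first call disappears into the body
       of add_both, while the second remains a call to add_one_called. */
    return add_one_inlined(a) + add_one_called(a);
}
```

Compiling with `gcc -O2 -S` and reading the output is an easy way to check this.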

tmlen
  • Thx!! And what is the benefit of noinline? – user3692521 Oct 19 '14 at 23:43
  • I guess it just makes sure that calling `foo2` gets translated to a function call to an empty function with two integer arguments and returning an integer, in the assembly. Instead of being optimized away. That function could then be implemented in the generated assembly code. – tmlen Oct 19 '14 at 23:47
  • How does it know to return an integer (which integer?) if the function is empty? – user3692521 Oct 19 '14 at 23:49
  • It is defined as a function returning an `int`. That means the generated assembly code will be such that the caller expects an int return value to be on some register (depending on calling convention). In the assembly code the function body can then be coded to do that. – tmlen Oct 19 '14 at 23:51
  • I'd say volatile on an asm block is quite a bit different from volatile on a variable, although the common theme remains, namely that it restricts the liberties of the optimizer. – MvG Oct 20 '14 at 08:31
  • "It is also used in multi-threaded programming (not needed anymore today?) where a variable might be modified by another thread." - while it is indeed used, it's incorrect, as it guarantees only the instruction ordering of accesses, not atomicity of access to memory (though aligned access is atomic on most architectures) or memory fences (except with the MSVC extension, which is disabled on ARM). For proper usage it's necessary to use C(++)11 atomics or compiler intrinsics. – Maciej Piechotka Oct 20 '14 at 08:38
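GCC's documentation for the noinline attribute suggests putting an empty asm statement in a function so calls to it are not optimized away, and that is likely what code2 is doing. Below is a sketch of a corrected foo2, assuming GCC or Clang. The original falls off the end of a non-void function, so this version returns a value; the `a0 + a1` body is a hypothetical placeholder. The `volatile` on the return type has no useful effect and is dropped.

```c
/* noinline keeps foo2 a real call; the empty asm acts as an opaque side
   effect, so the call is not deleted even if the result looks unused. */
__attribute__((noinline)) int foo2(int a0, int a1) {
    __asm__ __volatile__(""); /* emits no instructions */
    return a0 + a1;
}
```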

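In a declaration, the __asm__ keyword can specify the name to be used in assembler code for a function or variable (e.g. int foo __asm__("myfoo");). Inside a function body, as in the question, it introduces an inline assembly statement.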

The __volatile__ qualifier, generally used in real-time computing on embedded systems, addresses a problem where optimization breaks code that repeatedly tests a status register for its ERROR or READY bit. __volatile__ was introduced as a way of telling the compiler that the object is subject to rapid change, forcing every reference to the object to be a genuine reference.
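A sketch of that status-register case. The register layout (`READY_BIT`, `ERROR_BIT`) and `wait_for_device` are hypothetical. Note that this is the `volatile` type qualifier on an object, not `asm volatile`.

```c
#include <stdint.h>

#define READY_BIT 0x1u /* hypothetical bit layout */
#define ERROR_BIT 0x2u

/* Poll a device status register until READY or ERROR is set. Because
   status points to volatile, the compiler must reload it on every
   iteration instead of reading it once and spinning forever. */
static int wait_for_device(const volatile uint32_t *status) {
    uint32_t s;
    do {
        s = *status;
    } while ((s & (READY_BIT | ERROR_BIT)) == 0);
    return (s & ERROR_BIT) ? -1 : 0;
}
```

On real hardware, `status` would point at the device's memory-mapped address, as with `button1` in the previous answer.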

David C. Rankin
  • Not really, it's for anything with side-effects you don't / can't describe with operand constraints, e.g. when you want it to still happen even if all the output operands are unused. – Peter Cordes Oct 24 '17 at 09:40
  • Isn't that what forcing every reference of the object to be a genuine reference says? The reason I'm a bit confused by the "not really" is the description was taken nearly verbatim from reference documentation as existed in Oct. 2014. I'll see if I can dig up the cite. – David C. Rankin Oct 24 '17 at 23:07
  • I was mostly disagreeing with saying it's only relevant for RTC. It's not about "rapid" change, just anything that can have side-effects. That "every reference a genuine reference" sounds like a description of the `volatile` type qualifier (e.g. `volatile int`), not GNU C `asm volatile`. With inline asm there's no "the object". – Peter Cordes Oct 24 '17 at 23:13
  • Gotcha, I guess it would be better worded to say `volatile` disables optimization that discard asm statements if they determine there is no need for the output variables, anyway `:)` – David C. Rankin Oct 24 '17 at 23:15
  • Yes, [plus *some* prevention of re-ordering](https://stackoverflow.com/questions/26456510/what-does-asm-volatile-do-in-c/26456620?noredirect=1#comment41567603_26456845), and more if you use a `"memory"` clobber to make it a compiler barrier. – Peter Cordes Oct 24 '17 at 23:21