22

Consider this C code:

#include <string.h>    /* memcpy(), size_t */

extern volatile int hardware_reg;

void f(const void *src, size_t len)
{
    void *dst = <something>;

    hardware_reg = 1;
    memcpy(dst, src, len);
    hardware_reg = 0;
}

The memcpy() call must occur between the two assignments. In general, since the compiler probably doesn't know what the called function will do, it can't move the call before or after the assignments. However, in this case the compiler knows what the function does (and could even substitute an inline built-in), and it can deduce that memcpy() can never access hardware_reg. So it appears to me that the compiler would see no problem in moving the memcpy() call, if it wanted to.

So, the question: does a function call alone act as a memory barrier that prevents reordering, or is an explicit memory barrier needed before and after the call to memcpy() in this case?

Please correct me if I am misunderstanding things.

Andrey Vihrov

5 Answers

10

The compiler cannot reorder the memcpy() operation before the hardware_reg = 1 or after the hardware_reg = 0 - that's what volatile will ensure - at least as far as the instruction stream the compiler emits. A function call is not necessarily a 'memory barrier', but it is a sequence point.

The C99 standard says this about volatile (5.1.2.3/5 "Program execution"):

At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.

So at the sequence point represented by the memcpy(), the volatile access of writing 1 has to have occurred, and the volatile access of writing 0 cannot have occurred.

However, there are 2 things I'd like to point out:

  1. Depending on what <something> is, if nothing else is done with the destination buffer, the compiler might be able to completely remove the memcpy() operation. This is the reason Microsoft came up with the SecureZeroMemory() function. SecureZeroMemory() operates on volatile-qualified pointers to prevent the writes from being optimized away.

  2. volatile doesn't necessarily imply a memory barrier (which is a hardware thing, not just a code ordering thing), so if you're running on a multi-proc machine or certain types of hardware you may need to explicitly invoke a memory barrier (perhaps wmb() on Linux).

    Starting with MSVC 8 (VS 2005), Microsoft documents that the volatile keyword implies the appropriate memory barrier, so a separate specific memory barrier call may not be necessary:

    Also, when optimizing, the compiler must maintain ordering among references to volatile objects as well as references to other global objects. In particular,

    • A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.

    • A read of a volatile object (volatile read) has Acquire semantics; a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.
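Where the compiler makes no such promise, one option is to place an explicit compiler barrier on each side of the copy. The following is only a minimal sketch, assuming GCC or Clang: the empty asm statement with a "memory" clobber is a compiler extension, not standard C. The COMPILER_BARRIER macro name, the device_buf buffer (standing in for the unspecified <something>), and the local definition of hardware_reg are assumptions made so the example compiles on its own. The barrier constrains only the compiler; it emits no instruction, so hardware that reorders stores would still need a real CPU barrier on top of it.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the real register so the sketch links on its own (assumption). */
volatile int hardware_reg;

/* Hypothetical destination; the question leaves <something> unspecified. */
static unsigned char device_buf[64];

/* Compiler-only barrier (GCC/Clang extension): emits no instruction, but
   tells the compiler that memory may be read or written here, so memory
   accesses should not be moved across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

void f(const void *src, size_t len)
{
    if (len > sizeof device_buf)
        len = sizeof device_buf;    /* clamp to the hypothetical buffer */

    hardware_reg = 1;
    COMPILER_BARRIER();             /* keep the copy after the first write */
    memcpy(device_buf, src, len);
    COMPILER_BARRIER();             /* keep the copy before the second write */
    hardware_reg = 0;
}
```

The two volatile writes stay in source order regardless; the barriers are there to pin the non-volatile copy between them.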

Michael Burr
  • 1
    Thanks for the elaborate answer. There's one thing I'd like to ask, though. As per Annex C "Sequence points" in C99, the end of an expression statement is a sequence point. Combining this with your quote, we get that volatile accesses must occur precisely between other assignments as in the source code, even if these others work on non-volatile objects. But Bo Persson has stated earlier here, "… it can reorder other, non-volatile, assignments to either side of the volatile ones". I've also seen this statement before. Which of the two statements is correct? – Andrey Vihrov Apr 18 '11 at 09:15
  • 1
    Aside from that, it's a good idea that the `memcpy()` call could be theoretically optimized away, since it doesn't operate on volatile data. From this point of view it would be safer to just write an inline memcpy replacement that works on volatile `dst`. This would also save trouble thinking about memory barriers :-) – Andrey Vihrov Apr 18 '11 at 09:17
  • 1
    @Andrey: sorry for the late response. I'm going to edit (or maybe delete) my answer because I think it's too strict a reading of the standard. While I think that a `volatile` object should be a "sequence point barrier" and I believe the standard does say that, the reality is that compilers might not (and I'm not a compiler writer or a language lawyer, though I sometimes play one on SO). See http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html and http://drdobbs.com/high-performance-computing/212701484?pgno=2 – Michael Burr Apr 20 '11 at 14:44
  • @Andrey: so Bo Persson is correct, at least as far as GCC is concerned (and probably other compilers). – Michael Burr Apr 20 '11 at 14:46
  • Your answer explains much and could only be improved by adding the contents of your comment (in fact, the GCC link's second paragraph is nearly about my problem). It looks odd, though, that GCC would do something against the standard. Could the quote from your answer be added only in C99? – Andrey Vihrov Apr 20 '11 at 18:12
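The inline replacement suggested in the comments above could look roughly like this. It is a sketch only, and copy_to_volatile is a hypothetical name. Because every store goes through a volatile-qualified lvalue, the compiler has to perform each one, and has to keep them in order relative to other volatile accesses such as the writes to hardware_reg. The cost is a plain byte-by-byte loop that the compiler will not turn into an optimized block copy.

```c
#include <stddef.h>

/* Hypothetical helper: byte-wise copy through a volatile-qualified
   destination, so none of the stores may be dropped or merged away. */
static void copy_to_volatile(volatile void *dst, const void *src, size_t len)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;    /* each store is a volatile access */
}
```

In the question's f(), the memcpy(dst, src, len) call would then become copy_to_volatile(dst, src, len). Note that this addresses only compiler reordering and elimination, not hardware-level memory barriers.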
4

As far as I can see your reasoning leading to

the compiler would see no trouble in moving the memcpy call

is correct. Your question is not answered by the language definition, and can only be addressed with reference to specific compilers.

Sorry to not have any more-useful information.

mlp
0

Here is a slightly modified example, compiled with gcc 7.2.1 on x86-64:

#include <string.h>
static int temp;
extern volatile int hardware_reg;
int foo (int x)
{
    hardware_reg = 0;
    memcpy(&temp, &x, sizeof(int));
    hardware_reg = 1;
    return temp;
}

gcc knows that the memcpy() is the same as an assignment, and knows that temp is not accessed anywhere else, so temp and the memcpy() disappear completely from the generated code:

foo:
    movl    $0, hardware_reg(%rip)
    movl    %edi, %eax
    movl    $1, hardware_reg(%rip)
    ret
user1998586
  • AFAIU, the compiler shouldn't actually know what memcpy() does, though. Until link time it should be an undefined symbol as a programmer could provide their own memcpy() function. Is this assembly the result of the compilation or of the disassembly of a program linked with some linker-time optimisations enabled? – oromoiluig Jul 25 '22 at 18:09
  • @oromoiluig this is just gcc -O2. It does know what memcpy() does. Paraphrasing the gcc manual: the ISO C spec distinguishes between a "hosted" and a "freestanding" environment, the distinction being that the "hosted" environment has standard functions (such as memcpy) with standard behaviour. gcc defaults to "hosted". If you want gcc to ignore the standards specification for functions such as "memcpy", use the -ffreestanding command line option. If you compile the above with "gcc -O2 -ffreestanding" then indeed it treats memcpy() as an arbitrary function and does not remove temp&memcpy. – user1998586 Apr 10 '23 at 06:44
0

My assumption would be that the compiler never re-orders volatile assignments since it has to assume they must be executed at exactly the position where they occur in the code.

ThiefMaster
  • 2
    Yes, but it can reorder other, non-volatile, assignments to either side of the volatile ones. – Bo Persson Apr 17 '11 at 12:01
  • That's like reordering the volatile though - so I'd assume any access to a volatile variable is a barrier for re-ordering. – ThiefMaster Apr 17 '11 at 12:02
  • 1
    No, it isn't. The volatile writes must both appear, and in the given order, but that's it from a language perspective. Some compilers promise not to move code across a volatile access, but that is an extension. – Bo Persson Apr 17 '11 at 12:07
0

It's probably going to get optimized, either because the compiler inlines the memcpy call and eliminates the first assignment, or because it gets compiled to RISC code or machine code and gets optimized there.

Larry Watanabe
  • 1
    Neither assignment is allowed to be eliminated - that's what 'volatile' gets you. – mlp Apr 20 '11 at 12:19