91

I need a function that (like SecureZeroMemory from the WinAPI) always zeros memory and doesn't get optimized away, even if the compiler thinks the memory is never going to be accessed again afterwards. Seems like a perfect candidate for volatile. But I'm having some problems actually getting this to work with GCC. Here is an example function:

void volatileZeroMemory(volatile void* ptr, unsigned long long size)
{
    volatile unsigned char* bytePtr = (volatile unsigned char*)ptr;

    while (size--)
    {
        *bytePtr++ = 0;
    }
}

Simple enough. But the code that GCC actually generates if you call it varies wildly with the compiler version and the number of bytes you're actually trying to zero. https://godbolt.org/g/cMaQm2

  • GCC 4.4.7 and 4.5.3 never ignore the volatile.
  • GCC 4.6.4 and 4.7.3 ignore volatile for array sizes 1, 2, and 4.
  • GCC 4.8.1 through 4.9.2 ignore volatile for array sizes 1 and 2.
  • GCC 5.1 through 5.3 ignore volatile for array sizes 1, 2, 4, and 8.
  • GCC 6.1 just ignores it for any array size (bonus points for consistency).

Any other compiler I have tested (clang, icc, vc) generates the stores one would expect, with any compiler version and any array size. So at this point I'm wondering: is this a (pretty old and severe?) GCC compiler bug, or is the definition of volatile in the standard so imprecise that this is actually conforming behavior, making it essentially impossible to write a portable "SecureZeroMemory" function?

Edit: Some interesting observations.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <atomic>

void callMeMaybe(char* buf);

void volatileZeroMemory(volatile void* ptr, std::size_t size)
{
    for (auto bytePtr = static_cast<volatile std::uint8_t*>(ptr); size-- > 0; )
    {
        *bytePtr++ = 0;
    }

    //std::atomic_thread_fence(std::memory_order_release);
}

std::size_t foo()
{
    char arr[8];
    callMeMaybe(arr);
    volatileZeroMemory(arr, sizeof arr);
    return sizeof arr;
}

The possible write from callMeMaybe() will make all GCC versions except 6.1 generate the expected stores. Uncommenting the memory fence will also make GCC 6.1 generate the stores, although only in combination with the possible write from callMeMaybe().

Someone has also suggested flushing the caches. Microsoft does not try to flush the cache at all in "SecureZeroMemory". The cache is likely going to be invalidated pretty fast anyway, so this is probably not a big deal. Also, if another program were trying to probe the data, or if it were going to be written to the page file, it would always be the zeroed version.

There are also some concerns about GCC 6.1 using memset() in the standalone function. The GCC 6.1 compiler on godbolt might be a broken build, as GCC 6.1 seems to generate a normal loop (like 5.3 does on godbolt) for the standalone function for some people. (Read the comments on zwol's answer.)

cooky451
  • Are you sure you don't `volatile` the pointer with that declaration? Try `unsigned char * volatile bytePtr`. – Paul Stelian Jul 06 '16 at 18:09
  • 5
    IMHO using `volatile` is a bug unless proven otherwise. But most likely a bug. `volatile` is so underspecified as to be dangerous - just don't use it. – Jesper Juhl Jul 06 '16 at 18:09
  • 1
    [OT] Why not use [`memset`](http://en.cppreference.com/w/cpp/string/byte/memset)? – NathanOliver Jul 06 '16 at 18:09
  • 20
    @JesperJuhl: No, `volatile` is appropriate in this case. – Dietrich Epp Jul 06 '16 at 18:10
  • 9
    @NathanOliver: That won't work, because compilers can optimize out dead stores even if they use `memset`. The problem is that compilers know exactly what `memset` does. – Dietrich Epp Jul 06 '16 at 18:10
  • 1
    @DietrichEpp Oops. I need to read further: *std::memset may be optimized away (under the as-if rules) if the object modified by this function is not accessed again for the rest of its lifetime. For that reason, this function cannot be used to scrub memory* – NathanOliver Jul 06 '16 at 18:12
  • What about calling an external function (from a different compilation unit), passing pointer and size to it, and having that function only do `memset`? – mvidelgauz Jul 06 '16 at 18:12
  • 9
    @PaulStelian: That would make a `volatile` pointer, we want a pointer to `volatile` (we don't care whether `++` is strict, but whether `*p = 0` is strict). – Dietrich Epp Jul 06 '16 at 18:12
  • 2
    http://www.daemonology.net/blog/2014-09-04-how-to-zero-a-buffer.html The final conclusion here is already added as an answer, use `memset_s`. – leetNightshade Jul 06 '16 at 18:16
  • 7
    @JesperJuhl: There's nothing under-specified about volatile. – GManNickG Jul 06 '16 at 18:22
  • @cooky451 I have a feeling that `SecureZeroMemory` goes beyond what `memset_s` is specified to do. Writing a zero to a memory address is only guaranteed to affect the first level cache. You still have more levels of cache and the main memory to deal with. So I expect that `SecureZeroMemory` contains cache flushing code. And although that may be the intent for `memset_s`, it's not explicitly specified. – user3386109 Jul 06 '16 at 18:40
  • what if you do *bytePtr++ -= *bytePtr; ? – fassl Jul 06 '16 at 19:21
  • 2
    @GManNickG: Would you prefer to say that C lacks directives to provide guarantees about memory ordering which some kinds of program need and which all implementations should be able to uphold without forcing the programmer to jump through hoops? – supercat Jul 06 '16 at 20:01
  • 1
    @mvidelgauz - and then someone has LTO enabled... – TLW Jul 06 '16 at 22:34
  • @fassl That's UB because there's no sequence point between the postincrement and the other side, so anything can happen. Please don't do 'clever' things like that. See: http://stackoverflow.com/questions/4176328/undefined-behavior-and-sequence-points – underscore_d Jul 07 '16 at 00:12
  • @underscore_d ok then just increment it afterwards: *bytePtr -= *bytePtr; bytePtr++; since the compiler cannot assume any value for this memory it cannot optimize it out, and that's what was asked in this question iirc? – fassl Jul 07 '16 at 01:54
  • 3
    @fassl What's your point? (pretending _yes_ to the seemingly open question of whether a `volatile` pointer can confer `volatile`ity upon a declared-non-`volatile` referent, then) The original code here would still perform volatile writes, which also can't be skipped or reordered, but with the benefit that the code states what it's doing (`= 0`), rather than pointlessly obfuscating itself for no benefit (`self = self - self`, I mean c'mon). Most people reading such a piece of code would conclude that the writer was trying way too hard to look clever - and not succeeding. – underscore_d Jul 07 '16 at 02:15
  • What's your point, underscore_d? "always zeros memory and doesn't get optimized away" value = value - value translates into value = 0, except that the compiler cannot assume the value you are setting, at least it didn't when I tried it in the compiler link that was posted in the question – fassl Jul 07 '16 at 03:41
  • The nuclear option: http://stackoverflow.com/a/2220565 – sleep Jul 07 '16 at 06:06
  • 1
    @fassl Let me rephrase that, then. The point here is that the Standard does not specify, or _underspecifies_, whether access to a declared non-`volatile` object through a `volatile` pointer/reference confers volatility on said object during such accesses. If it _did_, the simple `= 0` would work perfectly and doesn't look totally absurd. But it doesn't _seem_ to be required - or compilers act like it isn't - so anything can happen here. So you just found a hack that seems to do it on one particular compiler. That's not portable, so it's not a good idea, and it doesn't mean the code is clever – underscore_d Jul 07 '16 at 09:55
  • 1
    @cooky451 Re your original assumption and what I perceive as a lot of uncertainty about whether or not it was valid, I've opened a `language-lawyer` question on whether `volatile` pointers/reference are supposed to confer `volatile` semantics upon their referents: http://stackoverflow.com/questions/38243501/does-accessing-a-declared-non-volatile-object-through-a-volatile-reference-point All thoughts welcome. Thanks for the inspiration! And the lack of sleep! – underscore_d Jul 07 '16 at 10:58
  • @underscore_d are you telling me a simple subtraction is a hack and not portable? to achieve what was asked in the question one has to read from a volatile value, not just write to a volatile value. it even does work for the case of the second code example with gcc 6.1 which cooky451 posted – fassl Jul 07 '16 at 12:06
  • 2
    @fassl I'll rephrase again. What I'm telling you is it's unnecessary. And it's a "hack" because the Standard doesn't seem to guarantee either piece of code will produce the expected result, thus making your workaround highly context-sensitive and unreliable. Where the Standard _does_ explicitly guarantee `volatile` behaviour, _only a write is needed_; read source doesn't matter. In such cases, the Standard requires `= 0` to write 0 each time. But when the underlying object isn't `volatile`, only your pointer/reference, there's no (strong) guarantee. You're probably just confusing the optimiser – underscore_d Jul 07 '16 at 12:12
  • @underscore_d: If one casts the pointer to `uintptr_t`, xors it with a `volatile`-qualified value which happens to always be zero, and then converts back to a pointer, that should pretty well eliminate the possibility of the compiler figuring out that it can optimize things away, without adding per-loop overhead or introducing potentially thread-unsafe behaviors. – supercat Jul 07 '16 at 23:43
  • 1
    @supercat Sure, but if that's the case, it seems extraordinarily flimsy to rely on code that's in no way guaranteed to prevent optimisation, but which is just currently 'confusing enough' to do so. What if, as is likely, they strengthen optimisation in the next release? Bad times ahead! We'd either have to painstakingly check ASM each time, or never upgrade the compiler, or something in between. As I'm sure is the same for you, I'd vastly prefer that the Standard specifically defined a proper _guaranteed_ way for people to be able to do such things. I just can't recommend hacks that 'work now' – underscore_d Jul 08 '16 at 09:21
  • @underscore_d: If the "always_zero" symbol is exported such that a compiler can't know what references might exist, the only way a compiler could assume that the indicated operation couldn't make the pointer point anywhere in the universe would be if it read the always_zero volatile, checked if it was zero, and then had different code paths for the zero and non-zero cases. While it would be theoretically possible for a compiler to do that, I can't think of any circumstance where that would be more efficient than regarding it as a full memory and causality barrier. – supercat Jul 08 '16 at 15:47
  • @underscore_d: Besides, any compiler that wants to be that obtuse while remaining standards-compliant could just as easily do either of the following: (1) try to allocate an extra 5 petabytes of stack frame for any function whose source text contains the letter "z" and behave arbitrarily if the space is unavailable; (2) as above, but for functions that don't contain the letter "z". Since no program can be expected to work on an obtuse-but-compliant implementation, programmers shouldn't be expected to write code that's proof against all forms of obtuseness. – supercat Jul 08 '16 at 15:51
  • 1
    [Requirements for behavior of pointer-to-volatile pointing to non-volatile object](http://stackoverflow.com/a/28655297/3404097) – philipxy Jul 12 '16 at 19:18
  • ^ I can't express enough how much people need to read [the link directly above this comment](http://stackoverflow.com/a/28655297/3404097) & @philipxy's answer to [my question linked earlier](http://stackoverflow.com/questions/38243501). That recent Defect Report indicates that indeed it looks like C's original intent was that volatile semantics were a property of _lvalues_, not their referred objects (only) - & that such semantics are what the C++ Standard & all extant C compilers have been doing all along! Result! – underscore_d Feb 20 '17 at 20:29

5 Answers

85

GCC's behavior may be conforming, and even if it isn't, you should not rely on volatile to do what you want in cases like these. The C committee designed volatile for memory-mapped hardware registers and for variables modified during abnormal control flow (e.g. signal handlers and setjmp). Those are the only things it is reliable for. It is not safe to use as a general "don't optimize this out" annotation.
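
For illustration, here is a minimal sketch of those two sanctioned uses; the register address and all names are invented for the example:

#include <signal.h>

/* Hypothetical memory-mapped device register (address made up): every read
   and write through this lvalue must actually be performed. */
#define DEVICE_STATUS (*(volatile unsigned int *)0x40001000u)

/* Flag set from a signal handler and polled by normal control flow; this is
   the other use volatile is guaranteed to cover. */
volatile sig_atomic_t got_signal = 0;

void handler(int signum)
{
    (void)signum;
    got_signal = 1;
}

void spin_until_signalled(void)
{
    while (!got_signal)   /* must be re-read on every iteration */
        ;
}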

In particular, the standard is unclear on a key point. (I've converted your code to C; there shouldn't be any divergence between C and C++ here. I've also manually done the inlining that would happen before the questionable optimization, to show what the compiler "sees" at that point.)

extern void use_arr(void *, size_t);
void foo(void)
{
    char arr[8];
    use_arr(arr, sizeof arr);

    for (volatile char *p = (volatile char *)arr;
         p < (volatile char *)(arr + 8);
         p++)
      *p = 0;
}

The memory-clearing loop accesses arr through a volatile-qualified lvalue, but arr itself is not declared volatile. It is, therefore, at least arguably allowed for the C compiler to infer that the stores made by the loop are "dead", and delete the loop altogether. There's text in the C Rationale that implies that the committee meant to require those stores to be preserved, but the standard itself does not actually make that requirement, as I read it.
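
By contrast, if the array itself were defined with a volatile-qualified type, the loop would be storing to a volatile object, and the standard does require expressions referring to such an object to be evaluated strictly according to the abstract machine. A minimal sketch of that variant (at the cost of making every other access to the array volatile as well):

#include <stddef.h>

extern void use_arr(volatile void *, size_t);

void foo(void)
{
    volatile char arr[8];            /* the object itself is volatile now */
    use_arr(arr, sizeof arr);

    for (volatile char *p = arr; p < arr + 8; p++)
        *p = 0;                      /* stores to a volatile object */
}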

For more discussion of what the standard does or does not require, see Why is a volatile local variable optimised differently from a volatile argument, and why does the optimiser generate a no-op loop from the latter?, Does accessing a declared non-volatile object through a volatile reference/pointer confer volatile rules upon said accesses?, and GCC bug 71793.

For more on what the committee thought volatile was for, search the C99 Rationale for the word "volatile". John Regehr's paper "Volatiles are Miscompiled" illustrates in detail how programmer expectations for volatile may not be satisfied by production compilers. The LLVM team's series of essays "What Every C Programmer Should Know About Undefined Behavior" does not touch specifically on volatile but will help you understand how and why modern C compilers are not "portable assemblers".


To the practical question of how to implement a function that does what you wanted volatileZeroMemory to do: Regardless of what the standard requires or was meant to require, it would be wisest to assume that you can't use volatile for this. There is an alternative that can be relied on to work, because it would break far too much other stuff if it didn't work:

extern void memory_optimization_fence(void *ptr, size_t size);
inline void
explicit_bzero(void *ptr, size_t size)
{
   memset(ptr, 0, size);
   memory_optimization_fence(ptr, size);
}

/* in a separate source file */
void memory_optimization_fence(void *unused1, size_t unused2) {}

However, you must make absolutely sure that memory_optimization_fence is not inlined under any circumstances. It must be in its own source file and it must not be subjected to link-time optimization.

There are other options, relying on compiler extensions, that may be usable under some circumstances and can generate tighter code (one of them appeared in a previous edition of this answer), but none are universal.
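
One such extension-based idiom, suggested in the comments below (and whose reliability is debated there), is an empty GCC asm statement that takes the pointer as an input and declares a "memory" clobber; the helper name here is just for the sketch:

#include <string.h>

static inline void explicit_bzero_asm(void *ptr, size_t size)
{
    memset(ptr, 0, size);
    /* GCC/Clang extension: the "memory" clobber plus the pointer input tells
       the compiler that the asm may read (or write) the zeroed bytes, so the
       memset above cannot be treated as a dead store. */
    __asm__ __volatile__("" : : "r"(ptr) : "memory");
}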

(I recommend calling the function explicit_bzero, because it is available under that name in more than one C library. There are at least four other contenders for the name, but each has been adopted only by a single C library.)

You should also know that, even if you can get this to work, it may not be enough. In particular, consider

struct aes_expanded_key { __uint128_t rndk[16]; };

void encrypt(const char *key, const char *iv,
             const char *in, char *out, size_t size)
{
    aes_expanded_key ek;
    expand_key(key, ek);
    encrypt_with_ek(ek, iv, in, out, size);
    explicit_bzero(&ek, sizeof ek);
}

Assuming hardware with AES acceleration instructions, if expand_key and encrypt_with_ek are inline, the compiler may be able to keep ek entirely in the vector register file -- until the call to explicit_bzero, which forces it to copy the sensitive data onto the stack just to erase it, and, worse, doesn't do a darn thing about the keys that are still sitting in the vector registers!

zwol
  • 6
    That's interesting... I'd be interested in seeing a reference to the committee's comments. – Dietrich Epp Jul 06 '16 at 18:28
  • 10
    How does this square with 6.7.3(7)'s definition of `volatile` as _[...] Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. **Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine**, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined._ ? – Iwillnotexist Idonotexist Jul 06 '16 at 18:40
  • 15
    @IwillnotexistIdonotexist The key word in that passage is _object_. `volatile sig_atomic_t flag;` is a volatile _object_. `*(volatile char *)foo` is merely an _access through a volatile-qualified lvalue_ and the standard does not require that to have any special effects. – zwol Jul 06 '16 at 18:49
  • 2
    What about GCC using memset in the "raw" function? Memset will likely use 16, 8 or 4 byte stores, doesn't that have the potential to mess with memory-mapped registers? – cooky451 Jul 06 '16 at 18:50
  • @cooky451 Yes. Hmm, that might actually genuinely be a bug. – zwol Jul 06 '16 at 18:56
  • @cooky451 My copies of GCC 5.4 and 6.1 do not compile your `volatileZeroMemory`, in isolation, to call `memset`. I'm not sure what's going on with godbolt. – zwol Jul 06 '16 at 19:01
  • @zwol Hm, I don't have access to GCC 6.1 outside of godbolt sadly. Is your GCC 6.1 generating the same code as 5.3 on godbolt? – cooky451 Jul 06 '16 at 19:20
  • @cooky451 Looks the same to me, yes. Anyhow it is definitely a loop writing one byte at a time in forward order, and not a call to any library function. – zwol Jul 06 '16 at 19:24
  • 2
    Your use of VLAIS (VLA in struct) is a very bad idea because it is actively not supported on non-gcc compilers that otherwise implement gcc extensions. Instead of `"m"` constraint with a complex type, just use `"r"` constraint with the address of the memory, and a `"memory"` clobber. Then the compiler cannot make any assumptions about what the asm does with the pointed-to object (or any other reachable memory) and thus cannot optimize it out. – R.. GitHub STOP HELPING ICE Jul 06 '16 at 19:54
  • 2
    @R.. The answer already points out that VLAIS only works with GCC and only in C. As for your alternative, we have discussed that more than once already, but I might not have said this sufficiently baldly: I do not think either gcc or "non-gcc compilers that otherwise implement gcc extensions" can be counted on to interpret it the way you think it should be interpreted. – zwol Jul 06 '16 at 20:00
  • 1
    @zwol: They can be counted on to interpret it as that, and they do. The compiler must assume asm can read any memory whose address is available to it in any way (this includes any address that's escaped the compiler's ability to analyze the extent of its visibility), and if the asm block has a `"memory"` clobber, that the asm might also modify any such address. – R.. GitHub STOP HELPING ICE Jul 06 '16 at 21:34
  • @zwol I edited some experiments into the question, might be interesting to you. For example, does your GCC 6.1 also remove the stores if you remove the memory fence? – cooky451 Jul 06 '16 at 21:46
  • 3
    The Standard says what criteria something must meet to be a "compliant" implementation. It makes no effort to describe what criteria an implementation on a given platform must meet to be a "good" implementation or a "usable" one. GCC's treatment of `volatile` may be sufficient to make it a "compliant" implementation, but that doesn't mean it's sufficient to be "good" or "useful". For many kinds of systems programming it should be regarded as woefully deficient in those regards. – supercat Jul 06 '16 at 22:27
  • @cooky451 Please ask a new question about memory fences, etc. – zwol Jul 06 '16 at 23:32
  • @R.. I believe you are wrong about that, but I'm not interested in discussing it any further here. If I ever get around to bringing this back up on the gcc and/or clang mailing lists, that would be a much more appropriate venue. – zwol Jul 06 '16 at 23:32
  • 1
    @DietrichEpp Sorry for the delay. The clearest statements I'm aware of will be found by searching the C99 Rationale for "volatile". (Interestingly, there is one sentence in there that implies that the committee _meant_ for `*(volatile T*)x` to force that one access to `x` to be treated as volatile; but the actual wording of the standard does not achieve this.) (I regret to say I do not know where to find an online+freely available copy of the Rationale.) – zwol Jul 06 '16 at 23:38
  • @zwol N1381, at least by omitting the condition, seems to claim it's defined that using a `volatile` pointer to alter non-volatile data should respect the qualifier of the pointer, regardless of that of the data, and that compilers not doing this "violate the standard by not always respecting the `volatile`" qualifier. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1381.pdf Either this doc is wrong or haphazardly worded, or your answer is on the wrong track - albeit understandably as you're presuming a mirror of the same 'original declaration' rule we use for `const_cast`. What do you think? – underscore_d Jul 06 '16 at 23:52
  • 1
    From n1548 §6.7.3 ¶6 the standard uses the phrase "object defined with a volatile-qualified type" to distinguish it from "lvalue with volatile-qualified type". It's unfortunate that this "object defined with" versus "lvalue" distinction does not carry forward, and the standard then uses "object that has volatile-qualified type", and says that "what constitutes access to an object that has volatile-qualified type is implementation-defined" (which could have said "lvalue" or "object defined with" for clarity). Oh well. – Dietrich Epp Jul 07 '16 at 00:00
  • 1
    @underscore_d You may be right; I'm afraid I haven't enough brain right now to parse standardese carefully enough to tell. I suggest you write up a new [language-lawyer] question specifically about this and hopefully some other people will chime in. (It may also be the case that the rules changed in C11 or post-C11 drafts; I don't recognize the N-numbers you're quoting, and I have never close-read C11 to the extent that I did C99 back when C compilers were my actual job.) – zwol Jul 07 '16 at 01:07
  • @zwol You and me both! Brain cannot live on C++ alone (sleep would help). The thought of opening [another](http://stackoverflow.com/questions/38235112/why-is-a-volatile-local-variable-optimised-differently-from-a-volatile-argument) question fills me with horror at present but is definitely a worthwhile enterprise for another day. Of course, let me know if you have any revelations before then! – underscore_d Jul 07 '16 at 01:14
  • Excellent, especially the note about register usage at the end! – Mark K Cowan Jul 07 '16 at 10:54
  • 1
    @zwol Here's the suggested `language-lawyer` question: http://stackoverflow.com/questions/38243501/does-accessing-a-declared-non-volatile-object-through-a-volatile-reference-point All thoughts welcome. – underscore_d Jul 07 '16 at 10:55
  • 1
    @underscore_d I have edited my answer to express uncertainty about whether GCC is conformant, and add links to your followup(s). Thanks for writing them up. – zwol Jul 07 '16 at 14:57
  • @zwol You're welcome, and thanks for the great edit and linking back! However, I'm not sure the GCC threads are relevant, as that just seems to be an corner-case bug in the optimiser, not contingent on the uncertain intention of the Standard. But if you still want to include them somewhere, that's very kind! – underscore_d Jul 07 '16 at 15:12
  • C vs. C++: [godbolt's gcc6.1 will actually generate the stores](https://godbolt.org/g/NHfMYY) to zero `arr[]` if you use `-xc` and declare it as `volatile char arr[4];`. If either of those are omitted, the object never exists at all (which seems valid in this case where no reference to it escapes the function, and the function never stores anything there). – Peter Cordes Jul 07 '16 at 15:19
  • 3
    The C spec also rather directly says *"An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (**including any caused by calling a function or accessing a volatile object**)."* (emphasize mine). – Johannes Schaub - litb Jul 07 '16 at 17:10
  • This would allow even optimizing away a read from a global `volatile` if the compiler "can deduce that its value is not used"? – Johannes Schaub - litb Jul 07 '16 at 17:11
  • 1
    @JohannesSchaub-litb "and that no needed side effects are produced" is probably the key part there. Volatile reads are *a* side effect, so if you have `volatile char* x = 0x40; *x;`, the compiler cannot deduce that the side effect isn't needed, and has to preserve it. But if it can deduce that the side effect isn't needed, it can probably eliminate it. – mbrig Jul 07 '16 at 19:27
  • 2
    @zwol: The Standard makes quite clear that standards-compliance does not imply quality, and that it's entirely possible for an implementation to be standard-compliant and useless. I would suggest that saying that the effects of volatile access are "implementation-defined" would mean that a compiler which documented that the qualifier had no effect whatsoever would fall into the "compliant but useless" category. The people at gcc should be less interested in what is mandated by the Standard and more interested in what is necessary to make something useful. – supercat Jul 07 '16 at 20:14
  • 1
    @supercat I don't hack on GCC anymore, and I write these answers from the pragmatic perspective that, even if the GCC devs make everything _perfect_ in the next release, we'll still be working around the older versions for years to come. Your regular complaints about it would be better directed at the GCC mailing lists. Or you might consider teaming up with John Regehr on his "Friendly C" project. – zwol Jul 07 '16 at 20:25
  • @zwol: It has long been apparent that the authors of gcc are interested in processing programs that could be efficiently handled without using features that can be cheaply supported on only 99.9% of the platforms upon which C implementations exist, rather than in efficiently handling programs that could benefit from such features. I am still puzzled, however, by the mentality that seems to think there's some confusion about what a programmer might mean when writing code that stores to a volatile-qualified pointer. On all the platforms I've seen, the answer would be pretty obvious. – supercat Jul 07 '16 at 23:33
  • 1
    @supercat What part of "Your regular complaints about [this] would be better directed at the GCC mailing lists" was unclear? – zwol Jul 07 '16 at 23:35
  • @zwol: My point is that I don't think there's anything I could tell the authors of gcc that they don't already know. – supercat Jul 07 '16 at 23:38
  • 2
    @supercat OK, but you're not telling _me_ anything I don't already know either, and there's absolutely nothing _I_ can do about it. So unless you have actual suggestions for changes to make to the answer, can we please drop it? – zwol Jul 08 '16 at 00:22
  • 1
    In C++ (no idea how the status is for C++17, which has cleaned this up quite a bit), it is also unclear whether a write with a differently qualified lvalue than the current object type is considered to "reuse" that object's memory and create a new object with volatile-qualified type. For the const qualifier this issue doesn't exist, because you can't write through it, but for volatile the question arises. – Johannes Schaub - litb Jul 08 '16 at 11:06
  • 1
    If you want to be certain there, you can use placement new to force a volatile object at a particular location. For C, I believe the rules are clearer because of its "effective types", where objects don't really possess types themselves, but are only considered to have an "effective type" for access purposes when they are accessed by lvalues. For C, when you first do a volatile write, will it not change the effective type to be volatile-qualified, and therefore the write would count as observable? – Johannes Schaub - litb Jul 08 '16 at 11:10
  • @JohannesSchaub-litb Interesting questions! As indicated by my thread about this, I'd like to think your last point would be true, whether or not it results from that exact wording (ideally not, since as you indicated, said wording would conflict with C++'s abstract notion of whether a specific object's lifetime has begun in a piece of memory). But the C Standard itself doesn't ever address that scenario, far as I can see. – underscore_d Jul 08 '16 at 11:28
  • @JohannesSchaub-litb: From the point of view of the Standard, using a non-qualified pointer to access a `volatile`-qualified object is UB. Except when the compiler writer knows all of the useful semantics supported by the execution platform, I see no disadvantage to a compiler regarding as defined behavior situations in which the last write prior to an object becoming observable outside the compiler's control is volatile, and likewise the first read after an an object changes outside the compiler's control, but the Standard leaves it undefined to allow for those few implementations... – supercat Jul 08 '16 at 15:56
  • ...where the compiler and execution platform are under the control of the same entity, on the presumption that implementations for platforms with additional useful semantics will behave in a way that makes sense on those platforms. – supercat Jul 08 '16 at 15:57
  • The standard has nothing to say about object code. **"Volatile access"** although observable behaviour **is implementation defined**. So if people want things manifested in object code **they must use an implementation that says that the volatile behaviour they want is manifested as they want in object code**. This is chronically forgotten in Qs & As re volatile accesses & observable behaviour. Answers should make this clear. Volatile is "reliable for" nothing except per implementation documentation. – philipxy Jun 01 '20 at 05:05
15

I need a function that (like SecureZeroMemory from the WinAPI) always zeros memory and doesn't get optimized away,

This is what the standard function memset_s is for.
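
Where an implementation actually provides it (it is part of the optional Annex K of C11; see the comments below), usage looks roughly like this; the second argument is the size of the destination object and the last one is the number of bytes to clear, and wipe_key is just an illustrative wrapper name:

#define __STDC_WANT_LIB_EXT1__ 1   /* opt in to the Annex K "_s" functions */
#include <string.h>

void wipe_key(unsigned char *key, size_t key_size)
{
    /* Unlike memset, memset_s may not be optimized away even if the buffer
       is never read again. */
    memset_s(key, key_size, 0, key_size);
}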


As to whether this behavior with volatile is conforming or not, that's a bit hard to say; implementations of volatile have long been said to be plagued with bugs.

One issue is that the specs say that "Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine." But that only refers to 'volatile objects', not accessing a non-volatile object via a pointer that has had volatile added. So apparently if a compiler can tell that you're not really accessing a volatile object then it's not required to treat the object as volatile after all.

bames53
  • 4
    Note: This is part of the C11 standard, and is not available in all toolchains yet. – Dietrich Epp Jul 06 '16 at 18:15
  • 5
    One should note that interestingly, this function is standardized for C11 but not for C++11, C++14 or C++17. So technically it's not a solution for C++, but I agree that this seems like the best option from a practical perspective. At this point I do wonder though if the behavior from GCC is conforming or not. Edit: Actually, VS 2015 doesn't have memset_s, so it's not all that portable yet. – cooky451 Jul 06 '16 at 18:16
  • 2
    @cooky451 I thought [C++17 pulls the C11 standard library in by reference](http://stackoverflow.com/a/38060437/3484570) (see second Misc). – nwp Jul 06 '16 at 18:23
  • 1
    As far as I know, *only* recent versions of OSX provide `memset_s`. There are about five different competing names for this function; the only one that is provided by more than one C library is `explicit_bzero`, which is in _both_ FreeBSD and OpenBSD, and possibly also musl libc. And I'm trying to get it into glibc, in my copious free time. – zwol Jul 06 '16 at 18:24
  • 14
    Also, describing `memset_s` as C11-standard is an overstatement. It is part of Annex K, which is optional in C11 (and therefore also optional in C++). Basically all implementors, _including_ Microsoft, whose idea it was in the first place (!), have declined to pick it up; last I heard they were talking about scrapping it in C-next. – zwol Jul 06 '16 at 18:26
  • @zwol Why Microsoft wouldn't pick it up considering that they'd just have to call SecureZeroMemory is beyond me. – cooky451 Jul 06 '16 at 18:30
  • @cooky451 So write your code with `SecureZeroMemory()` instead of `memset_s()`. Hmm, maybe that makes code less portable away from MS. smells like [Embrace, extend and extinguish](https://en.wikipedia.org/wiki/Embrace,_extend_and_extinguish) – chux - Reinstate Monica Jul 06 '16 at 18:50
  • 8
    @cooky451 In certain circles, Microsoft is notorious for forcing stuff into the C standard over basically everyone else's objections and then not bothering to implement it themselves. (The most egregious example of this is C99's relaxation of the rules for what the underlying type of `size_t` is allowed to be. The Win64 ABI is not conformant with C90. That would have been ... not _ok_, but not terrible ... if MSVC had actually picked up C99 things like `uintmax_t` and `%zu` in a timely fashion, but they _didn't_.) – zwol Jul 06 '16 at 18:52
  • @nwp - C++17 refers to C11, but at the same time excludes some of the C headers and the *_s functions in Annex K. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0063r2.html – Bo Persson Jul 06 '16 at 19:12
  • 1
    The paper that introduced `memset_s()`, N1381, seems to imply - at least by omitting to mention any requirement about the original declaration - that it was already defined that using a `volatile` pointer to alter non-volatile data should respect the qualifier of the pointer, regardless of that of the data, and that compilers not doing this "violate the standard by not always respecting the `volatile`" qualifier. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1381.pdf Does that make the accepted answer incorrect? – underscore_d Jul 06 '16 at 23:53
  • The authors of the C89, where `volatile` was introduced, recognize in the Rationale that it is possible for an implementation to be of such poor quality as to be unsuitable for any purpose, but this shouldn't be a problem if people make a bona fide effort to produce useful implementations. If a particular behavior would be useful when writing some kind of code for a particular platform, someone seeking to write a quality implementation suitable for processing such code on that platform should support it if practical regardless of whether the Standard would require it. – supercat Jul 18 '18 at 21:23
  • I find it really sad that compiler writers have completely failed to recognize that when the Standard fails to mandate something, that means failure to do it will not render an implementation non-conforming. It says nothing about whether such failure would make the implementation less suitable for various purposes than it otherwise would be. – supercat Jul 18 '18 at 21:37
2

I offer this version as portable C++ (although the semantics are subtly different):

#include <new>  // for placement operator new[]

void volatileZeroMemory(volatile void* const ptr, unsigned long long size)
{
    // Placement new takes a plain void*, so the volatile qualifier has to be
    // cast away first (const_cast can remove volatile as well as const).
    volatile unsigned char* bytePtr =
        new (const_cast<void*>(ptr)) volatile unsigned char[size];

    while (size--)
    {
        *bytePtr++ = 0;
    }
}

Now you have write accesses to a volatile object, not merely accesses to a non-volatile object made through a volatile view of the object.

The semantic difference is that it now formally ends the lifetime of whatever object(s) occupied the memory region, because the memory has been reused. So access to the object after zeroing its contents is now surely undefined behavior (formerly it would have been undefined behavior in most cases, but some exceptions surely existed).

To use this zeroing during an object's lifetime instead of at the end, the caller should use placement new to put a new instance of the original type back again.
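
A rough sketch of what that caller-side pattern could look like (Widget is a made-up, trivially destructible type used only for the example):

#include <new>

struct Widget { unsigned char secret[32]; };

void example()
{
    Widget w{};
    // ... use w ...

    volatileZeroMemory(&w, sizeof w);    // reuses (and zeroes) w's storage
    Widget* fresh = new (&w) Widget{};   // placement new starts a new object there
    // ... continue using *fresh ...
}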

The code can be made shorter (albeit less clear) by using value initialization:

void volatileZeroMemory(volatile void* const ptr, unsigned long long size)
{
    new (const_cast<void*>(ptr)) volatile unsigned char[size] ();
}

and at this point it is a one-liner and barely warrants a helper function at all.

Ben Voigt
  • 2
    If accesses to the object after the function executes would invoke UB, that would mean that such accesses could yield the values the object held before it was "cleared". How is that not the opposite of security? – supercat Jul 18 '18 at 21:53
0

It should be possible to write a portable version of the function by using a volatile object on the right-hand side and forcing the compiler to preserve the stores to the array.

void volatileZeroMemory(void* ptr, unsigned long long size)
{
    volatile unsigned char zero = 0;
    unsigned char* bytePtr = static_cast<unsigned char*>(ptr);

    while (size--)
    {
        *bytePtr++ = zero;
    }

    zero = static_cast<unsigned char*>(ptr)[zero];
}

The zero object is declared volatile, which ensures the compiler can make no assumptions about its value even though it always evaluates to zero.

The final assignment expression reads from a volatile index in the array and stores the value in a volatile object. Since this read cannot be optimized away, it ensures that the compiler must generate the stores specified in the loop.

D Krueger
  • 1
    This doesn't work at all... just look at the code that is being generated. – cooky451 Jul 06 '16 at 20:41
  • @cooky451 It looks like it generates the accesses to me. Which version isn't working? – D Krueger Jul 06 '16 at 20:58
  • 1
    Having read my generated ASM mo' better, it seems to inline the function call and retain the looping, but not do any storing to `*ptr` during that loop, or actually _anything at all_... just looping. wtf, there goes my brain. – underscore_d Jul 06 '16 at 21:20
  • 3
    @underscore_d It's because it's optimizing away the store while preserving the read of the volatile. – D Krueger Jul 06 '16 at 21:25
  • 1
    Yeah, and it dumps the result to an unchanging `edx`: I get this: `.L16: subq $1, %rax; movzbl -1(%rsp), %edx; jne .L16` – underscore_d Jul 06 '16 at 21:29
  • 1
    If I change the function to allow passing an arbitrary `volatile unsigned char const` fill byte... _it doesn't even read it_. The generated inlined call to `volatileFill()` is just `[load RAX with sizeof] .L9: subq $1, %rax; jne .L9`. Why does the optimiser (A) not re-read the fill byte and (B) bother preserving the loop where it doesn't do anything? – underscore_d Jul 06 '16 at 21:34
  • ^ that only happens if passing the argument by value, which of course can't possibly be affected by any other code. Passing by reference produces the same effective result as declaring in-body. But then you realise my 1st sentence here applies equally to a locally declared variable in the function, so why isn't that trashed too? I feel a question coming on... – underscore_d Jul 06 '16 at 22:03
  • 1
    @underscore_d I believe that's because volatile is ignored for parameters. – D Krueger Jul 06 '16 at 22:23
  • @DKrueger I'm not finding any source for that online, and searching N3797 is... tricky. Could you point me at the relevant clause of the Standard? This answer mentions that it's ignored for _overloading_ - http://stackoverflow.com/a/10242660/2757035 - but then so is `const`, and yet that still affects behaviour... or rather compilation. Even if you're right, why does the optimiser not then elide the loop, since it's now a no-op? I think I'll still ask a question about that as it's totally perplexing. – underscore_d Jul 06 '16 at 22:37
  • @underscore_d It's discussed here: http://stackoverflow.com/questions/3303660/volatile-variables-as-argument-to-function but no chapter and verse – D Krueger Jul 06 '16 at 22:45
  • @DKrueger I found that and don't see how it's relevant. It discusses _pointers to_ `volatile`, not by-value arguments declared `volatile`. Which are, yeah, pretty pointless, but I would presume would behave the same as 'real' local variables declared `volatile`. And yet they don't. I'm aware `volatile` is an implementation-defined mess, so the question is probably really for GCC, not the Standard. But there's clearly _something_ odd going on. – underscore_d Jul 06 '16 at 22:50
  • Interesting edit - FWIW, works for me, on my current compiler, with the current wind direction and alignment of celestial bodies. The logic _seems_ sound. One might assume that popular implementations of 'volatile fill' do something much like this, at least within the safe confines of their own implementation-defined behaviour. – underscore_d Jul 06 '16 at 23:27
  • 1
    @underscore_d: Would not `volatile` qualifiers on received parameters be meaningful and important in situations where code modifies the values stored therein between a `setjmp` and the corresponding `longjmp`, and then accesses those values following the `longjmp`? – supercat Jul 07 '16 at 23:36
  • @supercat I originally couldn't imagine much point to `volatile` locals, but then I thought about inline ASM making anything possible (in a very implementation & ABI-dependent way, but any time we use `volatile` we're already there) - though surely in most cases it'd be possible/better to allocate pre-call & pass a `volatile` ref, documenting intended sharing. But good point about `jmp`s. So there are definitely hypothetical use-cases for `volatile` qualifying local (incl `static`) variables. Hard for me to imagine in practical use though. Could you recommend good examples using such methods? – underscore_d Jul 08 '16 at 09:17
  • @underscore_d: A function which uses `setjmp` for the purposes of giving nested functions an "escape path", and then calls such functions within a loop, would need to declare the loop index `volatile` if it would need to find out the value of the index when the called function exited via `longjmp`. Absent a `volatile` qualifier, a compiler would likely generate code for e.g. `for (int i=0; i<10; i++) doSomething(i);` that would keep `i` in a caller-saved register whose value would get lost if `doSomething` doesn't return normally. – supercat Mar 25 '19 at 18:27
0

As explained in other answers, volatile was meant to ensure that accesses which may have side effects are always generated, even if optimization considers them redundant. Accessing memory-mapped peripherals is a typical use case. The usual way to define memory-mapped registers looks like this:

#define PORT_A *(volatile short*)0x1234

Of course it can be defined as a structure pointer describing all the registers of a peripheral, and the address might come from some configuration structure, possibly populated at runtime. The point is that the compiler must ALWAYS generate the volatile access, regardless of what memory area is accessed; the compiler cannot possibly speculate about what is at that address.

Another typical case is hardware where reading a status register also clears all of its set status bits. If you wish to just clear the status, you make a dummy (volatile) read and discard the result, as sketched below. The access must not, under any circumstances, be optimized out by any compiler. (Such dummy reads are also crucial for synchronizing delayed transactions on slower peripheral buses.)

Another solution, used by the IAR (and some other) compilers, is to create volatile data structures in the standard way and place the structure at the fixed address of the peripheral with the non-standard @ directive. That works too: the compiler never optimizes out peripheral accesses. Anything else would be a disaster; bare-metal MCU programming would not work as it does.
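
A sketch of the dummy-read pattern mentioned above (the register name and address are invented for the example):

#define UART_STATUS (*(volatile unsigned short *)0x40001010u)

void clear_uart_status(void)
{
    /* On hardware where reading the status register clears its bits, this
       volatile read must be emitted even though the value is discarded. */
    (void)UART_STATUS;
}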

My guess is that the array to be zeroed was not defined as volatile, and it was entirely eliminated. Casting its address to a volatile pointer does not preserve the array itself. It's an interesting glitch, because as a consequence the "sacred" volatile access is eliminated, too.

vjalle