14

The statement:

volatile unsigned char * volatile p = (volatile unsigned char * volatile)v;

Generates a warning C4197 in MSVC v14.1:

Warning C4197: 'volatile unsigned char *volatile ': top-level volatile in cast is ignored

The 2011 C standard (section [N1570] 6.7.3 4.) states: “The properties associated with qualified types are meaningful only for expressions, that are l-values”, thus the top-level volatile in this cast is ignored and generates this warning.

The author of this code states, that it doesn't violate the C standard and is required to prevent some GCC optimizations. He illustrates the problem with the code at: https://godbolt.org/g/xP4eGz

#include <stddef.h>

static void memset_s(void * v, size_t n) {
  volatile unsigned char * p = (volatile unsigned char *)v;
  for(size_t i = 0; i < n; ++i) {
    p[i] = 0;
  }
}

void f1() {
  unsigned char x[4];
  memset_s(x, sizeof x);
}

static void memset_s_volatile_pnt(void * v, size_t n) {
  volatile unsigned char * volatile p = (volatile unsigned char * volatile)v;
  for(size_t i = 0; i < n; ++i) {
    p[i] = 0;
  }
}

void f1_volatile_pnt() {
  unsigned char x[4];
  memset_s_volatile_pnt(x, sizeof x);
}

...where he shows that the function f1() compiles to nothing (just a ret instruction) but f1_volatile_pnt() compiles to instructions that do the intended job.

QUESTION: Is there a way to properly write this code so that it is compiled correctly by GCC and in accordance with the 2011 C standard (section [N1570] 6.7.3 4.) so it does not generate the warning with MSVC and ICC ? ...without #ifdef...

For the context of this problem, see: https://github.com/jedisct1/libsodium/issues/687

George Robinson
  • 1,500
  • 9
  • 21
  • 10
    It looks like the `volatile` that prevents optimizations and the `volatile` that is being ignored are different ones. `volatile unsigned char * volatile p = (volatile unsigned char *)v;` should both silence the warning and prevent optimizations. – nwp Mar 03 '18 at 13:15
  • 1
    I am curious what the intended job is that `f1_volatile_pnt` does and `f1` doesn't do. There are surely better ways to implement a sleep and setting the memory does nothing. – nwp Mar 03 '18 at 13:17
  • 1
    @nwp In the case of `memset`, that's true. However, consider a function which uses `x` as a work buffer (like `f_mkfs` in ChaN's FatFS). By the same logic that function is also subject to elimination. I think the issue here is that GCC understands exactly what the `memset` call is going to do, so it knows that it can eliminate it. –  Mar 03 '18 at 13:20
  • @MarkYisri Why would [`f_mkfs`](http://elm-chan.org/fsw/ff/doc/mkfs.html) be eliminated? It works on a persistent buffer or a storage device, not a local variable that disappears anyways. – nwp Mar 03 '18 at 14:19
  • 2
    @nwp Since `V` is a type of `void` why is there a cast needed? isn't that type implicit converted? – Michi Mar 03 '18 at 14:25
  • 1
    @Michi ["A pointer to an unqualified type may be implicitly converted to the pointer to qualified version of that type (in other words, const, volatile, and restrict qualifiers can be added."](http://en.cppreference.com/w/c/language/conversion#Pointer_conversions), so yes, the cast is unnecessary and should be removed. Chances are that the code is actually compiled with a C++ compiler though where the cast is required. – nwp Mar 03 '18 at 14:30
  • @nwp That was my Point too. Maybe OPs compiler does request that CAST. – Michi Mar 03 '18 at 14:33
  • What does **"_R-value_"** mean in the title? – Nat Mar 03 '18 at 18:53
  • 1
    @ Nat see: https://en.wikipedia.org/wiki/Value_(computer_science)#R-values_and_addresses – George Robinson Mar 03 '18 at 19:57
  • 1
    @Nat: The terms “lvalue” and “rvalue” originate in discussing the left side and the right side of an assignment statement, such as `x = 3*4`. The right side is, after everything is evaluated, simply a value, such as 12, or the character T, the floating-point number 3.5, or the contents of a structure. The left side is an object. The value on the right is stored in the object. So an lvalue designates an object. An rvalue is only a value, with no object attached. You might have an lvalue in an expression, such as `x` in `3*x`, but, while evaluating the expression, it is replaced by its value. – Eric Postpischil Mar 04 '18 at 01:12
  • 1
    @Nat: When you have those down, C++ has [rvalues, lvalues, xvalues, glvalues, and prvalues](https://stackoverflow.com/q/3601602/298225). (And a note about the above: I wrote that an rvalue is only a value, with no object attached. Technically, lvalues are a subset of rvalues; it is during the process of evaluating an expression for assignment that we replace the lvalue with the value and end up with just the value for assignment. Don’t take my comment-size introduction as a strict specification.) – Eric Postpischil Mar 04 '18 at 01:14

2 Answers2

12

Conclusion

To make the code volatile unsigned char * volatile p = (volatile unsigned char * volatile) v; compile in C or in C++ without warnings while retaining the author’s intent, remove the second volatile in the cast:

volatile unsigned char * volatile p = (volatile unsigned char *) v;

The cast is unnecessary in C, but the question asks that the code be compilable without warning in MSVC, which compiles as C++, not C, so the cast is needed. In C alone, if the statement could be (assuming v is void * or is compatible with the type of p):

volatile unsigned char * volatile p = v;

Why Qualify a Pointer as Volatile

The original source contains this code:

volatile unsigned char *volatile pnt_ =
    (volatile unsigned char *volatile) pnt;
size_t i = (size_t) 0U;

while (i < len) {
    pnt_[i++] = 0U;

The apparent desire of this code is to ensure that memory is cleared for security purposes. Normally, if C code assigns zero to some object x and never reads x before a subsequent assignment or program termination, the compiler will, when optimizing, remove the assignment of zero. The author does not want this optimization to occur; they apparently intend to ensure that memory is actually cleared. Clearing memory can reduce opportunities for an attacker to read the memory (through side channels, by exploiting bugs, by gaining physical possession of the computer, or other means).

Suppose we have some buffer x that is an array of unsigned char. If x were defined with volatile, it is a volatile object, and the compiler always implements writes to it; it never removes them during optimization.

On the other hand, if x is not defined with volatile, but we put its address in a pointer p that has a type pointer to volatile unsigned char, what happens when we write *p = 0? As R.. points out, if the compiler can see that p points into x, it knows that the object being modified is not volatile, and therefore the compiler is not required to actually write to memory if it can otherwise optimize away the assignment. This is because the C standard defines volatile in terms of accessing volatile objects, not merely accessing memory through a pointer that has a type of “pointer to volatile something.”

To ensure the compiler actually writes to x, the author of this code declares p to be volatile. What this means is that, in *p = 0, the compiler cannot know that p points into x. The compiler is required to load the value of p from whatever memory it has assigned for p; it must assume p may have changed from the value that pointed into x.

Further, when p is declared volatile unsigned char *volatile p, the compiler must assume that the place pointed to by p is volatile. (Technically, when it loads the value of p, it could examine it, discover it is in fact pointing into x or some other memory known not to be volatile, and then treat it as non-volatile. But this would be an extraordinary effort by the compiler, and we can assume it does not happen.)

Therefore, if the code were:

volatile unsigned char *pnt_ = pnt;
size_t i = (size_t) 0U;

while (i < len) {
    pnt_[i++] = 0U;

then, whenever the compiler can see that pnt in fact points to non-volatile memory and that memory is not read before it is later written, the compiler may remove this code during optimization. However, if the code is:

volatile unsigned char *volatile pnt_ = pnt;
size_t i = (size_t) 0U;

while (i < len) {
    pnt_[i++] = 0U;

then, in each iteration of the loop, the compiler must:

  • Load pnt_ from the memory allocated for it.
  • Calculate the destination address.
  • Write zero to that address (unless the compiler goes to the extraordinary trouble of determining the address is non-volatile).

Thus, the purpose of the second volatile is to hide from the compiler the fact that the pointer points to non-volatile memory.

Although this accomplishes the author’s goal, it has the undesired effects of forcing the compiler to reload the pointer in each iteration of the loop and preventing the compiler from optimizing the loop by writing to the destination several bytes at a time.

Because of this undesired effect, I have another answer to the question “Is there a way to properly write this code so that it is compiled correctly by GCC and in accordance with the 2011 C standard (section [N1570] 6.7.3 4.) so it does not generate the warning with MSVC and ICC ?”: I would implement memset_s in assembly language and ask the compiler developers also to build it into the compiler.

Casting a Value

Consider the definition:

volatile unsigned char * volatile p = (volatile unsigned char * volatile) v;

We have seen above that the definition of p as volatile unsigned char * volatile is needed to accomplish the author’s goal, although it is an unfortunate workaround to shortcomings in C. However, what about the cast, (volatile unsigned char * volatile).

First, the cast is unnecessary, as the value of v will be automatically converted to the type of p. To avoid the warning in MSVC, the cast can simply be removed, leaving the definition as volatile unsigned char * volatile p = v;.

Given that the cast is there, the question asks whether the second volatile has any meaning. The C standard explicitly says “The properties associated with qualified types are meaningful only for expressions that are lvalues.” (C 2011 [N1570] 6.7.3 4.)

volatile means something unknown to the compiler can change the value of an object or accessing the object can cause something unknown to the compiler to occur. For example, if there is a volatile int a in the program, that implies the object identified by a can be changed by some means not known to the compiler. It could be changed by some special hardware on the computer, by a debugger, by the operating system, or by other means.

volatile qualifies an object. An object is a region of data storage in memory that can represent values.

In expressions, we have values. For example, some int values are 3, 5, or −1. Values cannot be volatile. They are not storage in memory; they are abstract mathematical values. The number 3 can never change; it is always 3.

The cast (volatile unsigned char * volatile) says to cast something to be a volatile pointer to a volatile unsigned char. It is fine to point to a volatile unsigned char—a pointer points to something in memory. But what does it mean to be a volatile pointer? A pointer is just a value; it is an address. Values do not have memory, they are not objects, so they cannot be volatile. So the second volatile in the cast (volatile unsigned char * volatile) has no effect in standard C. It is conforming C code, but the qualifier has no effect.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
  • Would this mean that MSVC is incapable of clearing memory for security purposes, because it ignores the second "volatile" as evidenced by its C4197 warning ? What about adherence to the C standard (section 6.5.3) that the documentation of this warning refers to ? – George Robinson Mar 03 '18 at 19:11
  • 2
    @KarolaN: The cast is not needed. The cast is legal (conforming) C. The second `volatile` in the cast has no effect. The second `volatile` in the definition of the pointer is needed for the author’s purpose. I expect the MSVC warning about the fact that the second `volatile` in the cast has no effect will not prevent MSVC from correctly clearing memory. While the second `volatile` in the cast has no effect, the second `volatile` in the definition of the pointer does have an effect, and MSVC should respect that. (However, note you asked about C, and I believe MSVC only does C++, not true C.) – Eric Postpischil Mar 03 '18 at 19:23
  • I don't know how my C++ and Visual-C++ tags have disappeared from the original question. I have added them back up. – George Robinson Mar 03 '18 at 19:36
  • Oh! I hope Pablo reads your comment before he removes them again. – George Robinson Mar 03 '18 at 19:49
  • 1
    @KarolaN yes, I've read the comment now and I must confess that I'm guilty of not reading the whole question before removing the tags because at first glance I didn't see anything related to C++ or MSVC. To my defence in 99.9% of the cases, people just tag everything with `c` and `c++` even though the question has nothing to do with C or C++. So I apologize because in this case I was wrong and it's fair to call me out on this. And no, I'm not going to remove them again, because now I know and understand that your question is indeed related to those tags. – Pablo Mar 04 '18 at 00:30
4

There is fundamentally no way to express what the author wants to express. The first version of the code is rightly optimized down to nothing by some compilers, because the underlying object unsigned char x[4] is not volatile; accessing it through pointer-to-volatile does not magically make it volatile.

The second version of the code is a hack that happens to achieve what the author wants, but at significant additional cost, and in the real world might be counter-productive. If (in a non-toy, fleshed-out example) the array x had only been used such that the compiler was able to keep it entirely in registers, the hacks in memset_s_volatile_pnt would force it to be spilled out to actual memory on the stack, only then to be clobbered, and memset_s_volatile_pnt would not be able to do anything to get rid of the copies in the original registers. A cheaper way to achieve the same thing would be just calling normal memset on x, then passing x to an external function whose definition the compiler can't see (to be safest, an external function in a different shared library).

Safe memory clearing is not expressible in C; it needs compiler/language-level extensions. The best way to do it in C+POSIX is just to do all the handling of sensitive data in a separate process whose lifetime is limited to the duration the sensitive data is needed and rely on memory protection boundaries to ensure it never leaks anywhere else.

If you just want to get rid of the warning, though, the solution is easy. Simply change:

volatile unsigned char * volatile p = (volatile unsigned char * volatile)v;

to:

volatile unsigned char * volatile p = (volatile unsigned char *)v;
R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • "accessing it through pointer-to-volatile does not magically make it volatile." Hmmm, instead if we had `union { unsigned char nv[4]; unsigned char v[4]; } u;`, can code access `u.v[0]` without treating it as `volatile`? I would think in this case, it must. – chux - Reinstate Monica Mar 03 '18 at 18:45
  • @chux: There is no `volatile` in the code in your question. Did you mean `union { unsigned char nv[4]; volatile unsigned char v[4]; } u;`? – Eric Postpischil Mar 03 '18 at 19:28
  • @EricPostpischil Yes: `union { unsigned char nv[4]; volatile unsigned char v[4]; } u;` – chux - Reinstate Monica Mar 03 '18 at 20:36