
In C11, K.3.7.4.1 The memset_s function, I found this bit of rather confusing text:

Unlike memset, any call to the memset_s function shall be evaluated strictly according to the rules of the abstract machine as described in (5.1.2.3). That is, any call to the memset_s function shall assume that the memory indicated by s and n may be accessible in the future and thus must contain the values indicated by c.

This implies that memset is not (necessarily) "evaluated strictly according to the rules of the abstract machine". (The chapter referenced is 5.1.2.3 Program execution.)

I fail to understand the leeway the standard gives to memset that is explicitly ruled out here for memset_s, and what that would mean for an implementor of either function.

DevSolar

1 Answer


Imagine you have read a password:

{
    char password[128];

    if (fgets(password, sizeof(password), stdin) != 0)
    {
        password[strcspn(password, "\n\r")] = '\0';
        validate_password(password);
        memset(password, '\0', sizeof(password));
    }
}

You've carefully zapped the password so it can't be found accidentally.

Unfortunately, the compiler is allowed to omit that memset() call because password is not used again. The rule for memset_s() means that the call cannot be omitted; the password variable must be zeroed, regardless of optimization.

memset_s(password, sizeof(password), '\0', sizeof(password));

This is one of the few really useful features in Annex K. (We can debate the merits of having to repeat the size. However, in a more general case, the second size can be a variable, not a constant, and then the first size becomes a runtime protection against the variable being out of control.)
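
A minimal sketch of how calling code might use this portably, assuming the usual Annex K feature-test macros: the program defines __STDC_WANT_LIB_EXT1__ to 1 before including <string.h>, and the implementation defines __STDC_LIB_EXT1__ if it provides Annex K. The helper name wipe_buffer is hypothetical, and the volatile fallback is an assumption about what mainstream compilers honor in practice, not something the standard text quoted in the question promises:

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* request Annex K declarations, if any */
#include <stddef.h>
#include <string.h>

/* wipe_buffer is a hypothetical helper name, not a standard function. */
static void wipe_buffer(void *buf, size_t len)
{
#ifdef __STDC_LIB_EXT1__
    /* Annex K is available: the compiler may not drop this call. */
    (void)memset_s(buf, len, 0, len);
#else
    /* Assumed fallback: store each byte through a volatile-qualified lvalue. */
    volatile unsigned char *p = buf;
    while (len--) {
        *p++ = 0;
    }
#endif
}
```

With this in place, the memset() line in the password example would become wipe_buffer(password, sizeof(password)).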

Note that this requirement is placed on the compiler rather than the library. The memset_s() function will behave correctly if it is called, just as memset() will behave correctly if it is called. The rule under discussion says that the compiler must call memset_s(), even though it may be able to omit the call to memset() because the variable is never used again.
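
To make the function's own job concrete, here is a rough sketch of what a memset_s()-like routine does once it is called, following the runtime-constraints in K.3.7.4.1. The name sketch_memset_s and the SKETCH_RSIZE_MAX limit are made up for illustration, and int stands in for errno_t. A real implementation would also invoke the runtime-constraint handler, and the guarantee against elision comes from the compiler, not from anything in this code; the volatile stores are only a user-level approximation:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RSIZE_MAX; the real value is implementation-defined. */
#define SKETCH_RSIZE_MAX (SIZE_MAX >> 1)

/* Returns zero on success, nonzero on a runtime-constraint violation. */
int sketch_memset_s(void *s, size_t smax, int c, size_t n)
{
    volatile unsigned char *p = s;
    int violation = 0;

    /* s must not be null and smax must not exceed the limit: nothing is stored. */
    if (s == NULL || smax > SKETCH_RSIZE_MAX) {
        return 1;
    }

    /* n must not exceed the limit or smax: report it, but still fill smax bytes. */
    if (n > SKETCH_RSIZE_MAX || n > smax) {
        violation = 1;
        n = smax;
    }

    while (n--) {
        *p++ = (unsigned char)c;
    }
    return violation;
}
```

Note how the first size bounds the damage when the second one is wrong: a call such as sketch_memset_s(buf, sizeof buf, 0, bad_len) never writes past sizeof buf.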

Jonathan Leffler
  • Wouldn't `memset((volatile void*)password, '\0', sizeof(password));` work around it? – Eugene Sh. Jun 12 '19 at 15:25
  • Not necessarily, no, not least because it isn't clear you can safely pass a volatile pointer to a function that doesn't know it is going to get one. – Jonathan Leffler Jun 12 '19 at 15:26
  • Ah right... The function code is already there, so there is no way it can accommodate it.. – Eugene Sh. Jun 12 '19 at 15:28
  • @DevSolar: Yes, the requirement is on the compiler to actually generate that call to `memset_s()` even though if the function was `memset()`, it could be omitted. The library writer merely has to ensure that `memset_s()` does its job. – Jonathan Leffler Jun 12 '19 at 15:28
  • @EugeneSh.: IMHO a simple loop `for (size_t i = 0; i < sizeof password; i++) { *(volatile char *)(password + i) = 0; }` should do the trick, making `memset_s` a rather useless addition to the C standard. – chqrlie Jun 12 '19 at 15:28
  • @chqrlie I wouldn't call it useless. You can replace the `memset` with a loop too. – Eugene Sh. Jun 12 '19 at 15:29
  • @JonathanLeffler: Oh... and there I was already hammering those `volatile`s into my `memset_s`... that must be a pain for optimizer maintainers! – DevSolar Jun 12 '19 at 15:30
  • @JonathanLeffler AFAIK `memset_s` can just be a forward to `memset` the only difference is that the compiler is not allowed to optimize it out. – Mgetz Jun 12 '19 at 15:40
  • @chqrlie actually no, because of 'as-if': if the compiler can detect the buffer isn't read after the writes, it's free to remove code that writes up to the last read. – Mgetz Jun 12 '19 at 15:41
  • @Mgetz: more or less. If you look at the prototype `errno_t memset_s(void *s, rsize_t smax, int c, rsize_t n);`, there are a number of differences from `void *memset(void *s, int c, size_t n);`. And officially, `memset_s()` has to check that `s` is not a null pointer, and so on. But the grunt work — actually setting the data to the specified byte value — could be handled by `memset_s()` calling `memset()`. – Jonathan Leffler Jun 12 '19 at 15:44
  • @JonathanLeffler yeah I realized that after I commented, I do wish glibc etc would implement just that method so people don't have to resort to `volatile` hacks. I could care less about the rest of annex k. – Mgetz Jun 12 '19 at 15:45
  • @Mgetz: Well, the reason I am *working* on Annex K at all is `strtok_s()`, on direct request by downstream... ;-) But yes, most of those functions are a bit clunky. (Including `strtok_s()`, which is worded in a way that *cannot* be fulfilled. :-D ) – DevSolar Jun 12 '19 at 15:47
  • @DevSolar AFAIK `strtok_s` and `strtok_r` were macro identical (parameter order changes I believe). – Mgetz Jun 12 '19 at 15:48
  • @DevSolar: Note that Microsoft has its own `strtok_s()`, and the Annex K version is loosely based on the MS version. The MS `strtok_s()` matches the interface to POSIX `strtok_r()` except in the spelling, but Annex K [`strtok_s()`](http://port70.net/~nsz/c/c11/n1570.html#K.3.7.3.1) is different from _both_! – Jonathan Leffler Jun 12 '19 at 15:49
  • @JonathanLeffler OOPS, yeah I got confused because I've used the MS one and macroed before... – Mgetz Jun 12 '19 at 15:50
  • @JonathanLeffler: I realized. But my lib is strictly ISO 9899 *only*, so I couldn't provide either MS `strtok_s` nor POSIX `strtok_r`... so I implemented ISO `strtok_s` by means of a reserved-namespace worker function, so downstream can either go for ISO `strtok_s` *or* wrap my worker function with a MS `strtok_s` / POSIX `strtok_r`, as per preference. ;-) -- The "impossible" part about ISO `strtok_s` is that I am supposed to bug out if the next token end cannot be found in the next `s1max` characters... but may not access the string with the delimiters if it doesn't. *That* cracked me up. :-D – DevSolar Jun 12 '19 at 15:56