
Is this code well-defined behavior, in terms of strict aliasing?

_Bool* array = malloc(n);
memset(array, 0xFF, n);
_Bool x = array[0];

The rule of effective type has special cases for memcpy and memmove (C17 6.5 §6) but not for memset.

My take is that the effective type becomes unsigned char, because the second parameter of memset is required to be converted to unsigned char (C17 7.24.6.1) and because of the rule of effective type (C17 6.5 §6):

...or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one.

  • Question 1: What is the effective type of the data stored in array after the memset call?
  • Question 2: Does the array[0] access therefore violate strict aliasing, given that _Bool is not a type excluded from the strict aliasing rule (unlike character types)?
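
To make the premise concrete, the byte-wise store described in C17 7.24.6.1 can be pictured roughly as the loop below. This is only a sketch for discussion: `byte_fill` is a hypothetical name, and a real memset is not required to be written in C at all. What matters for the question is that every store goes through an lvalue of type unsigned char.

```c
#include <stddef.h>

/* Hypothetical stand-in for memset, for illustration only.
   The fill value is converted to unsigned char and stored
   into each of the first n bytes of the object at s. */
void *byte_fill(void *s, int c, size_t n)
{
    unsigned char *p = s;

    for (size_t i = 0; i < n; i++) {
        p[i] = (unsigned char)c;   /* store through a character-type lvalue */
    }
    return s;
}
```

Calling `byte_fill(array, 0xFF, n)` in place of the memset call should leave the same bytes in the allocated block.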
Lundin
  • Even if a comparison like `array[0] == true` is not often used and is considered too verbose, I think it should still be valid. And with your `memset` call that comparison will actually be *false*. – Some programmer dude Nov 06 '18 at 12:28
  • @Someprogrammerdude This goes deeper than just the value comparison. If there is UB because of a strict aliasing violation, the compiler is free to give `x` any value, or crash and burn for that matter. – Lundin Nov 06 '18 at 12:31
  • I would think the effective type is still `_Bool`, however, the representation in the memory may not be valid. Possibly `memset(array,(unsigned char)false, n);` would make it valid. – Paul Ogilvie Nov 06 '18 at 12:32
  • That's true, so maybe I should have waited for an answer to that before bothering about petty details like that. :) – Some programmer dude Nov 06 '18 at 12:32
  • @PaulOgilvie What do you mean "still _Bool"? How did it end up _Bool? The pointer returned by malloc has no effective type yet. And the cast in your memset example is superfluous, check 7.24.6.1. – Lundin Nov 06 '18 at 12:33
  • Oh, I see. I was thinking the type is `_Bool *` so the memory will be interpreted as `_Bool`. – Paul Ogilvie Nov 06 '18 at 12:34
  • @PaulOgilvie No, unfortunately C doesn't work like that. Dynamically allocated memory does not get a type before a value is stored inside it. – Lundin Nov 06 '18 at 12:38
  • While I see you refer to C17, is it really a different question from https://stackoverflow.com/questions/30970251/what-is-the-effective-type-of-an-object-written-by-memset ? – tevemadar Nov 06 '18 at 12:48
  • According to this logic, memsetting an allocated block to `0` and reading `int`s out of it would also be UB. IMO that would not have been anyone's intent, and this is yet another case of the strict aliasing rule being woefully underspecified. – M.M Nov 06 '18 at 12:53
  • @AnttiHaapala `(unsigned char)false` has value `0` and not `1`. And in effect the cast serves no purpose, since the argument is converted to `int` when it is passed anyhow. – Jens Gustedt Nov 06 '18 at 13:08
  • @tevemadar It seems very similar, but the answer there isn't too convincing. – Lundin Nov 06 '18 at 13:09
  • @JensGustedt oops :D indeed. I somehow supposed that setting to 0xFF was supposed to set it to true, hence I read `(unsigned char)true`. `errno == ENOCOFEE` – Antti Haapala -- Слава Україні Nov 06 '18 at 13:28
  • Related [Is it safe to memset bool to 0?](https://stackoverflow.com/q/33380742/1708801) – Shafik Yaghmour Nov 06 '18 at 13:51

1 Answer

  1. memset does not change the effective type. C11 (C17) 6.5p6:

    1. The effective type of an object for an access to its stored value is the declared type of the object, if any. [ This clearly is not the case. An allocated object has no declared type. ]

      If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. [ this is not the case as an lvalue of character type is used by memset! ]

      If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. [ this too is not the case here - it is not copied with memcpy, memmove or an array of characters ]

      For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access. [ therefore, this has to apply in our case. Notice that this applies to accessing it as characters inside memset as well as dereferencing array. ]

    Since memset stores the values through an lvalue of character type, rather than copying the bytes from another object through lvalues of character type (that clause exists to equate memcpy and memmove with doing the same in an explicit for loop!), the object does not get an effective type, and the effective type of the elements is _Bool when they are accessed through array.

    There might be parts in the C17 standard that are underspecified, but this certainly is not one of those cases.
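
For contrast, here is a minimal sketch of the situation the memcpy clause does cover, assuming the source is an object with declared type bool. The helper name `clone_flags` is made up for this example. Because the bytes are copied from a typed object, the allocated block takes over the effective type of the source for the reads that follow, and the copied bytes are valid bool representations to begin with.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies n bools from src into freshly allocated
   storage. Per 6.5p6 the copy takes the effective type of the object it
   was copied from, so reading the result as bool is fine.
   Returns NULL if the allocation fails; the caller frees the result. */
bool *clone_flags(const bool *src, size_t n)
{
    bool *dst = malloc(n * sizeof *dst);

    if (dst != NULL) {
        memcpy(dst, src, n * sizeof *dst);
    }
    return dst;
}
```

With memset there is no source object to take a type from, which is why only the last sentence of 6.5p6 is left to apply.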

  2. array[0] would not violate the effective type rule.

    That does not make using the value of array[0] any more legal. It can be (and most probably will be) a trap representation!

    I tried the following functions:

    #include <stdio.h>
    #include <stdbool.h>        
    
    void f1(bool x, bool y) {
        if (!x && !y) {
            puts("both false");
        }
    }
    
    
    void f2(bool x, bool y) {
        if (x && y) {
            puts("both true");
        }
    }
    
    void f3(bool x) {
        if (x) {
            puts("true");
        }
    }
    
    void f4(bool x) {
        if (!x) {
            puts("false");
        }
    }
    

    with array[0] passed for each of the arguments. To avoid compile-time optimizations across the calls, these functions were compiled separately. When compiled with -O3, the following messages were printed:

    both true
    true
    

    And without any optimization:

    both false
    both true
    true
    false
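
If the goal is simply an allocated array of true values, one way to sidestep the whole problem is to store through a bool lvalue instead of filling bytes. The following is only a sketch, and `make_flags` is a hypothetical helper name. Each store gives the element the effective type bool and a valid representation, so later reads are well-defined.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: allocates n bools and assigns `value` to each one
   through a bool lvalue, so every element holds a valid representation.
   Returns NULL if the allocation fails; the caller frees the result. */
bool *make_flags(size_t n, bool value)
{
    bool *flags = malloc(n * sizeof *flags);

    if (flags != NULL) {
        for (size_t i = 0; i < n; i++) {
            flags[i] = value;
        }
    }
    return flags;
}
```

Passing elements of such an array to f1 through f4 should give consistent results regardless of the optimization level, since no invalid representation is ever read.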