
Sample code:

#include <assert.h>
#include <string.h>

struct S
{
    unsigned char ch;
    int i;
};

int main()
{
    struct S s;

    memset(&s, 0, sizeof s);

    s.ch = 257; 

    assert( 0 == ((unsigned char *)&s)[1] );
}

Can the assertion fail?

The motivation for the question is whether a compiler on a little-endian system could decide to use a 4-byte store to implement s.ch = 257;. Obviously nobody would ever write code like I did in my example, but something similar might realistically occur if ch is assigned in various ways in a program which then goes on to use memcmp to check for struct equality.

For example, if the code does --s.ch instead of s.ch = 257 - can the compiler emit a word-size decrement instruction?

I don't think the discussion around DR 451 is relevant, as that only applies to uninitialized padding; however the memset initializes all the padding to zero bytes.
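To make the memcmp concern concrete, here is a minimal sketch of a member-wise comparison, which never looks at padding bytes and so does not depend on how the compiler stores to s.ch. The helper name s_equal is hypothetical and not part of the original code.

```c
#include <stdbool.h>

struct S
{
    unsigned char ch;
    int i;
};

/* Hypothetical helper: compares only the named members,
   so padding bytes never take part in the result. */
static bool s_equal(const struct S *a, const struct S *b)
{
    return a->ch == b->ch && a->i == b->i;
}
```

A memcmp over sizeof(struct S) bytes would additionally compare whatever the padding bytes happen to hold, which is the part the question is asking about.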

M.M

2 Answers


Yes, it can fail. The behavior is unspecified, but not undefined.

After the assignment s.ch = 257;, all padding bytes take unspecified values (see footnote 1 below). So if the second byte of the structure is a padding byte, it takes an unspecified value, and the result of comparing it to zero isn't specified. The assertion may or may not fire.

The read value in the assert cannot be a trap representation because unsigned char doesn't have trap representations, and because the value is unspecified, not indeterminate.


1 (Quoted from: ISO/IEC 9899:201x 6.2.6.1 General 6):
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
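Whether byte 1 is a padding byte at all depends on the implementation's layout. As a rough sketch, the check below uses offsetof to decide whether a given byte of struct S belongs to a named member. The helper name s_byte_is_padding is hypothetical and introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct S
{
    unsigned char ch;
    int i;
};

/* Hypothetical helper: true if byte `idx` of a struct S
   lies inside the object but belongs to no named member. */
static bool s_byte_is_padding(size_t idx)
{
    if (idx >= sizeof(struct S))
        return false;                    /* outside the object entirely */
    if (idx < sizeof(unsigned char))
        return false;                    /* byte 0 holds ch */

    size_t i_begin = offsetof(struct S, i);
    size_t i_end = i_begin + sizeof(int);

    /* Anything not covered by i is padding (before or after it). */
    return !(idx >= i_begin && idx < i_end);
}
```

On an implementation where int is aligned to more than one byte, s_byte_is_padding(1) returns true, and that is the case in which the quoted paragraph makes the value read by the assert unspecified.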

2501

ISO/IEC 9899:2011 §6.2.6.1 (Representations of types) General says:

¶6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.51) The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

51) Thus, for example, structure assignment need not copy any padding bits.

However, your example doesn't do a structure assignment, so maybe that doesn't apply. I believe there is no reason to think that an assignment to a simple type member of a structure would modify the data.

However, your assert code does exhibit undefined behaviour, trying to access structure padding, which is simply not allowed.

So, it is unlikely that the assertion would fire, but because your code exhibits undefined behaviour, it could happen and you'd have no recourse.

Jonathan Leffler
  • I overlooked "including in a member object" in your quote.. that seems to answer my question. But I don't see why accessing structure padding (which is not uninitialized) would be undefined behaviour, can you provide something to back that up? – M.M Oct 14 '16 at 05:19
  • It's "accessing padding" that is undefined behaviour, I believe; its initialization status is irrelevant because you can't access it. However, it may take me a while to find the relevant part of the standard… – Jonathan Leffler Oct 14 '16 at 05:20
  • Making notes — this may or may not be part of a final addition to my answer: §7.14.4.1 **The `memcmp` function** The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2.310) — 310) The contents of ‘‘holes’’ used as padding for purposes of alignment within structure objects are indeterminate. Strings shorter than their allocated space and unions may also cause problems in comparison. – Jonathan Leffler Oct 14 '16 at 05:25
  • More notes — §6.7.9 **Initialization** ¶10 … If an object that has static or thread storage duration is not initialized explicitly, then: … — if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits; — if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits; — this probably is not germane. – Jonathan Leffler Oct 14 '16 at 05:28
  • More notes — §6.7.2.1 **Structure and union specifiers** ¶15 … There may be unnamed padding within a structure object, but not at its beginning. … ¶17 There may be unnamed padding at the end of a structure or union. — partly relevant, but not stating that access to padding is verboten (though there isn't a valid way to reference padding, so I still don't see how it can be defined behaviour to access it). The search term I'm using in the PDF of the standard is 'padding', in case you're wondering. I may have to use alternative terms to find the answer I'm looking for, though. – Jonathan Leffler Oct 14 '16 at 05:32
  • More notes — the Rationale has nothing relevant to say about padding. Back in the standard: J.2 Undefined behaviour lists — The value of an unnamed member of a structure or union is used (6.7.9). §6.7.9 ¶9 Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization. – Jonathan Leffler Oct 14 '16 at 05:58
  • @rici: yes, but accessing `[1]` (as opposed to `[0]`) would be going outside the bounds of 'the array'; there is only one `char` in the 'array'. So, if the assertion were using 0 and not 1, I'd not be in a quandary — it would be fine. Because it is using 1, it is problematic in my view. There is tentative precedent for treating a single object as the first (only) element of an array — but it is in Annex K (for various `*scanf_s()` variants): _If the first argument points to a scalar object, it is considered to be an array of one element_. – Jonathan Leffler Oct 14 '16 at 06:06
  • Oops, didn't see the `[1]`. The perils of reading on a phone. – rici Oct 14 '16 at 06:09
  • In context, _first_ refers to the two arguments that define a string argument to the safe `scanf_s()` functions, the second being a length for the buffer. That said, I'd rather not have to use material in Annex K as justification. – Jonathan Leffler Oct 14 '16 at 06:09
  • You're allowed to read any object as a sequence of bytes (chars) at least for the purposes of copying. But I don't think the values of padding bytes are specified, even after a memcpy. I could be wrong though and I don't have the standard handy (or Handy, for German speakers). – rici Oct 14 '16 at 06:15
  • Iirc, using an indeterminate value is unspecified, not undefined, and reading a byte cannot trap. But that would imply that the assert could fire, since the value of the comparison is an unspecified boolean. – rici Oct 14 '16 at 06:19
  • The quotes about `memcmp` (and footnote 310) and those related to §6.7.9 all seem to indicate that the value of padding bytes is indeterminate. I've not found anything that permits the (reliable) comparison of padding bytes, or access to them other than via `memmove()`, `memcpy()`, `memset()`, etc. I'm going to sleep on it; I've given lots of indications of where the answer may lie (or may not lie) — but I know I've not fully resolved it. – Jonathan Leffler Oct 14 '16 at 06:20
  • *However, your assert code does exhibit undefined behaviour, trying to access structure padding, which is simply not allowed.* This is not correct. For example memcpy copies padding, and memset sets padding bits. There is no undefined behavior, the behavior is merely unspecified as padding bits take unspecified values, after the 257 assignment. – 2501 Oct 14 '16 at 07:46
  • Your argument in the comment that reading out of bounds of the array is out-of-bounds is not valid for two reasons. The pointer to the whole struct s is used, and not the pointer of the member. Pointer to the second byte of the object s, is pointing inside the object and may be dereferenced. – 2501 Oct 14 '16 at 07:51