When is memset to 0 nonportable?

Question

In C, {0} is the universal zero initializer equivalent to C++'s {} (the latter being invalid in C). You need it whenever you want a zero-initialized object of a complete but conceptually-opaque or implementation-defined type. The classic example in the C standard library is mbstate_t:
mbstate_t state = { 0 }; /* correctly zero-initialized */
versus the common but nonportable:
mbstate_t state;
memset(&state, 0, sizeof state);

It strikes me as odd that the latter version could be nonportable (even for implementation-defined types, the compiler has to know the size). What is the issue here, and when is a memset(x, 0, sizeof x) nonportable?

Steve Summit · Accepted Answer · 2022-05-10T14:08:21.550

memset(p, 0, n) sets to all-bits-0.
An initializer of { 0 } sets to the value 0.
On just about any machine you've ever heard of, the two concepts are equivalent.

However, there have been machines where the floating-point value 0.0 was not represented by a bit pattern of all-bits-0. And there have been machines where a null pointer was not represented by a bit pattern of all-bits-0, either. On those machines, an initializer of { 0 } would always get you the zero initialization you wanted, while memset might not.
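The distinction can be sketched in a few lines of C. This is only an illustration: `struct sample` and the three function names are hypothetical, made up for this example. The first function relies on the guarantee that `{ 0 }` gives zero values; the last one merely reports whether all-bits-zero happens to match those values on the implementation at hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record mixing the member kinds discussed above. */
struct sample {
    int    count;
    double ratio;
    void  *link;
};

/* Value zero: the members are 0, 0.0 and a null pointer on every
   conforming implementation, whatever their bit patterns are. */
struct sample sample_value_zero(void)
{
    struct sample s = { 0 };
    return s;
}

/* All-bits-zero: this is what memset produces. */
struct sample sample_bits_zero(void)
{
    struct sample s;
    memset(&s, 0, sizeof s);
    return s;
}

/* Returns 1 if all-bits-zero members compare equal to the zero values
   on this implementation. Only the integer case is guaranteed; on an
   exotic machine the double or pointer reads could even trap. */
int bits_zero_matches_value_zero(void)
{
    struct sample b = sample_bits_zero();
    return b.count == 0 && b.ratio == 0.0 && b.link == NULL;
}
```

On mainstream platforms `bits_zero_matches_value_zero()` should return 1, which is exactly why the memset idiom appears to work everywhere.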

See also question 7.31 and question 5.17 in the C FAQ list.

Postscript: One other difference, as pointed out by @ryker: memset will set any "holes" in a padded structure to 0, while setting that structure to { 0 } might not.
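A small sketch of the padding point, assuming a hypothetical `struct padded` whose layout typically leaves a gap after `tag`. The helper names are also made up for illustration. Checking the bytes after memset is reliable; the same check after `= { 0 }` would be inspecting bytes whose contents this answer says may not be zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; most ABIs insert padding between tag and value. */
struct padded {
    char tag;
    long value;
};

/* Clear every byte of the object, padding included. */
void padded_clear(struct padded *p)
{
    memset(p, 0, sizeof *p);
}

/* Returns 1 if all n bytes of the object representation are zero. */
int all_bytes_zero(const void *obj, size_t n)
{
    const unsigned char *bytes = obj;
    for (size_t i = 0; i < n; i++) {
        if (bytes[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

This matters mostly when a structure is compared with memcmp, hashed, or written out byte for byte. Note that storing to a member afterwards is allowed to disturb the padding again.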

edited May 10 '22 at 14:08

answered Nov 30 '21 at 13:25

I was about to write the same. I was working on a Soviet computer where float numbers with all words set to zero were a trap representation :) – 0___________ Nov 30 '21 at 13:30
It used to puzzle me that all-bits-0 could ever not be the floating-point value 0.0. But once you learn how floating-point values work inside, typically using an offset representation for the signed exponent, it's kind of surprising that all-bits-0 *does* represent 0.0, since an exponent value of 0 is typically represented as the biased form `0x80` or `0x400`. (But of course in IEEE-754 0.0 is *not* stored with an actual exponent value of 0, so it works out.) – Steve Summit Nov 30 '21 at 13:36
Old computers were not using modern datatypes :). On that one, all zero was an indication of a memory subsystem failure. Any zero read from memory raised a hardware exception. – 0___________ Nov 30 '21 at 14:32
_Classic_ may not be apropos, but because of the variability of how `mbstate_t` can be defined, it is a _good_ example. – ryyker Nov 30 '21 at 17:49
Could you name these machines, please? – Thomas Weller Nov 30 '21 at 22:07
@ThomasWeller For nonzero pointer examples, see [question 5.17](http://c-faq.com/null/machexamp.html) in the [C FAQ list](http://c-faq.com/). I don't have any names to hand of machines that had nonzero representations of floating-point 0.0, but I have it on good authority they existed. – Steve Summit Nov 30 '21 at 22:09
The standard 32-bit and 64-bit floating point formats are laid out so that the represented number is a monotonic function of the bit representation interpreted as an integer (aside from the sign bit). From that perspective it isn't surprising that 0.0 is represented by 000...000. In formats like x87 80-bit floating point where the leading digit of the mantissa is stored explicitly, you just need the mantissa to be zero and the exponent to be anything other than the special infinity/NaN value, so again it's not surprising that 000...000 represents 0.0. – benrg Dec 01 '21 at 00:55
@benrg Good point about monotonicity. Yet another way of looking at it is that 0.0 sits very nicely among the subnormals, where it surely belongs, because it obviously doesn't have an implicit "1" bit, either. – Steve Summit Dec 01 '21 at 01:11

dbush · Answer 2 · 2021-11-30T14:55:28.837

The reason for this has to do with how types are represented.

Section 6.7.9p10 of the C standard describes how fields are initialized as follows:

If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static or thread storage duration is not initialized explicitly, then:

if it has pointer type, it is initialized to a null pointer;

if it has arithmetic type, it is initialized to (positive or unsigned) zero;

if it is an aggregate, every member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

if it is a union, the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits

And p21 also states:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration

The difference between this and setting all bytes to zero is that some of the above values may not necessarily be represented by all bits zero.

For example, there are some architectures where the address 0 is a valid address. On such an architecture, a null pointer may not be represented as all bits zero. (Note: (void *)0 is specified as a null pointer constant by the standard; however, the implementation will convert it to whatever the representation of a null pointer is.)

The standard also doesn't mandate a particular representation for floating point types. While the most common representation, IEEE754, does use all bits 0 to represent the value +0, this is not necessarily true for other representations.
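The initialization rules quoted above can be sketched as follows. `struct record` and `record_make` are hypothetical names used only for illustration. Only the first member gets an explicit initializer, so under p21 the remaining members are initialized as if they had static storage duration: zero, 0.0, and a null pointer, regardless of how those values are represented.

```c
#include <stddef.h>

/* Hypothetical aggregate with arithmetic, pointer and array members. */
struct record {
    int         id;
    double      weight;
    const char *name;
    int         history[4];
};

/* Only `id` is initialized explicitly; the rest falls under the
   "fewer initializers than members" rule (6.7.9p21). */
struct record record_make(int id)
{
    struct record r = { id };
    return r;
}
```

For instance, `record_make(7)` yields a record whose `weight` compares equal to 0.0 and whose `name` compares equal to NULL, even on a machine where neither is stored as all bits zero.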

"that a NULL pointer " better as "that a _null pointer_" (lower case). The _null pointer constant_ `NULL` carries its own set of difficulties. `NULL` may be `int` 0, yet a _null pointer_ may differ from all zero bits. – chux - Reinstate Monica Nov 30 '21 at 14:11

ryyker · Answer 3 · 2021-11-30T14:18:03.820

Noting a difference in behavior between the two methods...

With ... = {0}; padding bytes, if any exist, are not guaranteed to be cleared.
But a call to memset() will clear padding.

From here

"Possible implementation of mbstate_t is a struct type holding an array representing the incomplete multibyte character, an integer counter indicating the number of bytes in the array that have been processed, and a representation of the current shift state."

In the case where mbstate_t is implemented as a struct, it is notable that {0} is not guaranteed to set any padding bytes to zero, which makes the following assumption debatable:

mbstate_t state = { 0 }; /* correctly zero-initialized */

memset() however does include padding bytes.

memset(&state, 0, sizeof state); // all bytes in the memory region of state will be cleared
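One way to see what each form produces for mbstate_t is to ask the library itself with mbsinit() from <wchar.h>, which reports whether an object describes the initial conversion state. The sketch below is only an illustration and both function names are hypothetical. The C standard says a zero-valued mbstate_t describes an initial conversion state, so the `{ 0 }` check should hold everywhere; the memset check is expected to hold on common implementations, but that is an assumption about the platform rather than a guarantee.

```c
#include <string.h>
#include <wchar.h>

/* Returns 1 if a { 0 }-initialized mbstate_t is in the initial
   conversion state. */
int brace_init_is_initial(void)
{
    mbstate_t state = { 0 };
    return mbsinit(&state) != 0;
}

/* Returns 1 if an all-bits-zero mbstate_t is in the initial
   conversion state on this implementation (not guaranteed). */
int memset_init_is_initial(void)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);
    return mbsinit(&state) != 0;
}
```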
