
All the examples I've seen of reading a double of known endianness from a buffer into the platform's endianness involve detecting the current platform's endianness and swapping bytes when necessary.
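For reference, the detect-and-swap approach described above might look something like the following minimal sketch. It assumes `sizeof(double) == 8`, a host that is purely little- or big-endian, and a host `double` that uses the same byte order as `uint32_t` and the same format as the data. `host_is_little_endian` and `read_double_swap` are hypothetical names, not from any library.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");

/* Returns true if the least significant byte of a uint32_t is stored first.
   Assumes double uses the same byte order as uint32_t on this host. */
static bool host_is_little_endian(void) {
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Reads an 8-byte double stored in the byte order given by le, reversing the
   bytes first if that order differs from the host's. */
double read_double_swap(const unsigned char *buffer, bool le) {
    unsigned char bytes[sizeof(double)];
    memcpy(bytes, buffer, sizeof bytes);
    if (le != host_is_little_endian()) {
        /* Byte orders differ: reverse the bytes in place. */
        for (size_t i = 0; i < sizeof bytes / 2; i++) {
            unsigned char tmp = bytes[i];
            bytes[i] = bytes[sizeof bytes - 1 - i];
            bytes[sizeof bytes - 1 - i] = tmp;
        }
    }
    double d;
    memcpy(&d, bytes, sizeof d); /* reinterpret the bytes as a double */
    return d;
}
```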

On the other hand, I've seen another way of doing the same thing for integers that uses bit shifting (one such example).
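The integer version of the shift technique typically looks something like this sketch (`read_u32_be` and `read_u32_le` are made-up names for illustration). Because it works on values rather than on the memory layout, the host's byte order never enters into it. Casting each byte to `uint32_t` before shifting avoids shifting into the sign bit of a byte that has been promoted to `int`.

```c
#include <stdint.h>

/* Reads a 32-bit unsigned integer stored big-endian (most significant byte first). */
uint32_t read_u32_be(const unsigned char *buffer) {
    return ((uint32_t)buffer[0] << 24) |
           ((uint32_t)buffer[1] << 16) |
           ((uint32_t)buffer[2] << 8)  |
            (uint32_t)buffer[3];
}

/* Reads a 32-bit unsigned integer stored little-endian (least significant byte first). */
uint32_t read_u32_le(const unsigned char *buffer) {
    return  (uint32_t)buffer[0]        |
           ((uint32_t)buffer[1] << 8)  |
           ((uint32_t)buffer[2] << 16) |
           ((uint32_t)buffer[3] << 24);
}
```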

This got me thinking that it might be possible to use a union and the bitshift technique to read doubles (and floats) from buffers, and a quick test implementation seemed to work (at least with clang on x86_64):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Assembles the 8 bytes into a uint64_t with shifts, then reads the bits
   back as a double through a union. le selects the buffer's byte order. */
double read_double(const unsigned char *buffer, bool le) {
    union {
        double d;
        uint64_t i;
    } data;
    data.i = 0;

    int off = le ? 0 : 7;
    int add = le ? 1 : -1;
    for (int i = 0; i < 8; i++) {
        data.i |= ((uint64_t)(buffer[off] & 0xFF) << (i * 8));
        off += add;
    }
    return data.d;
}

int main(void) {
    const unsigned char buffer_le[] = {0x6E, 0x86, 0x1B, 0xF0, 0xF9, 0x21, 0x09, 0x40};
    printf("%f\n", read_double(buffer_le, true)); // 3.141590

    const unsigned char buffer_be[] = {0x40, 0x09, 0x21, 0xF9, 0xF0, 0x1B, 0x86, 0x6E};
    printf("%f\n", read_double(buffer_be, false)); // 3.141590

    return 0;
}

My question, though, is: is this a safe way to do this? Or is there undefined behavior involved? Or, if both this and the byte-swap method involve undefined behavior, is one safer than the other?

Alexander O'Mara
  • Unfortunately there seems to be no portable, or even somewhat standard-compliant, way in C to serialize and de-serialize doubles. Even if you "swap" the bytes as a `char` array, it would mean that you can't reinterpret it as `double` anymore without violating the strict-aliasing rule. In your example the violation is even more visible, as you reinterpret an integer type as `double`. So yes, technically it is UB. – Eugene Sh. Oct 03 '18 at 16:26
  • @EugeneSh. Are you saying that the byte-swap technique is also undefined behavior? – Alexander O'Mara Oct 03 '18 at 16:30
  • Yes. According to the strict-aliasing rule, an object of a specific type can only be accessed using a pointer of this type. It is OK to access it through `char*`, but if it is modified through this array, it can't be safely reinterpreted back to the original type. – Eugene Sh. Oct 03 '18 at 16:33
  • @EugeneSh. [memcpy is the way to avoid strict aliasing](https://stackoverflow.com/a/51228315/1708801); whether you have a valid double is a different story, but I don't think the standard could guarantee that. – Shafik Yaghmour Oct 03 '18 at 16:33
  • @ShafikYaghmour `memcpy` is a way to hide the violation, but not (some of) the rationales behind the rule. It won't avoid the possible trap representation and alignment issues. – Eugene Sh. Oct 03 '18 at 16:35
  • @EugeneSh. there are standard ways of dealing [with alignment](https://en.cppreference.com/w/c/language/_Alignas); valid representation cannot be, but the standard can never help you there. – Shafik Yaghmour Oct 03 '18 at 16:38
  • @ShafikYaghmour Indeed, careful programming can reduce the UB to implementation-defined behavior. But no further than that. – Eugene Sh. Oct 03 '18 at 16:40
  • @EugeneSh.: There is no code in the question that modifies an object by writing bytes (character types) to it (except for objects that are character types). The code in the question reinterprets bytes via a union, which is permitted by the C standard. – Eric Postpischil Oct 03 '18 at 16:56
  • @EricPostpischil Reinterpreting a modified object as an object of a different type is not permitted, no matter what mechanism is used to modify it. Just as aliasing is allowed with `char*`, but modification and later reinterpretation is not. – Eugene Sh. Oct 03 '18 at 16:59
  • @EugeneSh.: Reinterpreting via a union is permitted by C 2018 6.5.2.3 3, and footnote 99 explicitly confirms that is the intended meaning of the text. – Eric Postpischil Oct 03 '18 at 17:00
  • @EricPostpischil I'm afraid I don't have access to the C 2018 standard, and the numbers seem different from the C11 one I am looking at. – Eugene Sh. Oct 03 '18 at 17:03
  • @EugeneSh.: The clause and paragraph are the same in C 2018, C 2011, and C 1999. The note is 99 in C 2018 and 95 in C 2011. C 1999 does not have the note. – Eric Postpischil Oct 03 '18 at 17:05
  • @EugeneSh.: Modifying the bytes of an object through a character type does not violate the aliasing rules. Per C 2018 6.5 7, “An object shall have its stored value accessed only by… [various types], or a character type.” Per 3.1 1, “access” means “to read or modify the value of an object”. – Eric Postpischil Oct 03 '18 at 17:10
  • @EricPostpischil I was about to cite this paragraph and ask how well it works with the aforementioned one (about unions). This discussion arises quite frequently here on SO, and every time there seem to be different conclusions drawn from it... – Eugene Sh. Oct 03 '18 at 17:14
  • @EugeneSh.: When one reads a member of a union through the `structure.member` expression, one is reading the object that is that member through its own type, which is compatible with itself, of course. So it does not violate the aliasing rule. It could violate a rule about accessing a member other than the last-written member, if there were such a rule, but there is not. – Eric Postpischil Oct 03 '18 at 17:18
  • @EricPostpischil So bottom line, if there is say, a 4-byte wide communication channel and the data is received as an array of `uint32_t` which is a member of a union with some type `T`, it is safe and standard compliant to read the data as `T` assuming that it is what was sent? – Eugene Sh. Oct 03 '18 at 17:24
  • @ShafikYaghmour: The Effective Type rules limit memcpy's type-conversion abilities to cases where the destination is of static or automatic duration. The Standard does not define a reasonable solution to go the other way. – supercat Oct 03 '18 at 18:21
  • @EricPostpischil: An access to a member is also an access to the union itself, as well as to all members thereof. If an access via lvalue that is freshly derived from an lvalue of union type is recognized as an access to the union, such an access will be entitled to access all union members. I see nothing in the Standard, however, that would compel such recognition in any case, nor that would suggest that quality implementations shouldn't make any attempt to recognize more cases than gcc and clang. – supercat Oct 03 '18 at 18:27
  • @supercat I am confused by your wording (it's probably me). Do you support Eric's answer or object to it? – Eugene Sh. Oct 03 '18 at 19:29
  • @EugeneSh.: I should perhaps have said "...is recognized as a *legitimate* access to the union". Does that clear things up? – supercat Oct 03 '18 at 19:51
  • @EugeneSh: Alternatively, if one interprets 6.5p7 as describing the types of things *that may alias* [hardly an unreasonable interpretation given the footnote], and recognizes that anything that would be allowed to alias an lvalue should also be allowed to access an lvalue which is *freshly derived* from it, that would also work. IMHO, the authors of C89 thought the latter point obvious enough that it didn't need to be mentioned. – supercat Oct 03 '18 at 20:09
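The `memcpy` alternative raised in these comments can be sketched by keeping the question's shift-and-OR loop and replacing only the union read. This is a hedged sketch: `read_double_memcpy` is a hypothetical name, and, like the union version, it assumes `double` is 64 bits and stored in the same byte order as `uint64_t`. It sidesteps the aliasing debate but, as the comments note, not the question of whether the bits form a valid `double`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes a 64-bit double");

/* Assembles the 64-bit pattern with shifts (as in the question), then copies
   the object representation into a double with memcpy instead of a union. */
double read_double_memcpy(const unsigned char *buffer, bool le) {
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++) {
        /* byte i of the value, counting from the least significant */
        unsigned char byte = le ? buffer[i] : buffer[7 - i];
        bits |= (uint64_t)byte << (i * 8);
    }
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```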

2 Answers


Reinterpreting Through a Union

Constructing a uint64_t value by shifting and ORing bytes is of course supported by the C standard. (There is some hazard when shifting due to the need to ensure the left operand is the correct size and type to avoid issues with overflow and shift width, but the code in the question correctly converts to uint64_t before shifting.) Then the question remaining for the code is whether reinterpreting through a union is permitted by the C standard. The answer is yes.

C 2018 6.5.2.3 3 says:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,99)

and note 99 says:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning")…

Such reinterpretation of course relies on the object representations used in the C implementation. Notably, the double must use the expected format, matching the bytes read from the input stream.
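Since the result depends on the implementation's representations, one option is to check those assumptions before relying on the union or `memcpy` approach. The sketch below, with a hypothetical `double_looks_like_binary64` helper, combines compile-time checks on `<float.h>` parameters with a runtime check of the bit pattern of `1.0`. Passing these checks is evidence of, not proof of, an IEEE 754 binary64 `double` stored in the same byte order as `uint64_t`.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Compile-time checks: size and binary64 parameters (radix 2, 53-bit
   significand, max exponent 1024). They do not prove the bit layout. */
_Static_assert(sizeof(double) == 8, "double must occupy 8 bytes");
_Static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "double must have binary64 parameters");

/* Runtime check: viewed as a uint64_t, 1.0 should have the binary64 pattern
   0x3FF0000000000000. A different byte order or encoding fails this test. */
bool double_looks_like_binary64(void) {
    const double one = 1.0;
    uint64_t bits;
    memcpy(&bits, &one, sizeof bits);
    return bits == UINT64_C(0x3FF0000000000000);
}
```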

Modifying the Bytes of an Object

Modifying an object by modifying its bytes (as by using a pointer to unsigned char) is permitted by C. C 2018 6.5 7 says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [list of various types], or a character type.

Although one of the comments states that you may “access” but not “modify” the bytes of an object this way (apparently interpreting “access” to mean only reading, not writing), C 2018 3.1 defines “access” as:

to read or modify the value of an object.

Thus, one is permitted to read or write the bytes of an object through character types.
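As an illustration of writing an object's bytes through a character type, this sketch fills a `double` one byte at a time through an `unsigned char *`, with no union or `uint64_t` involved. It assumes an 8-byte binary64 `double` that is either little- or big-endian; the hypothetical helper `host_double_is_little_endian` infers the order by reading the bytes of `1.0`, and `read_double_bytes` is likewise a made-up name.

```c
#include <stdbool.h>
#include <stddef.h>

_Static_assert(sizeof(double) == 8, "sketch assumes an 8-byte double");

/* Reads the bytes of 1.0 (binary64 pattern 3F F0 00 ... 00) through an
   unsigned char pointer to learn the host's byte order for double. */
static bool host_double_is_little_endian(void) {
    const double one = 1.0;
    const unsigned char *p = (const unsigned char *)&one;
    return p[sizeof one - 1] == 0x3F;
}

/* Writes each byte of d through an unsigned char pointer; le gives the
   byte order of buffer. */
double read_double_bytes(const unsigned char *buffer, bool le) {
    double d;
    unsigned char *p = (unsigned char *)&d; /* character-type access to d */
    bool same_order = (le == host_double_is_little_endian());
    for (size_t i = 0; i < sizeof d; i++) {
        p[i] = same_order ? buffer[i] : buffer[sizeof d - 1 - i];
    }
    return d;
}
```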

Eric Postpischil

Reading double to platform endianness with union and bit shift, is it safe?

This kind of thing only makes sense when dealing with data from outside the program (e.g. data from a file or network), where you have a strict format for the data (defined in the file format's specification or the network protocol's specification) that may have nothing to do with the format C uses, may have nothing to do with the format the CPU uses, and may not be IEEE 754 format either.

On the other side, C doesn't guarantee any particular floating-point format. For a simple example, it's perfectly legal for the compiler to use a BCD format for float where 0x12345e78 = 1.2345 * 10**78, even if the CPU itself happens to support "IEEE 754".

The result is that you have a "whatever the spec says" format from outside the program, and you're converting it into a different "whatever the compiler felt like" format for use inside the program; every single assumption you've made (including sizeof(double)) is potentially false.
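If the external format is known to be IEEE 754 binary64, one way to stop depending on the compiler's format is to decode the sign, exponent, and fraction fields arithmetically and rebuild the value with `ldexp`. A minimal sketch follows, with `decode_binary64` as a hypothetical name. It does not preserve NaN payloads, and on a host whose `double` has less range or precision than binary64 the result is rounded or overflows rather than reproduced exactly.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Decodes an IEEE 754 binary64 value from 8 bytes using only integer and
   floating-point arithmetic, never reinterpreting memory as double. */
double decode_binary64(const unsigned char *buffer, bool le) {
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++) {
        unsigned char byte = le ? buffer[i] : buffer[7 - i];
        bits |= (uint64_t)byte << (i * 8);
    }

    bool negative = (bits >> 63) != 0;
    int exponent = (int)((bits >> 52) & 0x7FF);           /* 11-bit biased exponent */
    uint64_t fraction = bits & UINT64_C(0xFFFFFFFFFFFFF); /* 52-bit fraction */

    double magnitude;
    if (exponent == 0x7FF) {
        /* All-ones exponent: infinity or NaN (payload not preserved). */
        magnitude = fraction == 0 ? INFINITY : NAN;
    } else if (exponent == 0) {
        /* Zero or subnormal: fraction * 2^(1 - 1023 - 52). */
        magnitude = ldexp((double)fraction, -1074);
    } else {
        /* Normal: (2^52 + fraction) * 2^(exponent - 1023 - 52). */
        magnitude = ldexp((double)(fraction | (UINT64_C(1) << 52)), exponent - 1075);
    }
    /* Applying the sign last also yields -0.0 for a negative zero. */
    return negative ? -magnitude : magnitude;
}
```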

Brendan