C deliberately allows accessing the bytes of an object and supports communicating objects by transmitting the bytes that represent them and reconstructing them from the transmitted bytes. However, it should be done correctly, and there are some issues to deal with.
A character type should be used.
The preferred type to work with is unsigned char
. This is preferred for two reasons:
- The C standard defines the behavior of using character types to access the representations of objects. The character types are
char
, signed char
, and unsigned char
. The standard does not require that uint8_t
be a character type. Although it may have the same size and general properties of unsigned char
, it may be an extended integer type rather than an alias of unsigned char
(or of char
). In this case, the C standard does not define the behavior of accessing the bytes of an object with uint8_t
.
unsigned char
is preferred over char
or signed char
to avoid problems with signed integers in various C operations.
The sender and the receiver must agree on the representations of the objects or the protocol used for sending them.
If the sender and the receiver are compiled with the same C implementation using the same definitions for the objects being transmitted (such as the same structure definitions), they will agree on the representations. Between diverse C implementations, though, it is necessary to ensure there is clear agreement on how the transmitted bytes represent objects. As shown in your code, the structure is packed, which should take care of the problem that there may be padding inside structures. Other considerations include:
- The order of bytes within integers. Little-endian-first (bytes in order from least significant to most) and big-endian-first (the reverse) are common, although others are possible. Big endian is most common in network protocols.
- Representations of non-integers, such as floating-point formats. The IEEE-754 floating-point standard specifies some interchange formats, which are very widely used.
- Structures are layed out identically, including the types of members.
- Theoretically, the order of bits within bytes must be agreed, but this is not an issue if the network service is operating at the byte level.
Note that, of course, some objects are inherently impossible to send via bytes representations due to needing context in the running program, such as pointers and file handles.
Additional note
Another hazard to guard against is interpreting a byte buffer as another object. The C standard defines the behavior of accessing the bytes of an object (for example, something defined as a structure) using a character type, but it does not define the reverse. Sometimes naïve programmers will create an array of character type, read a network message into it, and then convert a pointer into the array to a pointer to a structure type. This runs afoul of two issues:
- The conversion is not defined if the alignment is not correct. (This should not be a problem with a packed array, which we would expect to have an alignment requirement of one byte.)
- Accessing an array of characters as a different, incompatible type is not defined by the C standard.
The proper way to reassemble received bytes into an object is to copy them either into memory declared as the desired type or memory allocated (as with malloc
) for the purpose of interpreting it as the intended object. This can be done by copying bytes from a buffer into the target memory or by directly passing the target memory to the network read routine, for it to fill in the bytes directly.