Dealing with data serialization without violating the strict aliasing rule

Question

Often in embedded programming (but not only there) there is a need to serialize an arbitrary struct in order to send it over a communication channel or write it to some memory.

Example

Let's consider a structure composed of different data types in an N-aligned memory region:

struct
{
    float a;
    uint8_t b;
    uint32_t c;
} s;

Now let's assume we have a library function

void write_to_eeprom(uint32_t *data, uint32_t len);

which takes the pointer to the data to be written as a uint32_t*. Now we would like to write s to the EEPROM using this function. A naive approach would be to do something like

write_to_eeprom((uint32_t*)&s, sizeof(s)/4);

But this is a clear violation of the strict aliasing rule.

Second example

struct
{
    uint32_t a;
    uint8_t b;
    uint32_t c;
} s;

In this case the cast (uint32_t*)&s does not itself violate the rule, because a pointer to a structure, suitably converted, points to its first member, and that member has type uint32_t. But the library function may iterate over the input data with pointer arithmetic, and the resulting pointers are incompatible with the data they point to (for example, data+1 has type uint32_t*, but it might point to the uint8_t field). That is again a violation of the rule, as I understand it.
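
To make the concern concrete, here is a minimal sketch that computes which uint32_t-sized word of the struct holds the member b. The names `second_t`, `word_index_of` and `word_of_b` are illustrative assumptions, not part of any library. On a typical ABI with a 4-byte uint32_t, b starts at byte offset 4, so `data[1]` inside the library would be read as a uint32_t although a uint8_t plus padding lives there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the struct from the second example. */
typedef struct {
    uint32_t a;
    uint8_t  b;
    uint32_t c;
} second_t;

/* Index of the uint32_t-sized word in which a given byte offset falls. */
static size_t word_index_of(size_t byte_offset)
{
    return byte_offset / sizeof(uint32_t);
}

/* Index of the word that contains member b. */
static size_t word_of_b(void)
{
    return word_index_of(offsetof(second_t, b));
}
```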

Possible solution?

Wrap the problematic structure in a union with an array of the desired type:

union 
{
    struct_type s;
    uint32_t array[sizeof(struct_type) / 4];
} u;

And pass u.array to the library function.

Is this the right way to do this? Is this the only right way to do this? What could be some other approaches?
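
For reference, a minimal sketch of the union idea written out in full. The names `struct_words`, `STRUCT_WORDS`, `config` and `save_config` are assumptions made for illustration, and `write_to_eeprom` is replaced by a RAM-backed stand-in so the sketch is self-contained; the real library function would be linked in instead. The word count is rounded up so that a struct whose size is not a multiple of 4 is still fully covered. As a comment under one of the answers points out, reading a union member other than the one last written is accepted in C since a C99 corrigendum, but not in C++.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    float    a;
    uint8_t  b;
    uint32_t c;
} struct_type;

/* Words needed to cover the struct, rounded up. */
#define STRUCT_WORDS \
    ((sizeof(struct_type) + sizeof(uint32_t) - 1) / sizeof(uint32_t))

typedef union {
    struct_type s;
    uint32_t    array[STRUCT_WORDS];
} struct_words;

/* Stand-in for the library function (assumption): it copies the words
   into a RAM array instead of a real EEPROM. */
static uint32_t fake_eeprom[STRUCT_WORDS];
static uint32_t fake_eeprom_len;

void write_to_eeprom(uint32_t *data, uint32_t len)
{
    if (len > STRUCT_WORDS) {
        len = (uint32_t)STRUCT_WORDS;  /* the stub only holds one struct */
    }
    memcpy(fake_eeprom, data, len * sizeof *data);
    fake_eeprom_len = len;
}

/* The struct lives inside the union, so no extra copy is needed. */
static struct_words config;

void save_config(void)
{
    write_to_eeprom(config.array, (uint32_t)STRUCT_WORDS);
}
```

The rest of the program would access the fields as `config.s.a`, `config.s.b` and `config.s.c`, and call `save_config()` when the data should be stored. Padding bytes inside the struct are written out as they are, with unspecified values.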

Very simple solution: `write_to_eeprom(char const* data, size_t len_times_four)`. — Kerrek SB, Jun 19 '15 at 18:42
@KerrekSB As I said, `write_to_eeprom` is a library function, which is out of our control. And still, casting a `float*` to `char*` is a violation, isn't it? — Eugene Sh., Jun 19 '15 at 18:43
@KerrekSB Actually no, it isn't... Anything can be aliased to `char*`. — Eugene Sh., Jun 19 '15 at 18:48
No, you can interpret any object as a sequence of characters. That's expressly *not* an aliasing violation. — Kerrek SB, Jun 19 '15 at 18:48
So yeah, you can copy the bytes from your object to an array of `uint32_t`, and then copy the ints. (Or you could just not care and just limit your code to your one platform.) — Kerrek SB, Jun 19 '15 at 18:49
With gcc, you could use `-fno-strict-aliasing` (better wrap the code in a pragma). However, especially when storing to an EEPROM, etc., just dumping the struct might become problematic after a firmware update. And fine-grained versioning tends to become inconsistent after the second update (personal observation). — too honest for this site, Jun 19 '15 at 19:55
@Olaf I am aware of this switch, and I was using it up until now, when I decided to clean up the code based on stuff I've learned reading SO :) Is this switch just disabling the rule checking, or is it actually making the violating code safe? — Eugene Sh., Jun 19 '15 at 20:07
@EugeneSh.: Well, I never used it myself (where possible, I prefer explicit serialization of fields by type). But AFAIK gcc will assume aliasing happens, so it should be safe - disclaimer: no warranty, no free beer! — too honest for this site, Jun 19 '15 at 20:26
@Olaf From some other SO discussions I consider you the main advocate of proper data marshalling. Have you worked with Google's protobuf and its derivatives (specifically [nanopb](http://koti.kapsi.fi/~jpa/nanopb/))? I am looking into it as a somewhat standardized way of serializing data. — Eugene Sh., Aug 04 '15 at 21:42
@EugeneSh.: I have not. Regarding advocacy: it depends. I have no problem using binary for internal data, e.g. storing in a local EEPROM, where the ABI is well-defined and if compact code is vital. Otherwise my experience is that the clean way is often not much more complicated. However, thanks for the link, I'll have a look. Until now, I used Python to create data structures to serialize along with the required meta-data. Oh, just read: they seem to use a very similar approach, including my favorite language:-) — too honest for this site, Aug 04 '15 at 21:47
Heck, I will **definitely** evaluate that - many thanks! - They even use my favorite build system. I thought about extending my own tool to a proper syntax (I use Python dicts/containers/etc. right now). — too honest for this site, Aug 04 '15 at 21:59

score 3 · Answer 1 · edited May 23 '17 at 12:31

Just a note: I am not entirely sure, but it may not always be safe to cast uint8_t* to char* (here).

Regardless, what does the last parameter of your write function expect: the number of bytes to write, or the number of uint32_t elements? Let's assume the latter, and also assume you want to write each member of the struct to a separate integer. You can do this:

uint32_t buffer[3] = {0};  /* one uint32_t per struct member */
memcpy(buffer, &s.a, sizeof(float));
memcpy(buffer+1, &s.b, sizeof(uint8_t));
memcpy(buffer+2, &s.c, sizeof(uint32_t));

write_to_eeprom(buffer, 3 /* Nr of elements */);

If you want to copy the structure members into the integer array consecutively, you can first copy the members into a byte array back to back, and then copy the byte array into the uint32_t array. In that case pass the number of bytes as the last parameter (if that is what the function expects), which would be sizeof(float)+sizeof(uint8_t)+sizeof(uint32_t).
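
The consecutive variant could look roughly like the sketch below. The names `struct_type`, `PACKED_BYTES`, `PACKED_WORDS` and `pack_struct` are assumptions for illustration. The members are first laid out back to back in a zero-filled byte array and then copied into the uint32_t array, so no object is ever accessed through an incompatible pointer type.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    float    a;
    uint8_t  b;
    uint32_t c;
} struct_type;

/* Size of the members alone, without padding. */
#define PACKED_BYTES (sizeof(float) + sizeof(uint8_t) + sizeof(uint32_t))
/* Words needed to hold the packed bytes, rounded up. */
#define PACKED_WORDS \
    ((PACKED_BYTES + sizeof(uint32_t) - 1) / sizeof(uint32_t))

/* Copies the members of *src back to back into out[] and returns the
   number of meaningful bytes. out must hold PACKED_WORDS elements;
   unused trailing bytes are zero. */
size_t pack_struct(const struct_type *src, uint32_t out[PACKED_WORDS])
{
    unsigned char bytes[PACKED_WORDS * sizeof(uint32_t)] = {0};
    size_t pos = 0;

    memcpy(bytes + pos, &src->a, sizeof src->a);
    pos += sizeof src->a;
    memcpy(bytes + pos, &src->b, sizeof src->b);
    pos += sizeof src->b;
    memcpy(bytes + pos, &src->c, sizeof src->c);
    pos += sizeof src->c;

    memcpy(out, bytes, sizeof bytes);
    return pos;
}
```

The caller would then pass `out` to write_to_eeprom together with either PACKED_WORDS or the returned byte count, depending on what the length parameter means.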

answered Jun 19 '15 at 18:45

@Eugene Sh.: What is your opinion about this solution? – Jun 19 '15 at 20:08
I am trying to avoid any unnecessary extra memory operations, so basically looking for the type-system only solution. Thank you. – Eugene Sh. Jun 19 '15 at 20:10
@EugeneSh.: I think this one is actually safe. I also mentioned another way, since it is not clear whether you want to copy to separate array elements or pack all the values consecutively (the second option in my answer). – Jun 19 '15 at 20:11
@EugeneSh. A good compiler should avoid "unnecessary extra memory operations" when presented with code like this. The escape hatch you're hoping to find in the type system simply does not exist in standard C - your options are `memcpy`, reading the representation via an `unsigned char *`, or compiler extensions. And you probably will wind up writing code like this to deal with padding and endianness, anyway. – zwol Jun 19 '15 at 20:14
@zwol Why wouldn't you consider the `union` solution? – Eugene Sh. Jun 19 '15 at 20:16
@EugeneSh. Some compilers do not implement the C99 corrigendum that made the `union` approach valid (unspecified rather than undefined behavior). C++ never picked that corrigendum up, either. And it does not deal with padding or endianness. – zwol Jun 19 '15 at 20:25
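
As an illustration of the padding and endianness point raised in these comments, here is a minimal sketch of field-by-field serialization into a byte buffer with a fixed little-endian layout. The names `struct_type`, `WIRE_BYTES`, `put_u32_le` and `serialize` are assumptions, and so is the chosen layout (4 bytes of float bits, 1 byte, 4 bytes). It also assumes a 32-bit float, which the `_Static_assert` (C11) checks at compile time.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    float    a;
    uint8_t  b;
    uint32_t c;
} struct_type;

#define WIRE_BYTES 9u  /* 4 (float bits) + 1 + 4, no padding */

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "this sketch assumes a 32-bit float");

/* Stores a 32-bit value least significant byte first. */
static void put_u32_le(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xFFu);
    dst[1] = (unsigned char)((v >> 8) & 0xFFu);
    dst[2] = (unsigned char)((v >> 16) & 0xFFu);
    dst[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Writes the members of *src into out[] in a layout that does not
   depend on the host's padding or byte order. */
void serialize(const struct_type *src, unsigned char out[WIRE_BYTES])
{
    uint32_t a_bits;

    /* memcpy reads the float's representation without aliasing issues. */
    memcpy(&a_bits, &src->a, sizeof a_bits);
    put_u32_le(out, a_bits);
    out[4] = src->b;
    put_u32_le(out + 5, src->c);
}
```

The resulting bytes can then be copied into a uint32_t buffer with memcpy before being handed to the library function.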

score 0 · Answer 2 · answered Jun 28 '17 at 21:08

Consider that writing to EEPROM is often slower, sometimes a lot slower, than writing to normal memory, so using an intervening buffer is rarely a performance drag. I realize this goes against the asker's comment about avoiding extra memory operations, yet I feel it deserves consideration as it handles all the other C concerns.

Write a helper function that has no alignment, aliasing, or size issues:

#include <stdint.h>
#include <string.h>

extern void write_to_eeprom(/* I'd expect const */ uint32_t *data, uint32_t len);

// Adjust N per system needs
#define BYTES_TO_EEPROM_N 16

void write_bytes_to_eeprom(const void *ptr, size_t size) {
  const unsigned char *byte_ptr = ptr;
  union {
    uint32_t data32[BYTES_TO_EEPROM_N / sizeof (uint32_t)];
    unsigned char data8[BYTES_TO_EEPROM_N];
  } u;

  while (size >= BYTES_TO_EEPROM_N) {
    memcpy(u.data8, byte_ptr, BYTES_TO_EEPROM_N);  // **
    byte_ptr += BYTES_TO_EEPROM_N;
    write_to_eeprom(u.data32, BYTES_TO_EEPROM_N / sizeof (uint32_t));
    size -= BYTES_TO_EEPROM_N;
  }

  if (size > 0) {
    memcpy(u.data8, byte_ptr, size);
    while (size % sizeof (uint32_t)) {
      u.data8[size++] = 0;  // zero fill
    }
    write_to_eeprom(u.data32, (uint32_t) (size / sizeof (uint32_t)));  // len counts uint32_t elements, as in the loop above
  }
}

// usage - very simple
write_bytes_to_eeprom(&s, sizeof s);

** Could use memcpy(u.data32, byte_ptr, BYTES_TO_EEPROM_N); to handle the issue @zwol raised.
