This has been bugging me for a very long time: how to do pointer conversion from anything to char * to dump binary to disk.

In C, you don't even think about it.

double d = 3.14;
char *cp = (char *)&d;

// do whatever you would do to dump the bytes to disk

However, in C++, where everyone is saying C-cast is frowned upon, I've been doing this:

double d = 3.14;
auto cp = reinterpret_cast<char *>(&d);

Now this is copied from cppreference, so I assume this is the proper way.

However, I've read from multiple sources that this is UB (e.g. this one). So I can't help wondering whether there is any "DB" (defined-behavior) way at all (according to that post, there's none).

Another scenario I often encounter is to implement an API like this:

void serialize(void *buffer);

where you would dump a lot of things to this buffer. Now, I've been doing this:

void serialize(void *buffer) {
    int intToDump;
    float floatToDump;

    int *ip = reinterpret_cast<int *>(buffer);
    ip[0] = intToDump;

    float *fp = reinterpret_cast<float *>(&ip[1]);
    fp[0] = floatToDump;
}

Well, I guess this is UB as well.

Now, is there truly no "DB" way to accomplish either of these tasks? I've seen someone use uintptr_t to accomplish something similar to the serialize task, doing the pointer math on integers along with sizeof, but I'm guessing that it's UB as well.

Even though they are UB, compiler writers usually do the rational thing to make sure everything is okay. And I'm okay with that: it's not an unreasonable thing to ask for.

So my questions really are, for the two common tasks mentioned above:

  1. Is there truly no "DB" way to accomplish them that will satisfy the ultimate C++ freaks?
  2. Any better way to accomplish them other than what I've been doing?

Thanks!

TerryTsao

1 Answer

Your serialize implementation's behavior is undefined because you violate the strict aliasing rules. The strict aliasing rules say, in short, that you cannot reference any object via a pointer or reference to a different type. There is one major exception to that rule though: any object may be referenced via a pointer to char, unsigned char, or (since C++17) std::byte. Note that this exception does not apply the other way around; a char array may not be accessed via a pointer to a type other than char.
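For the first task in the question (dumping a double to disk), the char exception is what makes the `reinterpret_cast<char *>` version acceptable. Below is a minimal sketch, assuming the sink is a `std::ostream`; the helper names `dump_double` and `load_double` are hypothetical and not from the question. Reading back goes through a char array plus `std::memcpy`, since the exception does not work in the other direction.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical helper: write the object representation of a double.
void dump_double(std::ostream& out, double value) {
    // Viewing an object through a char pointer is allowed by the
    // aliasing rules, so this cast is fine.
    const char* bytes = reinterpret_cast<const char*>(&value);
    out.write(bytes, sizeof(value));
}

// Hypothetical helper: read the bytes back into a double.
double load_double(std::istream& in) {
    char raw[sizeof(double)];
    in.read(raw, sizeof(raw));
    assert(in.gcount() == static_cast<std::streamsize>(sizeof(raw)));

    // Copy the bytes into a real double object instead of
    // casting the char array to double*.
    double value;
    std::memcpy(&value, raw, sizeof(value));
    return value;
}
```

The bytes written this way are only meaningful to a reader with the same floating-point format and byte order as the writer.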

That means that you can make your serialize function well-defined by changing it like so:

#include <cstring>  // std::memcpy

void serialize(char* buffer) {
    int intToDump = 42;
    float floatToDump = 3.14f;

    // Copy each object's bytes into the buffer, one after the other
    std::memcpy(buffer, &intToDump, sizeof(intToDump));
    std::memcpy(buffer + sizeof(intToDump), &floatToDump, sizeof(floatToDump));

    // Or you could do byte-by-byte manual copy loops
    // i.e.
    //for (std::size_t i = 0; i < sizeof(intToDump); ++i, ++buffer) {
    //    *buffer = reinterpret_cast<char*>(&intToDump)[i];
    //}
    //for (std::size_t i = 0; i < sizeof(floatToDump); ++i, ++buffer) {
    //    *buffer = reinterpret_cast<char*>(&floatToDump)[i];
    //}
}

Here, rather than casting buffer to a pointer to an incompatible type, std::memcpy casts a pointer to the object being serialized to a pointer to unsigned char. In doing so, the strict aliasing rules are not violated, and the program's behavior remains well-defined. Note that the exact representation is still unspecified, as it depends on your CPU's endianness.
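If the prototype has to keep taking a `void*`, the same idea still applies: cast once to `char*` for the pointer arithmetic and let `std::memcpy` move the bytes. This is a rough sketch under a few assumptions: the extra parameters, the returned byte count, and the `deserialize` counterpart are illustrative additions, not part of the original API, and the caller is assumed to provide a large enough buffer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical variant that keeps a void* buffer parameter.
// Returns the number of bytes written.
std::size_t serialize(void* buffer, int intToDump, float floatToDump) {
    // void* -> char* needs only static_cast; char* allows arithmetic.
    char* out = static_cast<char*>(buffer);

    std::memcpy(out, &intToDump, sizeof(intToDump));
    out += sizeof(intToDump);

    std::memcpy(out, &floatToDump, sizeof(floatToDump));
    out += sizeof(floatToDump);

    return static_cast<std::size_t>(out - static_cast<char*>(buffer));
}

// Hypothetical counterpart: copy the bytes back out into real objects
// rather than reading the buffer through int* or float*.
void deserialize(const void* buffer, int& intOut, float& floatOut) {
    const char* in = static_cast<const char*>(buffer);

    std::memcpy(&intOut, in, sizeof(intOut));
    in += sizeof(intOut);

    std::memcpy(&floatOut, in, sizeof(floatOut));
}
```

Because every read and write goes through `std::memcpy`, the buffer needs no particular alignment, and no `int` or `float` object is ever accessed through a pointer of the wrong type.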

Miles Budnek
  • With the ~3% slower `std::copy_n`, you can 'chain' the resulting destination pointer: `dest = std::copy_n(&i, sizeof(i), dest);` – xtofl Apr 25 '19 at 07:25
  • @xtofl With `std::copy` you would need an explicit cast from `int*` to `char*` (which `std::memcpy` hides away), but yes, I suspect the two would compile to the [exact same assembly](https://godbolt.org/z/sMDxAe) – Miles Budnek Apr 25 '19 at 07:30
  • @MilesBudnek Thanks for your answer. I need time to digest it, though. I do have some follow-ups: 1. The SO link in my OP mentions that casting from `char *` to `unsigned char *` might crash the program. That seems to contradict the exception you mentioned. Could you please explain more on that? 2. I cannot change the `serialize` prototype to `char *`. Does that mean I should `static_cast` the `void *` buffer to `char *` first thing in the implementation? Thx! – TerryTsao Apr 25 '19 at 07:49
  • @MilesBudnek Eh, another follow up Q: the `uintptr_t` way, like I said, casting pointer to integer and doing address math with `sizeof`, is this also UB and should be avoided? – TerryTsao Apr 25 '19 at 07:59
  • @TerryTsao 1) I believe the premise of that question is incorrect. A `char*` may alias an `unsigned char` for several reasons. 2) `std::memcpy` accepts a `void*` and casts it to `unsigned char*` internally, so you're fine there, but you can't do arithmetic on a `void*`, so you'll need to do a cast at some point if you're writing multiple objects to the buffer. 3) It depends, but it is probably UB. What really matters is if there exists an object of the appropriate type at the location pointed to by your pointer, not how you got that pointer. – Miles Budnek Apr 25 '19 at 08:08
  • @MilesBudnek Thank you. Having to cast to something just to do pointer arithmetic has bugged me for a long time. Knowing that casting to `char *` is the proper approach, now I can rest in peace. I should have read the relevant parts on cppreference more carefully. – TerryTsao Apr 25 '19 at 08:37