How to serialize structure data in C++?

Question

I was asked in an interview to serialize data (so it could be stored in a buffer and sent over some network). This is what I came up with -

struct AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;
};

char b[sizeof(struct AMG_ANGLES)];

char* encode(struct AMG_ANGLES *a)
{

    std::memcpy(b, &a, sizeof(struct AMG_ANGLES));
    return b;
}

void decode(char* data)
{
 // check endianess   
    AMG_ANGLES *tmp; //Re-make the struct
    std::memcpy(&tmp, data, sizeof(tmp));
}

Is this correct? Can anyone give alternate designs? I did not get callback so I'm just trying to learn what I could have improved.

I meant like general embedded C/C++ code. Edited the question now. — shortint, Feb 24 '20 at 05:36
@DavidC.Rankin — does `htonl()` manage conversion of `float` data? — Jonathan Leffler, Feb 24 '20 at 05:39
Given that `tmp` is a pointer, `std::memcpy(&tmp, data, sizeof(tmp));` is wrong; you need `sizeof(*tmp)`. — Jonathan Leffler, Feb 24 '20 at 05:40
@JonathanLeffler Maybe it should be sizeof(struct AMG_ANGLES)) because that's what I want in the end — shortint, Feb 24 '20 at 05:44
That would also work, but using `sizeof(*tmp)` takes the size of the object that `tmp` points at, and it points at a `struct AMG_ANGLES` — which is the same size. There are some advantages to the `sizeof(*tmp)` notation, particularly when doing dynamic memory allocation. — Jonathan Leffler, Feb 24 '20 at 05:46
@JonathanLeffler - floating point data with `htonl` may prohibit the use. I didn't check that aspect -- very good point. IEEE-754 won't like having it's exponent and mantissa scrambled. — David C. Rankin, Feb 24 '20 at 05:49
@shortint - rather than providing a `union` example (where you still have the endianness issue on how you encode the floats into the char array), take a look at the `pack754` and `upack754` examples at [Beej's Network Guide](https://beej.us/guide/bgnet/html/#structsockaddrman) (about midway down) and the `pack` and `upack` functions that incorporate encoding all types. — David C. Rankin, Feb 24 '20 at 07:13
Please don't remove the C++ language tag once there are C++ answers posted. Your edit has now made them irrelevant. I will rollback the changes. If you are only interested in C, then please ask a new question instead. — Lundin, Feb 25 '20 at 08:31
Also, serializing in an embedded context and in a PC context are wildly different things. Converting everything to strings would be unacceptable in real-time systems. — Lundin, Feb 25 '20 at 08:32

score 7 · Answer 1 · answered Feb 24 '20 at 05:40

Is this correct?

Most likely, no.

The point of serialization is to convert the data into a form that is completely platform independent - e.g. does not rely on things like endianess or if a float is an IEEE 754 or something very different. This requires:

a) strict agreement on the intended format - e.g. if it's some kind of text (XML, JSON, CSV, ...) or if it's "raw binary" with explicit definitions of the meaning of each individual byte (e.g. like maybe "byte 1 is always the lowest 8 bits of the significand").

b) correct conversion to whatever the intended format is (e.g. maybe like ensuring that byte 1 is always the lowest 8 bits of the significand regardless of any/all platform differences)

However; it is at least technically possible that the code is not supposed to be portable and the specification ("agreement on the intended format") happens to match what you ended up with for the only platform that the code is designed for; and therefore it's at least technically possible that the code is correct.

score 0 · Answer 2 · answered Feb 24 '20 at 10:50

There could be lots of improvements, but instead of telling all of them I suggest you to examine into cereal . It is widely used serialization/deserialization library, so lots of keypoints are thought.

Some of my thoughts are :

Your code depends on hardware which the program running on because of alignment and endianness. So the serialized data is not portable and compiler dependant.
char* encode(struct AMG_ANGLES *a) function returns char*, it is possibly leaked. To prevent the issue, let std::unique_ptr<T> decide its lifetime or wrap it with a class. But get rid of pointers somehow.

Templatize your serialize/deserialize operations. Otherwise, you could write same functions for other types.

template<typename T>
char* encode( T* a ) // I leave signature as is, just to demonstrate
{
     std::memcpy( b , &a , sizeof(T) );
     return b;
}

If the format is up to you, it is better to prefer human readable ones rather than binary archiving such as JSON, XML

score 0 · Answer 3 · answered Feb 25 '20 at 01:26

can someone give alternate design in C?

The "standard" way would be to use printf and scanf to create an ascii representation of the data:

#include <limits.h>
#include <math.h>
#include <stdio.h>
#include <assert.h>
#include <float.h>

struct AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;
};

// declare a buffer at least this long to be sure encode works properly
#define AMG_ANGLES_BUFSIZE  ( \
    3 * ( /* 3 floats */ \
         2 + /* digit and dot */ \
         FLT_DECIMAL_DIG - 1 + /* digits after dot */ \
         4 /* the 'e±dd' part */ \
    ) \
    + 2 /* spaces */ \
    + 1 /* zero terminating character */ \
)

int encode(char *dest, size_t destsize, const struct AMG_ANGLES *a) {
    return snprintf(dest, destsize, "%.*e %.*e %.*e", 
         FLT_DECIMAL_DIG - 1, a->yaw, 
         FLT_DECIMAL_DIG - 1, a->pitch, 
         FLT_DECIMAL_DIG - 1, a->roll);
    // my pedantic self wants to add `assert(snprintf_ret < AMG_ANGLES_BUFSIZE);`
}

int decode(struct AMG_ANGLES *dest, const char *data) {
    return sscanf(data, "%e %e %e", &dest->yaw, &dest->pitch, &dest->roll) == 3 ? 0 : -1;
}

int main() {
   char buf[AMG_ANGLES_BUFSIZE];
   const struct AMG_ANGLES a = { FLT_MIN, FLT_MAX, FLT_MIN };
   encode(buf, sizeof(buf), &a);
   struct AMG_ANGLES b;
   const int decoderet = decode(&b, buf);
   assert(decoderet == 0);
   assert(b.yaw == FLT_MIN);
   assert(b.pitch == FLT_MAX);
   assert(b.roll == FLT_MIN);
}

However in bare-metal embedded I try not to use scanf - it's a big function with some dependencies. So it's better to call strtof itself, but it needs some thinking:

int decode2(struct AMG_ANGLES *dest, const char *data) {
    errno = 0;

    char *endptr = NULL;
    dest->yaw = strtof(data, &endptr);
    if (errno != 0 || endptr == data) return -1;
    if (*endptr != ' ') return -1;

    data = endptr + 1;
    dest->pitch = strtof(data, &endptr);
    if (errno != 0 || endptr == data) return -1;
    if (*endptr != ' ') return -1;

    data = endptr + 1;
    dest->roll = strtof(data, &endptr);
    if (errno != 0 || endptr == data) return -1;
    if (*endptr != '\0') return -1;

    return 0;
}

or with removed code duplication:

int decode2(struct AMG_ANGLES *dest, const char *data) {
    // array of pointers to floats to fill
    float * const dests[] = { &dest->yaw, &dest->pitch, &dest->roll };
    const size_t dests_cnt = sizeof(dests)/sizeof(*dests);
    errno = 0;
    for (int i = 0; i < dests_cnt; ++i) {
        char *endptr = NULL;
        *dests[i] = strtof(data, &endptr);
        if (errno != 0 || endptr == data) return -1;
        // space separates numbers, last number is followed by zero
        const char should_be_char = i != dests_cnt - 1 ? ' ' : '\0';
        if (*endptr != should_be_char) return -1;
        data = endptr + 1;
    }
    return 0;
}

I needed to use some google and re-read chux answers to properly recall how to use FLT_DECIMAL_DIG in printf to print floats, that's most probably because I rarely worked with floats.

score 0 · Answer 4 · answered Jul 20 '22 at 13:25

0

Keep in mind that, when using memcpy different architectures and different compilers will apply padding and endianness differently. To prevent the padding of the struct you could use an attribute provided by GCC

__attribute__ ((packed))

Nevertheless, this does not protect you from alternating endiannes.
The code for serializing and deserializing using memcpy might look like this:

#include <memory>
#include <cstring>

struct __attribute__((packed)) AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;
};

//The buffer is expected to be the same size as the T
template<typename T>
int serialize(const T &data,const std::unique_ptr<char[]> &buffer){
    std::memcpy(buffer.get(), &data, sizeof(T));
    return sizeof(T);
}

//The buffer is expected to be the same size as the ReturnType
template<typename ReturnType>
ReturnType deserialize(const std::unique_ptr<char[]> &buffer){
    ReturnType tmp;
    std::memcpy(&tmp, buffer.get(), sizeof(ReturnType));
    return tmp;
}

int main()
{
    struct AMG_ANGLES angles = {1.2, 1.3, 1.0};
    std::unique_ptr<char[]> buffer(new char[sizeof(struct AMG_ANGLES)]);
    int size = serialize(angles, buffer);
    struct AMG_ANGLES angles_serialized = deserialize<AMG_ANGLES>(buffer);
}

answered Jul 20 '22 at 13:25

Patrick Klaus

1

Welcome to SO! Your answer is generally correct, but it could be added that in the case of difference of current and desired (protocol-defined) endianness, resulting array can be reversed. Also, your `serialize` function returns `sizeof` of the input type which user could take directly (and as a constant expression) if they wanted, whereas the output buffer is an out parameter and you write to it in unchecked way, which can result in undefined behaviour if the buffer is too small. – YurkoFlisk Jul 23 '22 at 18:26
You could create `std::unique_ptr` for appropriately-sized dynamically allocated buffer directly in the function, write to it and return, or even use `std::array` to avoid dynamic allocation. – YurkoFlisk Jul 23 '22 at 19:06
Also, in `deserialize` there's no need to accept `const std::unique_ptr&` as buffer parameter, you could just accept `const char*` to be more general and avoid unnecessary indirection while accessing the buffer. See related [core guideline](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rr-uniqueptrparam). Same for `serialize` (`char*` parameter) if you want to stick to out buffer parameter (e.g. to be able to serialize into the middle of some buffer). – YurkoFlisk Jul 23 '22 at 19:09
Also note that in C++, unlike C, `struct`/`class` definitions/declarations introduce a type name in a namespace where they are declared by themselves, without `typedef`s needed (there's no "tag namespace" in C++). So you don't need to specifically write `struct AMG_ANGLES` to refer to the struct unless, however, you want to be sure that name lookup doesn't find a non-type (e.g. function) with the same name (see [here](https://stackoverflow.com/a/612476/3162368)). – YurkoFlisk Jul 23 '22 at 20:14

score -1 · Answer 5 · answered Feb 24 '20 at 08:11

It's better to make some kinds of class like std::stringstream.. std::stringstream is not good to save binary data but it works the same way you want. so I could make some example that works with std::stringstream..

This code implement only for serialization but it also add code for deserialization.

// C++11
template < typename T, typename decltype(std::declval<T>().to_string())* = nullptr>
    std::ostream& operator<< (std::ostream& stream, T&& val)
{
    auto str = val.to_string();
    std::operator <<(stream, str);
    return stream;
}

struct AMG_ANGLES {
    float yaw;
    float pitch;
    float roll;
    std::string to_string() const
    {
        std::stringstream stream;
        stream << yaw << pitch << roll;
        return stream.str();
    }
};

void Test()
{
    std::stringstream stream;
    stream << 3 << "Hello world" << AMG_ANGLES{1.f, 2.f, 3.f };
}

How to serialize structure data in C++?

5 Answers5