
I expected the is_bitwise_serializable trait to let me serialize a class like the following (without a serialize function):

struct A { int a; char b; };
BOOST_IS_BITWISE_SERIALIZABLE(A)
A a{2, 'x'};
some_archive << a; // serializes a bitwise

Why is there a need to provide a serialize function for a bitwise-serializable class?

Bikineev

1 Answer


From the documentation:

Some simple classes could be serialized just by directly copying all bits of the class. This is, in particular, the case for POD data types containing no pointer members, and which are neither versioned nor tracked. Some archives, such as non-portable binary archives can make use of this information to substantially speed up serialization.

To indicate the possibility of bitwise serialization the type trait defined in the header file is_bitwise_serializable.hpp is used:

Here are the key points: this optimization

  • is optional in the archive types where it does apply

  • doesn't apply to all archive types e.g.

    • a binary archive that needs to be portable could not be implemented by copying out the raw memory representation (because it is implementation and platform dependent)

    • a text archive might not want this optimization (e.g. it has different goals too, like "human readable XML"; it might not want to encode your vector<A> as a large base64-encoded blob).

Note that this also explains why is_bitwise_serializable<T> is not partially specialized for every type that has is_pod<T>::value == true (which would technically be easy to do):

  • some classes might not be interested in serializing all their state (so using bitwise copy would take a lot more space than just selecting the interesting bits (pun intended))
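To make the opt-in nature concrete, here is a minimal, Boost-free sketch of the kind of check the language itself can offer. The name `looks_bitwise_copyable` is hypothetical and not part of Boost. It tests a necessary condition only: the type system cannot tell whether a pointer member makes sense in an archive, or whether all members are worth saving, which is why the trait is left for the class author to declare.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper (not a Boost facility): a necessary, but not
// sufficient, condition for copying an object's bytes verbatim.
template <typename T>
constexpr bool looks_bitwise_copyable =
    std::is_trivially_copyable_v<T> && std::is_standard_layout_v<T>;

struct A { int a; char b; };            // plain data
struct WithPointer { int* p; };         // plain data, but the pointer value is meaningless in an archive
struct WithString { std::string s; };   // owns heap memory, not trivially copyable

static_assert(looks_bitwise_copyable<A>);
// The check cannot see that a pointer member is a problem:
static_assert(looks_bitwise_copyable<WithPointer>);
static_assert(!looks_bitwise_copyable<WithString>);
```

`WithPointer` passing the check is the point: a compile-time test like this can reject obviously unsuitable types, but only the author can promise the rest.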

You didn't ask, specifically, but this is what the working implementation would look like:

#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/is_bitwise_serializable.hpp>
#include <boost/serialization/serialization.hpp>
#include <sstream>

struct A { int a; char b;
    template <typename Ar> void serialize(Ar& ar, unsigned) {
        ar & a;
        ar & b;
    }
};

BOOST_IS_BITWISE_SERIALIZABLE(A)

int main() {
    std::ostringstream oss;
    boost::archive::binary_oarchive oa(oss);

    A data { 1, 'z' };
    oa << data;
}

UPDATE

In response to the commenter who basically posed the same question again, I came up with a demonstration where bitwise serializability would (a) kick in, making a visible difference, and (b) not result in a smaller archive:

Live On Coliru

#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/array.hpp>
#include <boost/serialization/is_bitwise_serializable.hpp>
#include <boost/serialization/serialization.hpp>

#include <array>
#include <fmt/ranges.h>
#include <span>
#include <sstream>
#include <vector>

struct A {
    int  a;
    char b;
    void serialize(auto& ar, unsigned) { ar & a & b; }
};

#ifdef BITWISE
BOOST_IS_BITWISE_SERIALIZABLE(A)
#endif

int main() {
    std::ostringstream oss;
    {
        boost::archive::binary_oarchive oa(oss,
                boost::archive::no_header |
                boost::archive::no_tracking |
                boost::archive::no_codecvt);

        std::array<A, 26> data{{
            {1, 'z'},  {2, 'y'},  {3, 'x'},  {4, 'w'},  {5, 'v'},  {6, 'u'},  {7, 't'},
            {8, 's'},  {9, 'r'},  {10, 'q'}, {11, 'p'}, {12, 'o'}, {13, 'n'}, {14, 'm'},
            {15, 'l'}, {16, 'k'}, {17, 'j'}, {18, 'i'}, {19, 'h'}, {20, 'g'}, {21, 'f'},
            {22, 'e'}, {23, 'd'}, {24, 'c'}, {25, 'b'}, {26, 'a'},
        }};

        oa << data;
    }

    auto raw = oss.str();
    fmt::print("raw serialized form {} bytes: {::#04x}\n", //
               raw.size(), std::vector(raw.begin(), raw.end()));
}

When built and run:

for def in REGULAR BITWISE; do g++ -std=c++20 -O2 -Wall -pedantic main.cpp -lboost_serialization -lfmt -o $def -D$def & done; wait
set -x; ./REGULAR; ./BITWISE

Prints

+ ./REGULAR
raw serialized form 148 bytes: [0x00, 0x00, 0x00, 0x00, 0x00, 0x1a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x7a, 0x02, 0x00, 0x00, 0x00, 0x79, 0x03, 0x00, 0x00, 0x00, 0x78, 0x04, 0x00, 0x00, 0x00, 0x77, 0x05, 0x00, 0x00, 0x00, 0x76, 0x06, 0x00, 0x00, 0x00, 0x75, 0x07, 0x00, 0x00, 0x00, 0x74, 0x08, 0x00, 0x00, 0x00, 0x73, 0x09, 0x00, 0x00, 0x00, 0x72, 0x0a, 0x00, 0x00, 0x00, 0x71, 0x0b, 0x00, 0x00, 0x00, 0x70, 0x0c, 0x00, 0x00, 0x00, 0x6f, 0x0d, 0x00, 0x00, 0x00, 0x6e, 0x0e, 0x00, 0x00, 0x00, 0x6d, 0x0f, 0x00, 0x00, 0x00, 0x6c, 0x10, 0x00, 0x00, 0x00, 0x6b, 0x11, 0x00, 0x00, 0x00, 0x6a, 0x12, 0x00, 0x00, 0x00, 0x69, 0x13, 0x00, 0x00, 0x00, 0x68, 0x14, 0x00, 0x00, 0x00, 0x67, 0x15, 0x00, 0x00, 0x00, 0x66, 0x16, 0x00, 0x00, 0x00, 0x65, 0x17, 0x00, 0x00, 0x00, 0x64, 0x18, 0x00, 0x00, 0x00, 0x63, 0x19, 0x00, 0x00, 0x00, 0x62, 0x1a, 0x00, 0x00, 0x00, 0x61]
+ ./BITWISE
raw serialized form 221 bytes: [0x00, 0x00, 0x00, 0x00, 0x00, 0x1a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x7a, 0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x79, 0x00, 0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0x78, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0x77, 0x7f, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x76, 0x7f, 0x00, 0x00, 0x06, 0x00, 0x00, 0x00, 0x75, 0x7f, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x74, 0x7f, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x73, 0x7f, 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x72, 0x7f, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00, 0x71, 0x7f, 0x00, 0x00, 0x0b, 0x00, 0x00, 0x00, 0x70, 0x00, 0x00, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x6f, 0x00, 0x00, 0x00, 0x0d, 0x00, 0x00, 0x00, 0x6e, 0x00, 0x00, 0x00, 0x0e, 0x00, 0x00, 0x00, 0x6d, 0x00, 0x00, 0x00, 0x0f, 0x00, 0x00, 0x00, 0x6c, 0x7f, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x6b, 0x00, 0x00, 0x00, 0x11, 0x00, 0x00, 0x00, 0x6a, 0x00, 0x00, 0x00, 0x12, 0x00, 0x00, 0x00, 0x69, 0x7f, 0x00, 0x00, 0x13, 0x00, 0x00, 0x00, 0x68, 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00, 0x67, 0x00, 0x00, 0x00, 0x15, 0x00, 0x00, 0x00, 0x66, 0x7f, 0x00, 0x00, 0x16, 0x00, 0x00, 0x00, 0x65, 0x7f, 0x00, 0x00, 0x17, 0x00, 0x00, 0x00, 0x64, 0x7f, 0x00, 0x00, 0x18, 0x00, 0x00, 0x00, 0x63, 0x7f, 0x00, 0x00, 0x19, 0x00, 0x00, 0x00, 0x62, 0x00, 0x00, 0x00, 0x1a, 0x00, 0x00, 0x00, 0x61, 0x00, 0x00, 0x00]
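The size difference follows from struct padding. As a rough sketch, assuming a typical x86-64 ABI where `sizeof(int) == 4` and `sizeof(A) == 8` (these values are platform dependent, and the constant and function names below are made up for illustration), the payload sizes can be worked out like this:

```cpp
#include <cstddef>

struct A { int a; char b; };

// Bytes per element when the members are written one by one.
constexpr std::size_t memberwise_bytes = sizeof(int) + sizeof(char);
// Bytes per element when the whole object representation is copied.
constexpr std::size_t bitwise_bytes = sizeof(A);
// Alignment padding that only the bitwise copy drags along.
constexpr std::size_t padding_bytes = bitwise_bytes - memberwise_bytes;

// Total payload for `count` elements at `per_element` bytes each.
constexpr std::size_t payload(std::size_t count, std::size_t per_element) {
    return count * per_element;
}

// With the sizes assumed above:
//   payload(26, 5) == 130, leaving 18 of the 148 bytes for archive bookkeeping
//   payload(26, 8) == 208, leaving 13 of the 221 bytes for archive bookkeeping
static_assert(payload(26, 5) == 130);
static_assert(payload(26, 8) == 208);
```

Under those assumptions each element carries 3 padding bytes in the bitwise form. That would also explain the stray `0x7f` values in the second dump: padding bytes are not initialized, so whatever happened to be in memory gets written out.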
sehe
  • Why do you need `A::serialize()` in this particular example? What prevents bitwise serialization here? How could I know when `serialize()` is called and when bitwise serialization is used? – Evg Dec 10 '22 at 20:17
  • @Evg My answer goes in extreme detail explaining that. Many archives would not be able to perform the "optimization". Regardless, the optimization isn't always smaller. See the difference here: [regular serialization results in 148 bytes, bitwise in 221 bytes](http://coliru.stacked-crooked.com/a/96661f42e20a1230). – sehe Dec 10 '22 at 20:59
  • On the question "how could I know" - you _shouldn't know_. It is a mere optimization and the presence is an implementation detail. If you care, you should not be using it. You can, of course, just search for `is_bitwise_serializable` in the code base, which is what I did to come up with the example in the previous comment. – sehe Dec 10 '22 at 21:00
  • Thanks for the update! What are reasonable practical use cases for this optimization given one has no control over it and that it might turn out to be a pessimization? – Evg Dec 11 '22 at 00:17
  • For the cases where it _is_ an optimization. That can be either space (simplest edit: http://coliru.stacked-crooked.com/a/1b5c8f2feb7f2bec, which can add up) or speed. Imagine serializing large `dynamic_bitset` or `boost::gil::rgb8s_image_t` that is (many) megabytes of data. It will be way faster to do if you know the data is trivially copyable. *Always* measure for your use-case. Don't optimize unless you know the consequences. – sehe Dec 11 '22 at 16:07
  • The concept still looks somewhat vague to me. If I have a case when bitwise serialization has a significant measurable effect, I would probably prefer a mechanism over which I have a certain control so that I _know_ that that mechanism is actually used. Otherwise some code changes might break that optimization unpredictably without me noticing that until some runtime effects are observed. – Evg Dec 11 '22 at 16:53
  • @Evg It is vague. And that's by design. It was never about control, the library is about genericity (notably different archive providers, some of which will never be able to use bitwise copying). The library hides the complexity from you. That's the value. If you wanted more control (which wasn't the question here), use `serialization::binary_object` - which is designed for that purpose. Comparing three way (bitwise/binary_object/normal) across three archive types (Text, Binary, XML): http://coliru.stacked-crooked.com/a/bf39195040caed6a – sehe Dec 11 '22 at 19:33
  • If you use a third-party value to hide complexity for you, you are responsible for having tests around all the important guarantees that you need it to satisfy. *Or* you should refrain from upgrading. – sehe Dec 11 '22 at 19:34
  • Oh, I forgot to pack the struct in that three-way comparison, this actually gives a little more interesting perspective: http://coliru.stacked-crooked.com/a/57ca9d7d1c415257 – sehe Dec 11 '22 at 19:43
  • It would be helpful if someone could comment on the implications of this for boost::mpi. It seems consistent endianness between nodes is a compile time option. It is not clear however whether boost::mpi works bitwise by default or whether we should use the is_serializable macro on user-defined classes. – THK Jan 04 '23 at 22:19
  • @THK If you know your cluster and they are all same architecture/version you can use `BOOST_IS_BITWISE_SERIALIZABLE`. Actually pretty much the exact same conditions that would allow you to link code - it's about ABI compatibility – sehe Jan 04 '23 at 23:19
  • @sehe Thanks. I wish a bit more effort had gone into the boost::mpi documentation on some of these points. One has to dig into boost::serialization to find that eg vectors are bitwise by default as the data is cast to an array. Bitsets on the other hand are converted to strings. It is a bit hard to tell eg if I can use `BOOST_IS_BITWISE_SERIALIZABLE` on a struct with a bitset member. – THK Jan 05 '23 at 06:51
  • @THK whether you can is governed by the language + _your environment_. Here's how you figure it out without even touching Boost Serialization: http://coliru.stacked-crooked.com/a/edb45b9478e91646. The point is, this is not specific to Boost Serialization. As far as I know Boost MPI doesn't require/assume Boost Serialization either. – sehe Jan 05 '23 at 08:17
  • Also, it's not accurate that `std::vector` is serialized "are bitwise by default as the data is cast to an array". That depends on both the archive **and** the element type, as the earlier tester [very clearly demonstrated](http://coliru.stacked-crooked.com/a/57ca9d7d1c415257) – sehe Jan 05 '23 at 08:17
  • @sehe Perfect. Thanks. I can work from those hints. Yes, I was imprecise and should have said "is capable of bitwise copy" rather than "are bitwise by default". (And you got my pun, ha ha.) – THK Jan 06 '23 at 16:51