Luckily, C++ does not impose a default mechanism for serializing a class hierarchy. (I wouldn't mind an optional mechanism provided by a special base type in the standard library or something similar, but overall this could put limits on existing ABIs.)
YES. Serialization is incredibly important and powerful in modern software engineering. I use it any time I need to translate a class hierarchy to and from some form of runtime-consumable data. The mechanism I always choose is based on some form of reflection. More on this below.
You may also want to look here for an idea of the complexities to consider and if you really wanted to verify against the standard you could purchase a copy here. It looks like the working draft for the next standard is on github.
Application specific systems
C and C++ give the author of the application the freedom to select the mechanics behind many of the technologies people take for granted in newer, often higher-level languages: reflection (RTTI), exceptions, and resource/memory management (garbage collection, RAII, etc.). These systems can all potentially impact the overall quality of a particular product.
I have worked on everything from real-time games and embedded devices to mobile apps and web applications, and the overall goals of a project vary between all of them.
For real-time, high-performance games you will often explicitly disable RTTI (to be honest, it isn't very useful in C++ anyway) and possibly even exceptions. Many people don't want the overhead exceptions produce, and if you were really crazy you could implement your own form of them from long jumps and such. For me, exceptions create an invisible interface that often leads to bugs people wouldn't even expect to be possible, so I usually avoid them anyway in favor of more explicit logic.
Garbage collection isn't included in C++ by default either, and in real-time games this is a blessing. Sure, you can have incremental GC and other optimized approaches, which I have seen many games use (often it is a modification of an existing GC, like the one used in Mono for C#). Many games use pooling and, for C++, often RAII driven by smart pointers. It isn't unusual to have different systems with different patterns of memory usage either, each of which can be optimized in different ways. The point is that some applications care more than others about the nitty-gritty details.
General idea of automatic serialization of type hierarchy
The general idea of an automatic serialization system of type hierarchies is to use a reflection system that can query type information at runtime from a generic interface. My solution below relies on building that generic interface by extending upon some base type interfaces with the help of the macros. In the end you basically get a dynamic vtable of sorts that you can iterate by index or query by string names of members/types.
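To make the "dynamic vtable of sorts" a bit more concrete, here is a minimal sketch of a per-type member table that can be walked by index or queried by name. The names TypeInfo, MemberInfo, Point and makePointInfo are made up for this illustration and are not taken from my actual system, which builds the equivalent table through macros.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one reflected member.
struct MemberInfo {
    std::string name;    // member name as written in the source
    std::size_t offset;  // byte offset inside the owning object
    std::size_t size;    // size of the member in bytes
};

// Hypothetical per-type table: iterate by index or look up by name.
class TypeInfo {
public:
    explicit TypeInfo(std::string name) : mName(std::move(name)) {}

    void addMember(const std::string& name, std::size_t offset, std::size_t size) {
        mMembers.push_back(MemberInfo{name, offset, size});
    }

    const std::string& name() const { return mName; }
    std::size_t memberCount() const { return mMembers.size(); }
    const MemberInfo& member(std::size_t index) const { return mMembers.at(index); }

    // Returns nullptr when the type has no member with that name.
    const MemberInfo* findMember(const std::string& name) const {
        for (const MemberInfo& m : mMembers) {
            if (m.name == name) return &m;
        }
        return nullptr;
    }

private:
    std::string mName;
    std::vector<MemberInfo> mMembers;
};

// Example type, registered by hand instead of through macros.
struct Point { int x; int y; };

TypeInfo makePointInfo() {
    TypeInfo info("Point");
    info.addMember("x", offsetof(Point, x), sizeof(int));
    info.addMember("y", offsetof(Point, y), sizeof(int));
    return info;
}
```

A serializer only needs this table plus a pointer to the object; it never has to know the concrete type at compile time.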
I also use a base reflection reader/writer type that exposes some iostream interfaces for derived formatters to override. I currently have a BinaryObjectIO, JSONObjectIO, and ASTObjectIO, but it is trivial to add others. The point of this is to remove the responsibility for serializing a particular data format from the hierarchy and put it into the serializer.
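As a rough sketch of that separation (not the real BinaryObjectIO/JSONObjectIO code), the following shows one abstract writer with two interchangeable formatters. The class names ObjectWriter, JsonLikeWriter and RawBinaryWriter, their method names, and the helper functions are all assumptions made for this example.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base writer: callers never see the output format.
class ObjectWriter {
public:
    explicit ObjectWriter(std::ostream& out) : mOut(out) {}
    virtual ~ObjectWriter() = default;
    virtual void beginObject(const std::string& typeName) = 0;
    virtual void writeInt(const std::string& name, std::int32_t value) = 0;
    virtual void endObject() = 0;
protected:
    std::ostream& mOut;
};

// Text formatter that emits a JSON-like object (no string escaping here).
class JsonLikeWriter : public ObjectWriter {
public:
    using ObjectWriter::ObjectWriter;
    void beginObject(const std::string& typeName) override {
        mOut << "{\"type\":\"" << typeName << "\"";
    }
    void writeInt(const std::string& name, std::int32_t value) override {
        mOut << ",\"" << name << "\":" << value;
    }
    void endObject() override { mOut << "}"; }
};

// Binary formatter: drops names, writes raw values in host byte order.
class RawBinaryWriter : public ObjectWriter {
public:
    using ObjectWriter::ObjectWriter;
    void beginObject(const std::string&) override {}
    void writeInt(const std::string&, std::int32_t value) override {
        mOut.write(reinterpret_cast<const char*>(&value), sizeof(value));
    }
    void endObject() override {}
};

// The same call sequence drives either format.
void writePosition(ObjectWriter& w, std::int32_t line, std::int32_t column) {
    w.beginObject("ASTNode");
    w.writeInt("mLine", line);
    w.writeInt("mColumn", column);
    w.endObject();
}

std::string positionAsJson(std::int32_t line, std::int32_t column) {
    std::ostringstream out;
    JsonLikeWriter w(out);
    writePosition(w, line, column);
    return out.str();
}

std::string positionAsBinary(std::int32_t line, std::int32_t column) {
    std::ostringstream out;
    RawBinaryWriter w(out);
    writePosition(w, line, column);
    return out.str();
}
```

Adding another format then means adding one more derived writer; the code that walks the object does not change.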
Reflection at the language level
In many situations the application knows what data it would like to serialize, and there is no reason to build that into every object in the language. Many modern languages include RTTI even in the basic types of the system (in a type-based language the common intrinsics would be int, float, double, etc.). This requires extra data to be stored for everything in the system regardless of how the application uses it. I'm sure many modern compilers can at times optimize some of it away with tree shaking and such, but you can't guarantee that either.
A Declarative approach
The methods already mentioned are all valid use cases, although they lack some flexibility by having the hierarchy handle the actual serialization task. This can also bloat your code with boilerplate stream manipulation on the hierarchy.
I personally prefer a more declarative approach via reflection. What I have done in the past, and continue to do in some situations, is create a base Reflectable type in my system. I end up using template metaprogramming to help with some of the boilerplate logic, as well as the preprocessor for string concatenation macros. The end result is a base type that I derive from, a reflectable macro declaration to expose the interface, and a reflectable macro definition to implement the guts (tasks like adding the registered member to the type's lookup table).
So I normally end up with something that looks like this in the header (.h):
class ASTNode : public Reflectable
{
...
public:
DECLARE_CLASS
DECLARE_MEMBER(mLine,int)
DECLARE_MEMBER(mColumn,int)
...
};
Then something like this in the cpp:
BEGIN_REGISTER_CLASS(ASTNode,Reflectable);
REGISTER_MEMBER(ASTNode,mLine);
REGISTER_MEMBER(ASTNode,mColumn);
END_REGISTER_CLASS(ASTNode);
ASTNode::ASTNode()
: mLine( 0 )
, mColumn( 0 )
{
}
I can then use the reflection interface directly with some methods such as:
int id = myreflectedObject.Get<int>("mID");
myreflectedObject.Set( "mID", 6 );
But much more commonly I just iterate some "Traits" data that I have exposed with another interface:
ReflectionInfo::RefTraitsList::const_iterator it = info->getReflectionTraits().begin();
Currently the traits object looks something like this:
class ReflectionTraits
{
public:
ReflectionTraits( const uint8_t& type, const uint8_t& arrayType, const char* name, const ptrType_t& offset );
std::string getName() const{ return mName; }
ptrType_t getOffset() const{ return mOffset; }
uint8_t getType() const{ return mType; }
uint8_t getArrayType() const{ return mArrayType; }
private:
std::string mName;
ptrType_t mOffset;
uint8_t mType;
uint8_t mArrayType; // if mType == TYPE_ARRAY this will give the type of the underlying data in the array
};
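To show how traits like these can drive a generic walk over an object, here is a small self-contained sketch. It uses a simplified stand-in struct (Traits) instead of the real ReflectionTraits, and the type tags kTypeInt/kTypeFloat, the Node struct, and the dump function are hypothetical names invented for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type tags; the real system's values are not shown in this answer.
constexpr std::uint8_t kTypeInt = 0;
constexpr std::uint8_t kTypeFloat = 1;

// Simplified stand-in for ReflectionTraits.
struct Traits {
    std::string name;
    std::size_t offset;
    std::uint8_t type;
};

struct Node { int mLine; int mColumn; float mWeight; };

std::vector<Traits> nodeTraits() {
    return {
        {"mLine", offsetof(Node, mLine), kTypeInt},
        {"mColumn", offsetof(Node, mColumn), kTypeInt},
        {"mWeight", offsetof(Node, mWeight), kTypeFloat},
    };
}

// Walks the traits list and prints each member as name=value;
// memcpy is used to read the member bytes without aliasing problems.
std::string dump(const void* object, const std::vector<Traits>& traits) {
    const char* base = static_cast<const char*>(object);
    std::ostringstream out;
    for (std::vector<Traits>::const_iterator it = traits.begin(); it != traits.end(); ++it) {
        out << it->name << '=';
        if (it->type == kTypeInt) {
            int value = 0;
            std::memcpy(&value, base + it->offset, sizeof(value));
            out << value;
        } else if (it->type == kTypeFloat) {
            float value = 0.0f;
            std::memcpy(&value, base + it->offset, sizeof(value));
            out << value;
        }
        out << ';';
    }
    return out.str();
}
```

The switch on the type tag is the only place that knows about concrete member types, which is what keeps the hierarchy itself free of stream code.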
I have actually come up with improvements to my macros that allow me to simplify this a bit... but those are taken from an actual project I'm working on currently. I'm developing a programming language using Flex, Bison, and LLVM that compiles to C ABI and webassembly. I'm hoping to open source it soon enough, so if you are interested in the details let me know.
The thing to note here is that "Traits" information is metadata that is accessible at runtime and describes the member and is often much larger for general language level reflection. The information I have included here was all I needed for my reflectable types.
The other important aspect to keep in mind when serializing any data is version information. The above approach will deserialize data just fine until you start changing the internal data structure. You could, however, include a post and possibly pre data serialization hook mechanism with your serialization system so you can fix up data to comply with newer versions of types. I have done this a few times with setups like this and it works really well.
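A pre-deserialization hook of that kind could look roughly like the sketch below. The version history is invented for the example (version 1 is assumed to have stored "mLineNumber", version 2 renames it to "mLine" and adds "mColumn"), and FieldMap, preDeserialize, Position and load are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical intermediate form: field name -> value, read from the stream
// before anything is applied to the object.
using FieldMap = std::map<std::string, int>;

// Hypothetical pre-load hook that upgrades old data to the current layout.
void preDeserialize(int storedVersion, FieldMap& fields) {
    if (storedVersion < 2) {
        // Assumed change in version 2: "mLineNumber" was renamed to "mLine".
        FieldMap::iterator it = fields.find("mLineNumber");
        if (it != fields.end()) {
            fields["mLine"] = it->second;
            fields.erase(it);
        }
        // Assumed change in version 2: "mColumn" was added; default it to 0.
        fields.emplace("mColumn", 0);
    }
}

struct Position {
    int mLine = 0;
    int mColumn = 0;
};

// Runs the hook, then applies the (now current) fields to the object.
Position load(int storedVersion, FieldMap fields) {
    preDeserialize(storedVersion, fields);
    Position p;
    p.mLine = fields.at("mLine");
    p.mColumn = fields.at("mColumn");
    return p;
}
```

A post hook would follow the same shape but run after the members are set, for fixes that need the fully constructed object.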
One final note about this technique is that you are explicitly controlling what is serialized here. You can pick and choose the data you want to serialize and the data that may just be keeping track of some transient object state.
C++ Lax guarantees
One thing to note: C++ is VERY lax about what data actually looks like, so you often have to make some platform-specific choices (this is probably one of the main reasons a standard system isn't provided). You can actually do a great deal at compile time with template metaprogramming, but sometimes it is easier to just assume your char to be 8 bits in length. Yes, even this simple assumption isn't 100% universal in C++; luckily, in most situations it holds.
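If you do bake in assumptions like this, one option is to state them with static_assert so an unusual platform fails at compile time instead of producing corrupt data. The sketch below also pins the byte order of a 32-bit value; the byte-order helpers are my own illustration of a "platform-specific choice" and are hypothetical names, not something from the system described above.

```cpp
#include <climits>
#include <cstdint>

// Make the layout assumptions explicit; the build fails if they do not hold.
static_assert(CHAR_BIT == 8, "serialization code assumes 8-bit bytes");
static_assert(sizeof(float) == 4, "serialization code assumes 32-bit float");

// Stores a 32-bit value in little-endian order regardless of host byte order.
inline void storeLittleEndian32(std::uint32_t value, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>(value & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((value >> 24) & 0xFFu);
}

// Reads back a value written by storeLittleEndian32.
inline std::uint32_t loadLittleEndian32(const unsigned char in[4]) {
    return static_cast<std::uint32_t>(in[0]) |
           (static_cast<std::uint32_t>(in[1]) << 8) |
           (static_cast<std::uint32_t>(in[2]) << 16) |
           (static_cast<std::uint32_t>(in[3]) << 24);
}
```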
The approach I use also does some non-standard casting of NULL pointers to determine memory layout (again for my purposes this is the nature of the beast). The following is an example snippet from one of the macro implementations to calculate the member offset in the type where CLASS is provided by the macro.
(ptrType_t)&reinterpret_cast<ptrType_t&>((reinterpret_cast<CLASS*>(0))->member)
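For comparison, the standard offsetof macro from <cstddef> computes the same kind of offset without the null pointer cast. Note the limitation: it is only guaranteed for standard-layout types (since C++17 its use on other types is conditionally-supported), which is one reason a hand-rolled version ends up in systems with virtual base types like mine. MEMBER_OFFSET, Sample, and the helper functions below are hypothetical names for this sketch.

```cpp
#include <cstddef>
#include <cstring>

// Standard-layout example type, so offsetof is well defined here.
struct Sample {
    char tag;
    int line;
    int column;
};

// Hypothetical macro wrapping the standard facility.
#define MEMBER_OFFSET(CLASS, member) offsetof(CLASS, member)

std::size_t lineOffset() { return MEMBER_OFFSET(Sample, line); }
std::size_t columnOffset() { return MEMBER_OFFSET(Sample, column); }

// Reads an int stored at a byte offset inside an arbitrary object.
int readIntAt(const void* object, std::size_t offset) {
    int value = 0;
    std::memcpy(&value, static_cast<const char*>(object) + offset, sizeof(value));
    return value;
}
```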
A general warning about reflection
The biggest issue with reflection is how powerful it can be. You can quickly turn an easily maintainable codebase into a huge mess with too much inconsistent usage of reflection.
I personally reserve reflection for lower level systems (primarily serialization) and avoid using it for runtime type checking for business logic. Dynamic dispatching with language constructs such as virtual functions should be preferred to reflection type check conditional jumps.
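As a tiny illustration of preferring dynamic dispatch, the sketch below evaluates an expression tree through a virtual function; nothing in it asks an object what type it is. Expr, Literal, Add and evaluateSum are hypothetical names for the example.

```cpp
// Each node type supplies its own behavior through a virtual function,
// so callers never branch on a runtime type check.
struct Expr {
    virtual ~Expr() = default;
    virtual int evaluate() const = 0;
};

struct Literal : Expr {
    explicit Literal(int v) : value(v) {}
    int evaluate() const override { return value; }
    int value;
};

struct Add : Expr {
    Add(const Expr& l, const Expr& r) : lhs(l), rhs(r) {}
    int evaluate() const override { return lhs.evaluate() + rhs.evaluate(); }
    const Expr& lhs;
    const Expr& rhs;
};

// Builds (a + b) and evaluates it through the base interface only.
int evaluateSum(int a, int b) {
    Literal left(a);
    Literal right(b);
    Add sum(left, right);
    const Expr& root = sum;
    return root.evaluate();
}
```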
Issues are even harder to track down if the language has inherent, all-or-nothing support for reflection. In C#, for example, you cannot guarantee, given a random codebase, that a function is unused simply by letting the compiler alert you to any usage. Not only can the method be invoked via a string from the codebase or, say, from a network packet; you could also break the ABI compatibility of some other unrelated assembly that reflects on the target assembly. So again, use reflection consistently and sparingly.
Conclusion
There is currently no standard equivalent to the common paradigm of a serializable class hierarchy in C++, but it can be added much like any other system you see in newer languages. After all everything eventually translates down to simplistic machine code that can be represented by the binary state of the incredible array of transistors included in your CPU die.
I'm not saying that everyone should roll their own here by any means. It is complicated and error-prone work. I just really liked the idea and have been interested in this sort of thing for a while now. I'm sure there are some standard fallbacks people use for this sort of work. The first place to look for C++ would be Boost, as you mentioned above.
If you do a search for "C++ Reflection" you will see several examples of how others achieve a similar result.
A quick search pulled up this as one example.