
I am working on a project that makes heavy use of static polymorphism. A particular use case that I am interested in would be made possible by static reflection, but we still don't have this in C++. The use case looks something like this: I have functions that read/write a data structure to/from a binary file:

template <typename data_t> void write_binary(const my_type_t<data_t>& obj)
{
    //write a binary file...
}
template <typename data_t> void read_binary(my_type_t<data_t>& obj)
{
    //read a binary file...
}

I would like to enforce that I can only read data from files that were output by the same type, e.g. my_type_t<std::string> can only read from binary files output by my_type_t<std::string>, etc. The way I want to do this is to add a small header to the binary file that identifies the specialization of data_t:

template <typename data_t> void write_binary(const my_type_t<data_t>& obj)
{
    //write header type_name(data_t)
    //write a binary file...
}
template <typename data_t> void read_binary(my_type_t<data_t>& obj)
{
    //read header
    //assert header == type_name(data_t)
    //read a binary file...
}

I am aware of the existence of typeid(data_t).name() and the various methods of demangling it, but I want something that is defined by the standard.

So my precise question is this: for any two types type1_t and type2_t, is there any mapping "F" defined by the C++ standard such that F(type1_t) == F(type2_t) always implies type1_t == type2_t, and type1_t == type2_t always implies F(type1_t) == F(type2_t), independent of compiler? That is, is there any bijective mapping between types and some kind of serializable value defined by the C++ standard?

EDIT: There is a subtlety in this question that I did not initially emphasize: I do not want to serialize arbitrary types. The function body that writes my objects to files is already implemented. What I want (as stated in the question above) is simply a unique identifier for every type that is compiler independent. The role of the template specialization data_t of my_type_t<data_t> is not to affect what information is written/read, but rather how it is interpreted.

Just a couple of other thematic points:

  • Due to the nature of my project, I cannot know ahead of time what type data_t will be; I must allow it to feasibly be anything.
  • It is very much undesirable for me to have to place requirements on what types can be used for the template specialization, e.g. requiring people to implement some kind of "name" field for their types. This is because the final type data_t that ends up being used for the I/O is not tied to the interfaces that my users are exposed to.
  • While the details of how instances of types are stored in memory are indeed platform- and compiler-dependent, the names of the types themselves are ultimately properties only of the source code, not the compiler.
wvn
  • Have you taken a look at https://stackoverflow.com/questions/234724/is-it-possible-to-serialize-and-deserialize-a-class-in-c ? – Frodyne Nov 04 '22 at 11:55
  • There is `typeid(type1_t).name()`. It has the exact properties that you ask for in your "precise question". But I think that is not what you really want. You want in addition that such a 1-to-1 correspondence is also present across different compilers. – j6t Nov 04 '22 at 11:58
  • Maybe this can help: https://en.cppreference.com/w/cpp/types/is_same – Patricio Loncomilla Nov 04 '22 at 11:59
  • If you know all the types you are going to handle, which you presumably will, given that you have to know how to serialize them, you could use your own IDs, call it MyId, by building a hash table mapping `typeid(T)` to/from MyId(T) – jwezorek Nov 04 '22 at 12:06
  • @j6t typeid(...).name() is compiler-dependent. – wvn Nov 04 '22 at 13:38
  • @jwezorek Implementing this construct upfront is exactly what I am trying to avoid, as there are infinitely many specializations that I could use for the output type. – wvn Nov 04 '22 at 13:40
  • @PatricioLoncomilla std::is_same checks two types, but there is no way to evaluate this at runtime, which is what I want to do. – wvn Nov 04 '22 at 13:41
  • @Frodyne this solution is, for all intents and purposes, equivalent to that proposed by jwezorek, requiring me to define upfront all types I want to serialize. – wvn Nov 04 '22 at 13:48
  • @wvn You are overengineering the problem. At any given time, your program has only a finite number of classes that it knows how to serialize. Problem solved. Ask yourself: what should the program do when it sees a class ID that it knows nothing about? It can only give up or reject the serialized stream. – j6t Nov 04 '22 at 14:14
  • @j6t I agree that if I were using this function as I described above, this would be overengineering the problem. However, there are two additional considerations for me here: 1. I am writing a template library that sees use by approximately 20 people, so I can't determine this collection of types upfront, and 2. The final type that is seen by the IO interface is not ultimately a type that is exposed to someone using my library -- in order for them to implement it themselves, they would have to have privileged information about the call structure that I don't want them to be responsible for. – wvn Nov 04 '22 at 14:26
  • Let's look at it from a low-level perspective. To get a compiler compatible standard compliant safe solution, you would not just dump RAM memory to disk, but really serialize into bytes. This serialization can be done hierarchically over members and you can use existing tools so that you do not have to create the serialization manually. But when reading back, the system somehow has to know, which class it was to check it. So your multitude of classes and template specializations, which possibly cannot be listed upfront, need some unique id. – Sebastian Nov 04 '22 at 14:32
  • @wvn I've used a serialization framework that hard-coded the program's class name in the serialization stream. It turned into a maintenance nightmare because suddenly we were limited in how we name the classes in the program. For example, a class serialized as `Foo` was later to be split into a worker class and a UI class, the latter being `Foo`. We wanted to delegate serialization to the worker class, but in order for the serialization to work, the worker class would have to have been named `Foo`, which we did not like. My advice: let your users decide on the name of the serialized class ID. – j6t Nov 04 '22 at 14:35
  • @Sebastian the irony of my situation is that I am not seeking a way of automatically serializing an instance of an obscure type - I am happy to provide the implementation of the serialization myself. In my case, the type specialization doesn't affect the information present in the stream, rather the way it is interpreted. All I need is some verification that it is the same type being read to as the stream was written by, nothing more than that. – wvn Nov 04 '22 at 14:42
  • To be more compatible, you could demangle the implementation-provided type names: https://www.boost.org/doc/libs/1_63_0/libs/core/doc/html/core/demangle.html You mentioned that you are aware of it. But this seems to be the best bet, apart from manual ids and perhaps some "preprocessor macro magic". – Sebastian Nov 04 '22 at 14:42
  • @j6t This is why I am seeking something that is within the C++ standard itself: I see no reason why, given a solution to the problem posed above, I would suffer the same issue, save for a situation where I am reading old data and the type specification has changed -- a contingency I have already planned for. I can see this being a problem when using an external framework however... – wvn Nov 04 '22 at 14:45
  • This could also help. https://www.boost.org/doc/libs/develop/doc/html/boost_typeindex.html – Sebastian Nov 04 '22 at 14:50
  • [How about simply requiring each serialized type to define their own names?](https://coliru.stacked-crooked.com/a/b00a68af236f48f8) There's no need to list them up front in sourcecode, only in the binary. – Mooing Duck Nov 04 '22 at 15:22
  • Just to understand the scope of the question even better: The type uid should be enough between program runs. So a memory address of a static class member is not enough? Could the program also be recompiled with different source? Or with introduction and deletion of types or even introduction of types in two different source code branches independently? – Sebastian Nov 04 '22 at 16:49
  • How are the types identified? By class name and namespace, or by the directory and filename of the translation unit, or do all involved types have a fixed constant? E.g. a templated custom container itself and the template parameter class both have an id, which you could combine? Or is there some runtime registry? Do you have influence on all classes involved, i.e. can you demand one additional common base class or a macro to be called within or at every involved class definition? Do you know more about them (or have a list of all of them) at runtime? – Sebastian Nov 04 '22 at 16:52
  • Can you restrict (as long as you detect it) loading by another/newer executable, e.g. rejecting data when the source code has changed, but accepting the same source code version compiled by different compilers? Can you introduce a custom step in the build and/or link process? – Sebastian Nov 04 '22 at 16:56
  • How deep are the relevant "class levels"? One template class + one possibly new class parameter to the template, or much more involved? Can you briefly describe more of where the serialization happens, to give an idea of how much is done manually per class anyway? E.g. one switch-case block or a runtime class registry or customization or polymorphism in this case by inheritance? – Sebastian Nov 04 '22 at 16:59
  • Not really a boilerplate-solution, but perhaps [this](https://www.codeproject.com/articles/18032/how-to-marshal-a-c-class) helps you. – user1934428 Nov 29 '22 at 12:11

2 Answers

2

The meaning of a type only makes sense within modules that use the same compiler. So you could just use typeid(...).name() or typeid(...).hash_code(), which are standard functions.
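
As a rough sketch of what this looks like in code (the helper names `same_type_in_this_build` and `print_type_info` are hypothetical, not from the question): within a single program, `std::type_index` gives a reliable equality test, while the text returned by `name()` is implementation-defined. The standard guarantees neither that `name()` is unique or readable nor that `hash_code()` stays the same between runs, so these values are only safe to compare against ones produced by the same build.

```cpp
#include <ostream>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical helper: true when A and B are the same type.
// Only meaningful inside one program; nothing here is portable across compilers.
template <typename A, typename B>
bool same_type_in_this_build()
{
    return std::type_index(typeid(A)) == std::type_index(typeid(B));
}

// Hypothetical helper: writes the implementation-defined name and hash of T.
// The output differs between compilers (e.g. mangled names on GCC and Clang).
template <typename T>
void print_type_info(std::ostream& out)
{
    out << typeid(T).name() << ' ' << typeid(T).hash_code() << '\n';
}
```

Calling `print_type_info<std::string>(std::cout)` with two different compilers is a quick way to see why these values cannot serve as a portable file header.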

If you have different compilers (and/or different platforms) involved, there is no possibility to reconstruct an arbitrary type from the other compiler. The memory layout could be completely different. Even an int could be different in size or have a different byte ordering. How would the information that it is an int help you anyway?

You should write your library in a way that the users have customization points to serialize and deserialize their types. You simply cannot write generic read_binary and write_binary functions in a portable way.

Jakob Stark
  • A subtlety that I didn't emphasize enough in my original post is that I am not interested in the process of serializing and deserializing arbitrary types -- I will make a point to edit my original post to hit this harder. All I want is to assert that the same types were written to a stream. The information in the stream is always written in the same way, as defined by myself. The role of the specialization is to differentiate how the information is interpreted. – wvn Nov 04 '22 at 15:38
  • Additionally, while typeid(...).name() and typeid(...).hash_code() are indeed standard functions, the values they return are not standard. – wvn Nov 04 '22 at 15:56
1

No.

Nor does the problem seem to benefit from one. Serialization is not generically possible in C++, so there will be customization points for serializing and deserializing, whether you implement them or your users do, and they will be type-specific. In other words, in:

template <typename data_t> void write_binary(const my_type_t<data_t>& obj)
{
    //write header type_name(data_t)
    //write a binary file...
}

The "write a binary file" step has to be specific to data_t. There have to be cases to write a std::string differently than an int. Each of those cases can prepend an identifying header if they want. The deserialization can check that header. The deserialization can also check other invariants of the type.
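
A minimal sketch of that idea, assuming stream-based overloads rather than the question's actual functions: the `tag` values, `write_value`, `read_value`, and the other helper names are all hypothetical. Each type-specific writer prepends its own one-byte tag, and the matching reader rejects a stream that starts with a different tag.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical tags chosen by the library author, not by the compiler.
enum class tag : std::uint8_t { int32 = 1, string = 2 };

// Write/read a 32-bit value in fixed little-endian order,
// so the file layout does not depend on the host byte order.
void put_u32(std::ostream& out, std::uint32_t v)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.put(static_cast<char>((v >> shift) & 0xFFu));
}

std::uint32_t get_u32(std::istream& in)
{
    std::uint32_t v = 0;
    for (int shift = 0; shift < 32; shift += 8)
        v |= static_cast<std::uint32_t>(static_cast<unsigned char>(in.get())) << shift;
    if (!in) throw std::runtime_error("truncated stream");
    return v;
}

// Throws unless the next byte in the stream is the expected tag.
void expect_tag(std::istream& in, tag expected)
{
    if (in.get() != static_cast<int>(expected))
        throw std::runtime_error("type tag mismatch");
}

void write_value(std::ostream& out, std::int32_t v)
{
    out.put(static_cast<char>(tag::int32));
    put_u32(out, static_cast<std::uint32_t>(v));
}

void write_value(std::ostream& out, const std::string& s)
{
    out.put(static_cast<char>(tag::string));
    put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.write(s.data(), static_cast<std::streamsize>(s.size()));
}

void read_value(std::istream& in, std::int32_t& v)
{
    expect_tag(in, tag::int32);
    v = static_cast<std::int32_t>(get_u32(in));
}

void read_value(std::istream& in, std::string& s)
{
    expect_tag(in, tag::string);
    s.resize(get_u32(in));
    in.read(&s[0], static_cast<std::streamsize>(s.size()));
    if (!in) throw std::runtime_error("truncated stream");
}
```

Because the tags are plain constants in the source, they stay the same regardless of which compiler builds the writer and the reader.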

requiring people to implement some kind of "name" field for their types

A customization point doesn't require a particular field. There are ways to allow customization of behavior non-intrusively such as template specialization (traits) and ADL (overloading).
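
One way the traits variant could look, as a sketch under stated assumptions: `type_tag` is a hypothetical trait, the empty `my_type_t` is only a placeholder for the question's class, and the stream parameters are added here so the example is self-contained (the question's functions take only the object). The trait is specialized outside the type it describes, so the type itself needs no "name" member.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Placeholder for the question's my_type_t; its real contents are not shown there.
template <typename data_t> struct my_type_t {};

// Hypothetical trait. The primary template is left undefined, so using an
// unregistered type is a compile-time error rather than a silent mismatch.
template <typename T> struct type_tag;

// Registrations live outside the types they describe (non-intrusive).
template <> struct type_tag<int>         { static const char* name() { return "int"; } };
template <> struct type_tag<std::string> { static const char* name() { return "std::string"; } };

template <typename data_t>
void write_binary(std::ostream& out, const my_type_t<data_t>&)
{
    out << type_tag<data_t>::name() << '\n';  // header line identifying data_t
    // ... the existing payload writer would go here ...
}

template <typename data_t>
void read_binary(std::istream& in, my_type_t<data_t>&)
{
    std::string header;
    std::getline(in, header);
    if (header != type_tag<data_t>::name())
        throw std::runtime_error("file was written by a different specialization");
    // ... the existing payload reader would go here ...
}
```

The tag strings are chosen in source code, so renaming a class or moving it to another namespace does not change the header unless its `type_tag` specialization is edited too.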

the names of the types themselves are ultimately properties only of the source code

The types are a property of the source code. The names, and the spelling, are a choice of a particular formatting of the types. typeid(x).name() is one choice of formatting, which will differ on different compilers. A demangled name is another, which will differ on different platforms. Demangled names are not necessarily unique.

(Finally, using type names to identify the serialized value is cute but likely to yield surprises. For example, one would generally expect to be able to rename a class type without affecting serialized data. One would generally expect to move it to a new namespace, even with a typedef in the old location for minimal impact, without affecting serialized data.)

Jeff Garrett
  • I think "no" is unfortunately the answer that I am looking for here. I don't agree that my specific case doesn't benefit from it - my question as posed above is a trivialization of the problem, and without a more complete picture it is impossible to determine whether or not this is the solution I am after. – wvn Nov 07 '22 at 10:51