
How to handle inheritance in Google Protocol Buffers 3.0?

Java equivalent code:

public class Bar {
    String name;
}
public class Foo extends Bar {
    String id;
}

What would be Proto equivalent code?

message Bar {
    string name = 1;
}
message Foo {
    string id = 2;
}
Vivek Sinha
  • Inheritance is not supported in protocol buffers. See http://stackoverflow.com/questions/29263507/extending-protobuf-messages – Yousaf Dec 20 '16 at 13:21
  • Possible duplicate of [Extending Protobuf Messages](https://stackoverflow.com/questions/29263507/extending-protobuf-messages) – iammilind Aug 31 '18 at 06:46

2 Answers


Protocol Buffers does not support inheritance. Instead, consider using composition:

message Foo {
  Bar bar = 1;
  string id = 2;
}
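On the Java side, composition means the "inherited" field is reached through the nested `bar` value rather than directly. The sketch below uses hand-written stand-in classes (`BarMessage`, `FooMessage`, and the two converter methods are hypothetical names, not generated protobuf code) to show how the question's `Foo extends Bar` could be mapped to and from the composed shape:

```java
// Illustrative only: BarMessage and FooMessage are hand-written stand-ins
// for the composed message shape, not classes generated by protoc.
class CompositionSketch {
    // Domain classes from the question (inheritance).
    static class Bar { String name; }
    static class Foo extends Bar { String id; }

    // Stand-ins mirroring: message Foo { Bar bar = 1; string id = 2; }
    static class BarMessage { String name = ""; }
    static class FooMessage { BarMessage bar = new BarMessage(); String id = ""; }

    // Copy the inherited field into the nested message, the own field alongside it.
    static FooMessage toMessage(Foo foo) {
        FooMessage message = new FooMessage();
        message.bar.name = foo.name;
        message.id = foo.id;
        return message;
    }

    // Reverse mapping: the nested message supplies the "superclass" fields.
    static Foo fromMessage(FooMessage message) {
        Foo foo = new Foo();
        foo.name = message.bar.name;
        foo.id = message.id;
        return foo;
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        foo.name = "shared";
        foo.id = "42";
        Foo copy = fromMessage(toMessage(foo));
        System.out.println(copy.name + " / " + copy.id);
    }
}
```

With real generated classes the same mapping would go through their builders and getters; the point here is only the shape of the data.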

That said, there is a trick that resembles inheritance. It is an ugly hack, so use it with care. If you define your message types like:

message Bar {
  string name = 1;
}
message Foo {
  string name = 1;
  string id = 2;
}

These two types are compatible, because Foo contains a superset of the fields of Bar. This means if you have an encoded message of one type, you can decode it as the other type. If you try to decode a Bar as type Foo, the field id will not be set (and will get its default value). If you decode a Foo as type Bar, the field id will be ignored. (Notice that these are the same rules that apply when adding new fields to a type over time.)
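To see why this works, it helps to look at the bytes. The sketch below hand-encodes string fields in the protobuf wire format (a varint tag of `(field_number << 3) | 2`, a varint length, then UTF-8 bytes) and decodes them while skipping field numbers the reader does not know. It is a simplified illustration, not the protobuf library: the class `WireCompatSketch` and all of its methods are hypothetical, it handles only string fields, and real code would use the generated classes instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hand-rolled illustration of the wire format for string fields only.
class WireCompatSketch {

    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write(value);
    }

    static int readVarint(byte[] data, int[] pos) {
        int result = 0;
        int shift = 0;
        while (true) {
            int b = data[pos[0]++] & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
            shift += 7;
        }
    }

    // Append one length-delimited (wire type 2) string field.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (fieldNumber << 3) | 2);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Keep only the field numbers the reader knows; skip everything else.
    static Map<Integer, String> decode(byte[] data, Set<Integer> knownFields) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        int[] pos = {0};
        while (pos[0] < data.length) {
            int tag = readVarint(data, pos);
            if ((tag & 7) != 2) {
                throw new IllegalArgumentException("sketch handles only string fields");
            }
            int length = readVarint(data, pos);
            if (knownFields.contains(tag >>> 3)) {
                fields.put(tag >>> 3, new String(data, pos[0], length, StandardCharsets.UTF_8));
            }
            pos[0] += length; // advance past the payload whether or not it was kept
        }
        return fields;
    }

    static byte[] encodeBar(String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        return out.toByteArray();
    }

    static byte[] encodeFoo(String name, String id) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        writeString(out, 2, id);
        return out.toByteArray();
    }

    // Bar knows only field 1, so Foo's field 2 is skipped.
    static String decodeAsBar(byte[] data) {
        return decode(data, new HashSet<>(Arrays.asList(1))).getOrDefault(1, "");
    }

    // Foo knows fields 1 and 2; a missing field falls back to the default "".
    static String[] decodeAsFoo(byte[] data) {
        Map<Integer, String> fields = decode(data, new HashSet<>(Arrays.asList(1, 2)));
        return new String[] {fields.getOrDefault(1, ""), fields.getOrDefault(2, "")};
    }

    public static void main(String[] args) {
        System.out.println(decodeAsBar(encodeFoo("shared", "42")));
        System.out.println(Arrays.toString(decodeAsFoo(encodeBar("shared"))));
    }
}
```

Because every field carries its own number and length, a reader can step over fields it does not recognize, which is the property the trick relies on.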

You could use this to implement something like inheritance by having several types that each contain a copy of the fields of the "superclass". However, there are a couple of big problems with this approach:

  • To convert a message object of type Foo to type Bar, you have to serialize and re-parse; you can't just cast. This can be inefficient.
  • It's very hard to add new fields to the superclass, because you have to make sure to add the field to every subclass and have to make sure that this doesn't create any field number conflicts.
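The second problem can at least be detected mechanically. As a rough sketch, assume each message is described by a map from field number to declaration (this representation and the `SupersetCheckSketch.problems` helper are hypothetical, not part of any protobuf tooling). A check can then report every "superclass" field that a "subclass" lacks or declares differently under the same number:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: field maps are hand-written here, not read from .proto files.
class SupersetCheckSketch {

    // Lists each base field that the derived message is missing or has
    // declared differently under the same field number.
    static List<String> problems(Map<Integer, String> base, Map<Integer, String> derived) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<Integer, String> field : base.entrySet()) {
            String inDerived = derived.get(field.getKey());
            if (inDerived == null) {
                found.add("missing field " + field.getKey() + " (" + field.getValue() + ")");
            } else if (!inDerived.equals(field.getValue())) {
                found.add("field " + field.getKey() + " is '" + inDerived
                        + "', expected '" + field.getValue() + "'");
            }
        }
        return found;
    }
}
```

For example, if `Bar` later gained a `string email = 2;` field, the check would flag that `Foo` already uses number 2 for `string id`.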
Kenton Varda
  • Any idea how I would call the de/encoder from the Java API? – Janac Meena Jul 11 '19 at 17:35
  • Using Bar to decode Foo when using composition didn't really work for me. – gtato Mar 05 '20 at 16:40
  • @gtato yes, that doesn't work with composition. It only works when one type contains a subset of the fields of the other type, as in my second example. – Kenton Varda Mar 06 '20 at 18:01
  • How would this work if `Foo` and `Bar` share a common field? – pooya13 Jul 13 '20 at 21:26
  • @pooya13 Foo's field will `hide` Bar's field of the same name. – Homunculus Reticulli Oct 21 '21 at 09:11
  • _"To convert a message object of type Foo to type Bar, you have to serialize and re-parse; you can't just cast. This can be inefficient."_ Can you explain why one can't simply use something like `reinterpret_cast` (in C++)? I don't understand why this would be inefficient or any different from "casting down" as normal in C++. – Homunculus Reticulli Oct 21 '21 at 09:13
  • A solution would be to use the methodology you suggest - but then to save the file not as a .proto file, but with a different extension, and then write a preprocessing script to autogenerate the code as you suggest. So the file to preprocess (say .pre.proto extension), could have syntax like message Foo : Bar { PARENT_LAST_ID_OFFSET = 123; /* new fields */} which will generate a .proto file with the format you suggested. I am thinking of doing this myself - but I first want clarification on the "inefficiency" you mentioned. – Homunculus Reticulli Oct 21 '21 at 09:27
  • @HomunculusReticulli The generated C++ classes for `Foo` and `Bar` are unrelated and have totally different memory layouts. You can't just `reinterpret_cast` between unrelated types, you'll get undefined behavior. – Kenton Varda Oct 21 '21 at 15:16
  • @KentonVarda Thanks for your response. Sorry for being pedantic - but can you explain why the memory layouts would be different, if the first N fields (when casting down) are all exactly the same data types - why would they be arranged differently in memory ? – Homunculus Reticulli Oct 22 '21 at 10:41
  • @HomunculusReticulli The C++ generated classes are more complicated than a simple list of members corresponding to the fields in the proto schema. For example, there's also a bit array which determines which fields are set, and there's other members not corresponding to a field that appear in every object. So, the members won't necessarily line up. Moreover, these are complex C++ classes with virtual methods and destructors, not simple structs. The language does not permit reinterpret_casting in this case even if one type's members are a prefix of the other's. – Kenton Varda Oct 22 '21 at 15:23

See the Protocol Buffer Basics tutorial:

Don't go looking for facilities similar to class inheritance, though – protocol buffers don't do that.

Andy Turner
  • Any idea why they don't do that? I mean, it doesn't sound like something they want to add, even in the future – Frederico Pantuzza Aug 04 '17 at 11:07
  • I *think* it can result in inefficiencies. One of the core features of protocol buffers is packing things efficiently. Lower field numbers give a more compact encoding, and the numbers have to be unique. With multiple levels of inheritance, it would be difficult to keep track of the numbers assigned to properties in the base structures. – Pankaj Garg Sep 14 '17 at 00:06
  • There is no need to add inheritance; protocol buffers only serve to bring data sets from one side to the other. If one side or the other wants to do calculations that are built on inheritance constructs, that has no consequences for the data being communicated. – Bert Verhees Jul 29 '18 at 12:12
  • Given most of the languages that Protocol Buffers has bindings for are OO and support inheritance it seems a bit of an oversight for Protobufs to leave out support for inheritance. That would be as monumentally stupid as EJB (built only for the OO Java language) leaving out support for inheritance ... – Volksman Jun 20 '21 at 09:13
  • I think the main reason it isn't done is because it increases the risk of id clashes, especially if there's a version mismatch and one side is using an older schema. And it's less obvious this is happening because the ids for the "base class" would be elsewhere, perhaps not even in the same file as the derived class. – Miral Aug 12 '21 at 07:21
  • @Volksman "most of the languages" so, _not all_ support inheritance. For example, Go, one of Google's main production languages, doesn't. Since protocol buffers are the standard interchange format within Google, they are designed to use features available in the production languages. – Andy Turner Jan 26 '22 at 18:29