
I am defining a ProtoBuf message where I want to have a "nullable" field -- i.e., I want to distinguish between the field having a value and not having a value. As a concrete example, let's say I have "x" and "y" fields to record the coordinates of some object. But in some cases, the coordinates are not known. The following definition will not work, because if x or y are unspecified, then they default to zero (which is a valid value):

message MyObject {
    optional float x = 1;
    optional float y = 2;
}

One option would be to add a boolean field recording whether the corresponding field's value is known or not. I.e.:

message MyObject {
    optional bool has_x = 1; // if false, then x is unknown.
    optional bool has_y = 2; // if false, then y is unknown.
    optional float x = 3; // should only be set if has_x==true.
    optional float y = 4; // should only be set if has_y==true.
}

But this imposes some extra book-keeping -- e.g., when I set the x field's value, I must always remember to also set has_x. Another option would be to use a list value, with the convention that the list always has either length 0 or length 1:

message MyObject {
    repeated float x = 1; // should be empty or have exactly 1 element.
    repeated float y = 2; // should be empty or have exactly 1 element.
}

But in this case, the definition seems a bit misleading, and the interface isn't much better.

Is there a third option that I haven't thought of that's better than these two? How have you dealt with storing nullable fields in protobuf?

Asked by Edward Loper; edited by mkjeldsen

  • There's [a Proto 3 version of this question](https://stackoverflow.com/questions/42622015/how-to-define-an-optional-field-in-protobuf-3), if others find this but are using Proto 3. – chwarr Feb 14 '19 at 01:13

2 Answers


Protobuf 2 messages have a built-in notion of "nullable fields". The C++ interface contains methods has_xxx and clear_xxx to check if the field has been set and to unset the field, respectively.

This feature comes "for free" because of the way fields are encoded in the message using "tags". An unset field is simply not present in the encoded message.
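To make "an unset field is simply not present" concrete, here is a minimal Python sketch that hand-encodes the question's MyObject (optional float x = 1; optional float y = 2) following the protobuf wire format: a float uses wire type 5, i.e. a key byte (field_number << 3) | 5 followed by 4 little-endian bytes. The names encode_my_object and decode_my_object are hypothetical helpers for illustration only; real code would use the generated class and its has_x()/clear_x() accessors.

```python
import struct

# Wire type 5 = 32-bit fixed-width value (used for float).
WIRE_FIXED32 = 5


def encode_my_object(x=None, y=None):
    """Hypothetical encoder for MyObject; None means 'field not set'."""
    out = bytearray()
    for field_number, value in ((1, x), (2, y)):
        if value is not None:  # unset fields are simply not written
            # The key is a varint; it fits in one byte for field numbers < 16.
            out.append((field_number << 3) | WIRE_FIXED32)
            out += struct.pack("<f", value)
    return bytes(out)


def decode_my_object(data):
    """Hypothetical decoder: returns {field_number: value} for present fields."""
    fields = {}
    i = 0
    while i < len(data):
        key = data[i]
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type != WIRE_FIXED32:
            raise ValueError("this sketch only handles fixed32 (float) fields")
        (fields[field_number],) = struct.unpack_from("<f", data, i + 1)
        i += 5  # 1 key byte + 4 value bytes
    return fields


print(encode_my_object(x=0.0).hex())  # 0d00000000
print(encode_my_object().hex())       # (empty line: nothing is written)
```

An explicit x = 0.0 produces a key byte plus four zero bytes, while an unset x produces nothing, so a decoder can tell the two cases apart (1 in fields versus not).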

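Proto 3, as originally released, does not have this feature for scalar fields: a missing scalar field simply reads as its default value. (Message-typed fields still track presence, and since protobuf 3.15 proto3 scalar fields can again be declared optional to get has_xxx accessors.)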
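For contrast, a simplified Python model of proto3's implicit presence for a plain float field (declared without optional): a value equal to the default 0.0 is not written at all. The function names are hypothetical and the decoder assumes the data holds only this one field; the point is that an explicit 0.0 and "never set" produce identical (empty) bytes.

```python
import struct


def encode_float_field_proto3(field_number, value):
    """Simplified implicit-presence encoder for one float field."""
    if value == 0.0:
        return b""  # default value: nothing goes on the wire
    # Wire type 5 (32-bit); one-byte key assumes field_number < 16.
    key = (field_number << 3) | 5
    return bytes([key]) + struct.pack("<f", value)


def decode_float_field_proto3(data):
    """Assumes data holds at most this single field."""
    if not data:
        return 0.0  # absent field reads as the default
    (value,) = struct.unpack_from("<f", data, 1)
    return value


# Explicit 0.0 and "unset" are indistinguishable after a round trip.
print(encode_float_field_proto3(1, 0.0) == b"")  # True
print(decode_float_field_proto3(b""))            # 0.0
```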

Answer by JesperE; edited by mkjeldsen
  • Are you sure? In particular, if I encode a message and send it over the wire, will I be able to distinguish an object that had its field explicitly set to the default value from an object whose field was never set to any value? See my related question: http://stackoverflow.com/questions/9168052/how-do-has-field-methods-relate-to-default-values-in-protobuf – Edward Loper Feb 08 '12 at 16:29
  • Yes. You should read about how protobuf messages are encoded: http://code.google.com/apis/protocolbuffers/docs/encoding.html. Messages are encoded as a series of key/value pairs (that is what the identifier/tag on each field is for). Unset fields are simply absent from the encoded message. If you have three fields, a, b, and c with tags 1, 2, and 3 respectively, and you encode a message with only "a" set to "42", then the encoded message will contain "field(1) = 42", and nothing else. – JesperE Feb 09 '12 at 07:29
  • JesperE's answer may have been true in the past, but it's definitely not true anymore. – jordanbtucker Feb 26 '16 at 09:35
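The a/b/c example from JesperE's comment can be checked by hand. A minimal Python sketch, assuming a is an int32 (the comment does not give its type), so it uses wire type 0 (varint); encode_varint and encode_fields are hypothetical helper names.

```python
def encode_varint(n):
    """Base-128 varint encoding of a non-negative integer."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_fields(fields):
    """Encode {field_number: int} as varint fields; absent numbers are skipped."""
    out = bytearray()
    for field_number in sorted(fields):
        out += encode_varint((field_number << 3) | 0)  # key, wire type 0
        out += encode_varint(fields[field_number])     # value
    return bytes(out)


# Fields a, b, c have tags 1, 2, 3; only a is set to 42.
encoded = encode_fields({1: 42})
print(encoded.hex())  # 082a
```

The output is exactly two bytes, key 0x08 (field 1, varint) and value 0x2a (42); fields b and c leave no trace.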

Pick a "NaN-like" sentinel value for each type and declare it with the default option, as shown below. That value is what the field reads as whenever nothing was specified for it.

optional float x = 1 [default = -1];
Answer by Aravind Yarram
  • This only works if there is some non-meaningful value within the domain of the type. E.g., for a coordinate, -1 is a perfectly valid value. There is no value within the domain of "float" that couldn't be a real coordinate. – Edward Loper Feb 08 '12 at 16:27
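A minimal Python sketch of the sentinel convention and of the limitation Edward Loper points out. It models a decoded message as a hypothetical {field_number: value} dict; get_x mimics a proto2 getter that falls back to the declared default of -1.

```python
# Mirrors: optional float x = 1 [default = -1];
SENTINEL_X = -1.0


def get_x(fields):
    """Like a proto2 getter: x if present, else the declared default."""
    return fields.get(1, SENTINEL_X)


def x_is_known(fields):
    """Sentinel convention: reading -1.0 is taken to mean 'unknown'."""
    return get_x(fields) != SENTINEL_X


unset = {}                # x never set
at_origin = {1: 0.0}      # x really is 0.0
at_minus_one = {1: -1.0}  # x really is -1.0

print(x_is_known(unset), x_is_known(at_origin), x_is_known(at_minus_one))
# False True False  (a real x of -1.0 is misreported as unknown)
```

With generated proto2 code, has_x() would still separate the last two cases; the sentinel only changes what the getter returns for an unset field.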