I found an interesting gotcha with protocol buffers: if you have two similar messages, it is possible to parse one as if it were the other, using either the C++ API or the command line.
The limited documentation for ParseFromString does not mention that it need not consume the whole string, and that it will not fail if it doesn't.
I had expected ParseFromString to fail when asked to parse a message of type B as type A. After all, the message contains extra data. However, this is not the case. The following script demonstrates the issue:
```sh
#!/bin/sh
# Schema: A and B both use field number 1 for a uint64.
cat - >./foobar.proto <<EOF
syntax = "proto3";
package demo;
message A
{
uint64 foo = 1;
};
enum flagx {
y = 0;
z = 1;
}
message B {
uint64 foolish = 1;
flagx bar = 2;
};
EOF
# Text-format input for B (bar: y is the enum's default value, 0).
cat - >./mess.B.in.txtfmt <<EOF
foolish: 10
bar: y
EOF
# Text-format input for A.
cat - >./mess.A.in.txtfmt <<EOF
foo: 10
EOF
# Encode both messages, then decode B's bytes as if they were an A.
protoc --encode=demo.A foobar.proto <./mess.A.in.txtfmt >./mess.A.proto
protoc --encode=demo.B foobar.proto <./mess.B.in.txtfmt >./mess.B.proto
protoc --decode=demo.A foobar.proto >./mess.out.txtfmt <./mess.B.proto
echo "in: "
cat mess.B.in.txtfmt
echo "out: "
cat mess.out.txtfmt
echo "xxd mess.A.proto:"
xxd mess.A.proto
echo "xxd mess.B.proto:"
xxd mess.B.proto
```
The output is:
```
in:
foolish: 10
bar: y
out:
foo: 10
xxd mess.A.proto:
00000000: 080a
xxd mess.B.proto:
00000000: 080a
```
So the binary encoding is identical for messages A and B.
If you alter the .proto definition so that B has another varint (uint64) field instead of an enum, you get distinct binary messages, but ParseFromString will still successfully parse the longer message as the shorter one.
To really confuse things, it also seems to be able to parse the shorter message as the longer one.
Is this a bug or a feature?