104

Google Protocol Buffers can be serialized not only in binary format but also as text, known as textproto. However, I can't easily find examples of such text; what does it look like?

Expected answer: an example covering all features allowed by the protobuf IDL/proto file including a sample protobuf packet in textual form.

Colonel Panic
Vi.
  • 26
    Vague benefits of answering this/such questions: 0. My curiosity satisfied; 1. More people knowing about this debugging tool; 2. Other people recognizing some previously unidentifiable text chunk as protobuf-text; 3. A step to adding a sample in the official documentation linked here (where it should have been in the first place); 4. Google search for "protobuf text sample" landing here and providing useful result... – Vi. Sep 18 '13 at 13:55
  • 2
    Another possible benefit is the applicability of traditional unix tools, like `grep` and `awk`. For example if someone stores metadata about files in protobuf and wants to provide a way to grep through that metadata, it is easy to implement it by only adding a proto2text converter filter into the pipeline. – beemtee Mar 10 '19 at 10:20
  • 1
    One of the design goals of the protobuf text output was to be amenable to line-by-line diff tools. – Paul Feb 15 '21 at 01:12
  • 1
    Please vote for this request for documentation https://github.com/protocolbuffers/protobuf/issues/3755 – Colonel Panic Jan 18 '22 at 17:41
  • Note that since proto3 there is an official json mapping https://developers.google.com/protocol-buffers/docs/proto3#json – Ben Page May 23 '22 at 22:29

3 Answers

115

Done myself:

test.proto

syntax = "proto2";  // required/optional fields and extensions are proto2 features

enum MyEnum
{
    Default = 0;
    Variant1 = 1;
    Variant100 = 100;
}

message Test {
    required string f1 = 1;
    required int64 f2 = 2;
    repeated uint64 fa = 3;
    repeated int32 fb = 4;
    repeated int32 fc = 5 [packed = true];
    repeated Pair pairs = 6;
    optional bytes bbbb = 7;

    extensions 100 to max;
}

message Pair {
    required string key = 1;
    optional string value = 2;
}



extend Test {
    optional bool gtt = 100;
    optional double gtg = 101;
    repeated MyEnum someEnum = 102;
}

Example output:

f1: "dsfadsafsaf"
f2: 234
fa: 2342134
fa: 2342135
fa: 2342136
fb: -2342134
fb: -2342135
fb: -2342136
fc: 4
fc: 7
fc: -12
fc: 4
fc: 7
fc: -3
fc: 4
fc: 7
fc: 0
pairs {
  key: "sdfff"
  value: "q\"qq\\q\n"
}
pairs {
  key: "   sdfff2  \321\202\320\265\321\201\321\202 "
  value: "q\tqq<>q2&\001\377"
}
bbbb: "\000\001\002\377\376\375"
[gtt]: true
[gtg]: 20.0855369
[someEnum]: Variant1
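
The string and bytes values in this output use C-style escapes: printable ASCII is kept as-is, a few characters get named escapes, and every other byte (including each byte of a multi-byte UTF-8 sequence) is written as three octal digits. Here is a minimal sketch of that rule. `EscapeForTextFormat` is a hypothetical helper written for illustration, not a libprotobuf function, and the exact set of named escapes (for example the single quote) is an assumption:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: renders raw bytes the way the sample output shows them.
// Illustration only; it does not call into libprotobuf.
std::string EscapeForTextFormat(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        switch (c) {
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            case '"':  out += "\\\""; break;
            case '\'': out += "\\'"; break;   // assumed, not visible in the sample
            case '\\': out += "\\\\"; break;
            default:
                if (c >= 0x20 && c < 0x7f) {
                    out += static_cast<char>(c);  // printable ASCII passes through
                } else {
                    char buf[5];                  // backslash + 3 octal digits + NUL
                    std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
                    out += buf;
                }
        }
    }
    return out;
}
```

Calling `EscapeForTextFormat(std::string("\x00\x01\x02\xff\xfe\xfd", 6))` gives the text shown for `bbbb` above; the explicit length is needed because the data starts with a NUL byte.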

The program:

#include <google/protobuf/text_format.h>

#include <stdio.h>
#include <string>
#include "test.pb.h"

int main() {
    Test t;
    t.set_f1("dsfadsafsaf");
    t.set_f2(234);
    t.add_fa(2342134);
    t.add_fa(2342135);
    t.add_fa(2342136);
    t.add_fb(-2342134);
    t.add_fb(-2342135);
    t.add_fb(-2342136);
    t.add_fc(4);
    t.add_fc(7);
    t.add_fc(-12);
    t.add_fc(4);
    t.add_fc(7);
    t.add_fc(-3);
    t.add_fc(4);
    t.add_fc(7);
    t.add_fc(0);
    t.set_bbbb("\x00\x01\x02\xff\xfe\xfd",6);

    Pair *p1 = t.add_pairs(), *p2 = t.add_pairs();
    p1->set_key("sdfff");
    p1->set_value("q\"qq\\q\n");
    p2->set_key("   sdfff2  тест ");
    // "\xff" is not valid UTF-8 in a string field; it causes the libprotobuf error mentioned below.
    p2->set_value("q\tqq<>q2&\x01\xff");

    t.SetExtension(gtt, true);
    t.SetExtension(gtg, 20.0855369);
    t.AddExtension(someEnum, Variant1);

    std::string str;
    google::protobuf::TextFormat::PrintToString(t, &str);
    printf("%s", str.c_str());

    return 0;
}

Binary protobuf of this sample (for completeness):

00000000  0a 0b 64 73 66 61 64 73  61 66 73 61 66 10 ea 01  |..dsfadsafsaf...|
00000010  18 f6 f9 8e 01 18 f7 f9  8e 01 18 f8 f9 8e 01 20  |............... |
00000020  8a 86 f1 fe ff ff ff ff  ff 01 20 89 86 f1 fe ff  |.......... .....|
00000030  ff ff ff ff 01 20 88 86  f1 fe ff ff ff ff ff 01  |..... ..........|
00000040  2a 1b 04 07 f4 ff ff ff  ff ff ff ff ff 01 04 07  |*...............|
00000050  fd ff ff ff ff ff ff ff  ff 01 04 07 00 32 10 0a  |.............2..|
00000060  05 73 64 66 66 66 12 07  71 22 71 71 5c 71 0a 32  |.sdfff..q"qq\q.2|
00000070  23 0a 14 20 20 20 73 64  66 66 66 32 20 20 d1 82  |#..   sdfff2  ..|
00000080  d0 b5 d1 81 d1 82 20 12  0b 71 09 71 71 3c 3e 71  |...... ..q.qq<>q|
00000090  32 26 01 ff 3a 06 00 01  02 ff fe fd a0 06 01 a9  |2&..:...........|
000000a0  06 ea 19 0c bf e5 15 34  40 b0 06 01              |.......4@...|
000000ac

Note that the sample is not completely OK: the string field 'value' holds the byte \377, which is not valid UTF-8, so the library complains: libprotobuf ERROR google/protobuf/wire_format.cc:1059] Encountered string containing invalid UTF-8 data while parsing protocol buffer. Strings must contain only UTF-8; use the 'bytes' type for raw bytes.

The protoc tool can also decode messages to text, both with the proto file and without it:

$ protoc --decode=Test test.proto < test.bin 
[libprotobuf ERROR google/protobuf/wire_format.cc:1091] String field 'value' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes. 
f1: "dsfadsafsaf"
f2: 234
fa: 2342134
fa: 2342135
fa: 2342136
fb: -2342134
fb: -2342135
fb: -2342136
fc: 4
fc: 7
fc: -12
fc: 4
fc: 7
fc: -3
fc: 4
fc: 7
fc: 0
pairs {
  key: "sdfff"
  value: "q\"qq\\q\n"
}
pairs {
  key: "   sdfff2  \321\202\320\265\321\201\321\202 "
  value: "q\tqq<>q2&\001\377"
}
bbbb: "\000\001\002\377\376\375"
[gtt]: true
[gtg]: 20.0855369
[someEnum]: Variant1
$ protoc --decode_raw  < test.bin 
1: "dsfadsafsaf"
2: 234
3: 2342134
3: 2342135
3: 2342136
4: 18446744073707209482
4: 18446744073707209481
4: 18446744073707209480
5: "\004\007\364\377\377\377\377\377\377\377\377\001\004\007\375\377\377\377\377\377\377\377\377\001\004\007\000"
6 {
  1: "sdfff"
  2: "q\"qq\\q\n"
}
6 {
  1: "   sdfff2  \321\202\320\265\321\201\321\202 "
  2: "q\tqq<>q2&\001\377"
}
7: "\000\001\002\377\376\375"
100: 1
101: 0x403415e5bf0c19ea
102: 1
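
Without the schema, --decode_raw cannot know that field 4 is an int32, so it prints each varint as an unsigned 64-bit number. Negative int32 values are sign-extended to 64 bits on the wire, which is why each of them takes ten bytes in the hex dump. Similarly, field 101 is printed as the raw bit pattern of the double. The sketch below recovers the original values; `ReadVarint` and `BitsToDouble` are hypothetical helper names, and the code assumes IEEE 754 doubles:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decodes one base-128 varint starting at data[*pos]
// and advances *pos past it. No bounds or overflow checks; illustration only.
uint64_t ReadVarint(const unsigned char* data, size_t* pos) {
    uint64_t result = 0;
    int shift = 0;
    while (true) {
        unsigned char byte = data[(*pos)++];
        result |= static_cast<uint64_t>(byte & 0x7f) << shift;  // low 7 bits are payload
        if ((byte & 0x80) == 0) break;                          // high bit clear: last byte
        shift += 7;
    }
    return result;
}

// Hypothetical helper: reinterprets a fixed64 payload as the double it encodes.
double BitsToDouble(uint64_t bits) {
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Feeding the ten bytes `8a 86 f1 fe ff ff ff ff ff 01` from the hex dump to `ReadVarint` yields 18446744073707209482, and truncating that to `int32_t` gives -2342134, the first `fb` value. `BitsToDouble(0x403415e5bf0c19ea)` gives back the `gtg` value of about 20.0855369.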
Vi.
  • 16
    If your purpose is to educate, you could mention the --encode/--decode options of the protoc command. These should convert between bin / text formats. There is also the --decode_raw option as well. – Bruce Martin Sep 18 '13 at 22:20
  • 4
    Fantastic answer. Before seeing this it was really difficult for me to picture how data was encoded via proto buffers, and even though it doesn't really make sense to pass data as a string as shown here, I now have a good enough mental model of how data is encoded in the binary format. Thanks! – Chris Calo Nov 04 '15 at 21:58
  • 4
    I've discovered that protobufs supports a shortcut for entering in repeated values. fc values could be entered as fc: [4, 7, -12, 4, 7, -3, 4, 7, 0] – WiiBopp Mar 16 '17 at 23:11
  • Could you please provide a Makefile and the .h file? – silvalli Jan 24 '19 at 08:15
  • @silvalli, It is. Do you suggest redoing this part to make it cleaner? – Vi. Jan 24 '19 at 13:43
  • Yes. It would be very useful to have the additional pieces so we can play with this. You mention the error but could you say more about it? I don't understand what you mean by "it is". Thanks – silvalli Jan 25 '19 at 21:49
  • "It is" meant "Yes, it does exist.". Edit requests updating all four samples to get rid of the warning are welcome. I may eventually do it myself if/when I play with Protobuf again someday. – Vi. Jan 28 '19 at 00:54
18

A simplified example, with output from protoc.exe version 3.0.0 on Windows 7 + Cygwin.

Demo message

$ cat demo.proto
syntax = "proto3";
package demo;
message demo { repeated int32 n = 1; }

Create binary protobuf data (the text is quoted so the shell does not treat the brackets as a glob pattern)

$ echo 'n: [1,2,3]' | protoc --encode=demo.demo demo.proto > demo.bin

Dump the proto data as text

$ protoc --decode=demo.demo demo.proto < demo.bin
n: 1
n: 2
n: 3

And dump it even if you don't have the proto definition

$ protoc --decode_raw < demo.bin
1: "\001\002\003"
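
The raw dump shows field 1 as a single string because proto3 packs repeated scalar fields by default: the three values travel as one length-delimited blob of varints, and without the schema protoc cannot tell that blob apart from a string. A sketch of unpacking such a payload follows; `UnpackVarints` is a hypothetical helper, not part of protobuf, and it does no validation of truncated input:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: splits a packed payload into its varint values.
// Illustration only; a trailing incomplete varint is silently dropped.
std::vector<uint64_t> UnpackVarints(const std::string& payload) {
    std::vector<uint64_t> values;
    uint64_t current = 0;
    int shift = 0;
    for (unsigned char byte : payload) {
        current |= static_cast<uint64_t>(byte & 0x7f) << shift;
        if (byte & 0x80) {
            shift += 7;                 // continuation bit set: more bytes follow
        } else {
            values.push_back(current);  // last byte of this varint
            current = 0;
            shift = 0;
        }
    }
    return values;
}
```

`UnpackVarints("\001\002\003")` returns the three values 1, 2 and 3, matching the schema-aware dump above.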
mosh
6

Update 2022: official docs published at https://developers.google.com/protocol-buffers/docs/text-format-spec

An example from an open-source repo https://github.com/google/nvidia_libs_test/blob/master/cudnn_benchmarks.textproto

convolution_benchmark {
  label: "NHWC_128x20x20x56x160"
  input {
    dimension: [128, 56, 20, 20]
    data_type: DATA_HALF
    format: TENSOR_NHWC
  }
}

More examples across GitHub https://github.com/search?q=extension%3Atextproto https://github.com/search?q=extension%3Apbtxt

Colonel Panic