
I have been trying to encode strings using the protoc CLI utility, and noticed that the output still contains plain text. What am I doing wrong?

osboxes@osboxes:~/proto/bin$ cat ./teststring.proto
syntax = "proto2";
message Test2 {
  optional string b = 2;
}

echo b:\"my_testing_string\"|./protoc --encode Test2 teststring.proto>result.out

result.out contains:

^R^Qmy_testing_string

Tested with protoc versions libprotoc 3.6.0 and libprotoc 2.5.0.

kpavel
  • Are you sure it isn't working fine? Displaying it to the console is bound to cause problems - the console is text, not binary. But pipe it to a file, and it'll probably be right. You can test at https://protogen.marcgravell.com/decode - just upload your test file there and see what it makes of it – Marc Gravell Jun 28 '18 at 18:21
  • @MarcGravell I think that is exactly what I did in the example above: piping the output of the encoding to the file result.out – kpavel Jun 28 '18 at 18:26
  • Ok, my mistake. Now: you're displaying text - what is the *hex* of that file? Looking at it as text is doomed to failure. Note that since protobuf encodes strings as UTF-8, it is expected that your text appears "as is". What is interesting to me here is the first 6 (or so) bytes of the file. As hex, not as characters. – Marc Gravell Jun 28 '18 at 18:28
  • Note - I think the decode page above will display the hex if you upload a file – Marc Gravell Jun 28 '18 at 18:29
  • @MarcGravell Thanks! You're right. The decode page above does indeed show the expected hex. – kpavel Jun 28 '18 at 19:09
  • I was under the impression that protobuf encoding includes some sort of compression, which makes the transferred binary messages much smaller than JSON – kpavel Jun 28 '18 at 20:21
  • That isn't *compression* as such - just an efficient framing protocol. But yes, protobuf will always be smaller than JSON - I don't think there's any scenario in which JSON can be smaller for *any* field. Plus it is computationally efficient, too. Compare the JSON: `{"b":2}` - that's 7 bytes, and probably much more if you have real names (not just `b`). In protobuf that would usually be 2 bytes: 1 byte for the field header and data type tag, 1 byte for the value encoded as "varint". Additionally, a JSON decoder has lots of text parsing to do - much more intensive than a dense binary protocol. – Marc Gravell Jun 28 '18 at 23:14
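
The size comparison in the last comment can be checked with a short Python sketch. It assumes a hypothetical integer field with field number 2 (the comment does not say which number `b` would have) and builds the two protobuf bytes by hand, which only works this simply for field numbers below 16 and values below 128:

```python
import json

# Compact JSON encoding of the example from the comment: {"b":2}
json_bytes = json.dumps({"b": 2}, separators=(",", ":")).encode("utf-8")

# Protobuf encoding of the same value, built by hand.
# Assumption: the integer field has field number 2 (hypothetical).
VARINT = 0                                   # wire type for integer fields
field_number = 2
header = (field_number << 3) | VARINT        # one header byte: 0x10
proto_bytes = bytes([header, 2])             # header + value 2 as a one-byte varint

print(len(json_bytes), len(proto_bytes))     # prints: 7 2
```

Larger field numbers or values need multi-byte varints, so the two-byte figure is the small-value case rather than a general rule.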

2 Answers


Just to formalize in an answer:

The command as written should be fine; the output is protobuf binary - it just resembles text because protobuf uses utf-8 to encode strings, and your content is dominated by a string. However, despite this: the file isn't actually text, and you should usually use a hex viewer or similar if you need to inspect it.
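
If no hex viewer is at hand, a few lines of Python can do the job. This is only a minimal sketch: `hex_dump` is a hypothetical helper name, and the sample bytes stand in for the contents of result.out, reconstructed from the `^R^Qmy_testing_string` shown in the question (`^R` is how a terminal renders byte 0x12, `^Q` is byte 0x11):

```python
def hex_dump(data: bytes) -> str:
    """Return the bytes as space-separated uppercase hex pairs."""
    return data.hex(" ").upper()

# Stand-in for the real file contents. To inspect the actual file, use:
#   hex_dump(open("result.out", "rb").read())
sample = b"\x12\x11my_testing_string"

print(hex_dump(sample[:5]))   # prints: 12 11 6D 79 5F
```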

If you want to understand the internals of a file, https://protogen.marcgravell.com/decode is a good resource - it rips an input file or hex string following the protocol rules, and tells you what each byte means (field headers, length prefixes, payloads, etc).

I'm guessing your file is actually:

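(hex) 12 11 6D 79 5F etc

i.e. 0x12 = "field 2, length prefixed" (field number 2 shifted left by 3 bits, OR'd with wire type 2), 0x11 = 17 (the payload length, encoded as varint), then "my_testing_string" encoded as 17 bytes of UTF-8. This matches the ^R^Q at the start of your output: ^R is byte 0x12 and ^Q is byte 0x11.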
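
To make the byte layout concrete, here is a minimal Python sketch that builds the same message by hand. It is not how protoc is implemented; `encode_varint` and `encode_string_field` are hypothetical helper names. The header byte is computed as (field_number << 3) | wire_type, which for field 2 with wire type 2 (length-delimited) gives 0x12:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # more bytes follow: set the high bit
        else:
            out.append(low7)          # last byte: high bit clear
            return bytes(out)

def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: header, length prefix, UTF-8 payload."""
    LENGTH_DELIMITED = 2              # wire type used for strings
    payload = text.encode("utf-8")
    header = encode_varint((field_number << 3) | LENGTH_DELIMITED)
    return header + encode_varint(len(payload)) + payload

encoded = encode_string_field(2, "my_testing_string")
print(encoded[:2].hex(" "))           # prints: 12 11
print(encoded[2:].decode("utf-8"))    # prints: my_testing_string
```

The payload is stored as plain UTF-8, which is why the string is readable when the file is viewed as text.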

Marc Gravell

The general form of the command is:

protoc --proto_path=${protobuf_path} --encode=${protobuf_message} ${protobuf_file} < ${source_file} > ${output_file}

and in this case (with the text-format message saved in source.txt):

protoc --proto_path="$HOME/proto/bin" --encode="Test2" ~/proto/bin/teststring.proto < source.txt > ./output.bin

or:

echo b:\"my_testing_string\" | protoc --proto_path="$HOME/proto/bin" --encode="Test2" ~/proto/bin/teststring.proto > ./output.bin
bb7bb