I'm looking for a command-line utility that will, at a minimum, render binary protobuf data in human-readable form. Filtering and selection options (along the lines of cut for text) would be nice, but the primary object is to make the data visible for debugging purposes.

If there is no definitive tool for the job, links to relevant packages are fine.

comingstorm
  • Have a look at the protoc command built into Protocol Buffers. It has an option to decode a binary message to text (--decode) and one to convert it back (--encode); it probably will not work for Java-delimited messages. There are also utilities to convert protobuf to XML or JSON. If you are using <= 2.6.1 there is https://sourceforge.net/projects/protobufeditor/ – Bruce Martin Jan 22 '16 at 23:14
  • Such a tool might [convert protocol-buffers to JSON](https://stackoverflow.com/questions/2544580/is-there-a-standard-mapping-between-json-and-protocol-buffers). – Raedwald Dec 12 '17 at 17:05

1 Answer

The Protocol Compiler, protoc, has this functionality built in via the --decode and --decode_raw flags. It is the same tool you use to generate code from a .proto file, so it is likely already installed.

For example, to dump the raw tag/value pairs without a schema:

protoc --decode_raw < message.bin

Or, to decode with field names using the .proto file:

protoc --decode mypkg.MyType myschema.proto < message.bin
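
To give a feel for what --decode_raw reports, here is a minimal Python sketch that reads tag/value pairs from the wire format. It is not protoc's implementation: the names read_varint and decode_raw are made up for illustration, groups (wire types 3 and 4) are not handled, and unlike protoc it makes no attempt to display length-delimited fields as nested messages.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear marks the last byte
            return result, pos
        shift += 7


def decode_raw(buf):
    """Return (field_number, wire_type, value) tuples for one message.

    Hypothetical helper: values are left uninterpreted, so a varint is an
    int and a length-delimited field is the raw bytes.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type in (1, 5):               # fixed 64-bit / fixed 32-bit
            size = 8 if wire_type == 1 else 4
            if pos + size > len(buf):
                raise ValueError("truncated fixed-width field")
            value = int.from_bytes(buf[pos:pos + size], "little")
            pos += size
        elif wire_type == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated length-delimited field")
            value = bytes(buf[pos:pos + length])
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Field 1 holding the varint 150, encoded as the three bytes 08 96 01.
print(decode_raw(bytes.fromhex("089601")))  # [(1, 0, 150)]
```

Without the .proto file there are no field names or declared types, which is why only field numbers and wire types come out; --decode with a schema fills in the rest.
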

Here is the --help text:

Usage: protoc [OPTION] PROTO_FILES
Parse PROTO_FILES and generate output based on the options given:
  -IPATH, --proto_path=PATH   Specify the directory in which to search for
                              imports.  May be specified multiple times;
                              directories will be searched in order.  If not
                              given, the current working directory is used.
  --version                   Show version info and exit.
  -h, --help                  Show this text and exit.
  --encode=MESSAGE_TYPE       Read a text-format message of the given type
                              from standard input and write it in binary
                              to standard output.  The message type must
                              be defined in PROTO_FILES or their imports.
  --decode=MESSAGE_TYPE       Read a binary message of the given type from
                              standard input and write it in text format
                              to standard output.  The message type must
                              be defined in PROTO_FILES or their imports.
  --decode_raw                Read an arbitrary protocol message from
                              standard input and write the raw tag/value
                              pairs in text format to standard output.  No
                              PROTO_FILES should be given when using this
                              flag.
  -oFILE,                     Writes a FileDescriptorSet (a protocol buffer,
    --descriptor_set_out=FILE defined in descriptor.proto) containing all of
                              the input files to FILE.
  --include_imports           When using --descriptor_set_out, also include
                              all dependencies of the input files in the
                              set, so that the set is self-contained.
  --include_source_info       When using --descriptor_set_out, do not strip
                              SourceCodeInfo from the FileDescriptorProto.
                              This results in vastly larger descriptors that
                              include information about the original
                              location of each decl in the source file as
                              well as surrounding comments.
  --error_format=FORMAT       Set the format in which to print errors.
                              FORMAT may be 'gcc' (the default) or 'msvs'
                              (Microsoft Visual Studio format).
  --print_free_field_numbers  Print the free field numbers of the messages
                              defined in the given proto files. Groups share
                              the same field number space with the parent 
                              message. Extension ranges are counted as 
                              occupied fields numbers.
  --plugin=EXECUTABLE         Specifies a plugin executable to use.
                              Normally, protoc searches the PATH for
                              plugins, but you may specify additional
                              executables not in the path using this flag.
                              Additionally, EXECUTABLE may be of the form
                              NAME=PATH, in which case the given plugin name
                              is mapped to the given executable even if
                              the executable's own name differs.
  --cpp_out=OUT_DIR           Generate C++ header and source.
  --java_out=OUT_DIR          Generate Java source file.
  --python_out=OUT_DIR        Generate Python source file.
Kenton Varda
  • Right under my nose all the time. Thanks for the write-up, hopefully it will help out others as well. – comingstorm Jan 30 '16 at 01:22
  • Are there any workarounds or options for using the CLI tool to parse (Java) delimited messages, by any chance? – decimus phostle May 26 '17 at 02:19
  • @decimusphostle Unfortunately, I don't think so. Support for delimited messages only recently landed in the C++ library after an extended period of foot-dragging by the maintainers. Presumably someone will need to send them a PR to add command-line support. – Kenton Varda Jun 01 '17 at 04:48
  • What is the output format of --decode here? It looks like JSON but isn't quite. JSON requires field names to be quoted, so if I try to process the output with something like jq (https://stedolan.github.io/jq/) it doesn't work (see https://github.com/stedolan/jq/issues/1119). – Bruce Adams Dec 11 '17 at 17:42
  • It seems this is something undocumented called "text format" - see https://github.com/google/protobuf/issues/3755 – Bruce Adams Dec 11 '17 at 17:52
  • @BruceAdams The format actually predates JSON, if you can believe that. Differences from JSON include the quoting you mentioned as well as the fact that repeated fields are represented by literally having multiple instances of the field (rather than a list in square brackets). Contrary to what pherl said in your link, TextFormat implementations across languages *are* intended to be interoperable, or were in my time anyway. Here's the C++ API for it: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format
 – Kenton Varda Dec 12 '17 at 17:39
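
On the delimited-message question raised in the comments: since protoc itself expects a single message on standard input, one possible workaround is to split the stream first. The sketch below assumes the framing written by Java's writeDelimitedTo, a varint length followed by that many message bytes; the function names are hypothetical and nothing here is part of protoc.

```python
import io


def read_length_prefix(stream):
    """Read one varint from a binary stream; return None at a clean EOF."""
    result = 0
    shift = 0
    first = True
    while True:
        chunk = stream.read(1)
        if not chunk:
            if first:
                return None  # no more messages
            raise ValueError("stream ended inside a length prefix")
        first = False
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def split_delimited(stream):
    """Yield the raw bytes of each length-prefixed message in the stream."""
    while True:
        length = read_length_prefix(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise ValueError("stream ended inside a message")
        yield payload


# Two framed messages: a 3-byte one and a 9-byte one.
demo = io.BytesIO(bytes.fromhex("03089601" + "09120774657374696e67"))
messages = list(split_delimited(demo))
print(len(messages))  # 2
```

Each payload can then be written to its own file, or piped to protoc --decode_raw or --decode one at a time.
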