Can anyone explain when to use Protocol Buffers instead of JSON in a microservices architecture, and vice versa? Both for synchronous and asynchronous communication.
- Why not FlatBuffers or Cap'n Proto? – so-random-dude Sep 20 '18 at 19:33
- @so-random-dude Thanks for your answer. You could also discuss the use cases, pros, and cons of FlatBuffers and Cap'n Proto in your answer – Kaidul Sep 20 '18 at 19:42
3 Answers
When to use JSON
- You need or want data to be human-readable
- Data from the service is directly consumed by a web browser
- Your server-side application is written in JavaScript
- You aren’t prepared to tie the data model to a schema
- You don’t have the bandwidth to add another tool to your arsenal
- The operational burden of running a different kind of network service is too great
Pros of ProtoBuf
- Smaller payload size
- Guarantees type safety
- Prevents schema violations
- Gives you simple accessors
- Fast serialization/deserialization
- Backward compatibility
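As a rough illustration of those last few points, here is a minimal Python sketch. It assumes a hypothetical `user.proto` compiled with `protoc` into a `user_pb2` module containing a `User` message with `name` and `id` fields; the size comparison is indicative, not a benchmark.

```python
import json

# Hypothetical module generated by `protoc` from a user.proto defining:
#   message User { string name = 1; int32 id = 2; }
import user_pb2

user = user_pb2.User()
user.name = "Ada"   # simple accessors, type-checked at assignment
user.id = 42        # user.id = "42" would raise a TypeError

binary = user.SerializeToString()             # compact binary wire format
text = json.dumps({"name": "Ada", "id": 42})  # equivalent JSON payload

print(len(binary), len(text))  # the protobuf payload is typically smaller

decoded = user_pb2.User()
decoded.ParseFromString(binary)               # fast, schema-driven parsing
```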
While we are at it, have you looked at FlatBuffers?
Some of these aspects are also covered in google protocol buffers vs json vs XML.
Reference:
https://codeclimate.com/blog/choose-protocol-buffers/
https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f

- You can get all these ProtoBuf pros with JSON too, using jsoniter-scala. When there aren't too many floating-point numbers to serialize, it can be quite competitive with the best Java/Scala serializers for ProtoBuf: https://github.com/dkomanov/scala-serialization/pull/8 – Andriy Plokhotnyuk Sep 21 '18 at 11:20
- Note that some disadvantages of JSON can be addressed by validating it with JSON Schema, namely type safety, schema violations, and part of the backward-compatibility issues. – Martin Grey Nov 06 '20 at 09:42
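To illustrate that comment, a minimal sketch using the Python `jsonschema` package; the schema and payload are invented for the example:

```python
from jsonschema import validate, ValidationError

# A hand-written schema standing in for the guarantees protobuf gives you.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "id": {"type": "integer"},
    },
    "required": ["name", "id"],
}

try:
    # "42" is a string, so this violates the schema's integer constraint.
    validate(instance={"name": "Ada", "id": "42"}, schema=schema)
except ValidationError as e:
    print("schema violation:", e.message)  # caught before it reaches your handlers
```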
I'd use JSON when the consumer is, or could possibly be, a web browser or something written in a language with built-in support for JSON (JavaScript is an example), or where human readability is wanted. Speaking of which, at least for asynchronous calls, many developers enjoy the convenience of examining the contents of the queue directly for debugging, and even during the normal course of development. Depending on the tech stack, it may or may not be worth the trade-off to use protobuf just to reduce network load, since any performance increase won't buy you much in the async world. And it's not like we need to write a bunch of boilerplate code for JSON marshalling and unmarshalling anymore, the way we used to in most languages.
I'd use protobuf for everything else... if any use cases are left after the considerations above. There are real advantages: performance, lower network load, the backward compatibility offered by its versioning scheme, the documentation that comes for free with .proto files, and some validation! If for some reason you have a lot of REST or other synchronous calls between microservices, protobuf can be sent over the wire instead of JSON with few trade-offs, if any, while offering a heap of advantages (see the sketch below).
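As a sketch of that last point, sending protobuf over plain HTTP needs little more than a different body and Content-Type header. This assumes a hypothetical `user_pb2` module generated by `protoc`, and a made-up internal endpoint:

```python
import requests

import user_pb2  # hypothetical module generated by protoc from a .proto file

user = user_pb2.User(name="Ada", id=42)

# Same requests call you'd use for JSON; only the body and header change.
resp = requests.post(
    "https://orders.internal/api/users",   # made-up internal endpoint
    data=user.SerializeToString(),
    headers={"Content-Type": "application/x-protobuf"},
)

reply = user_pb2.User()
reply.ParseFromString(resp.content)        # decode the binary response body
```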

The general advantage of JSON (with OpenAPI) over Protobuf (with gRPC) is that JSON has a richer schema definition, with regex patterns and min/max constraints, to name a few (see the sketch below).
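For instance, constraints like these are easy to express in JSON Schema / OpenAPI but have no direct declarative equivalent in a .proto file; a minimal sketch with invented field names:

```python
from jsonschema import validate

# Constraints a .proto file cannot express declaratively.
schema = {
    "type": "object",
    "properties": {
        "sku": {"type": "string", "pattern": "^[A-Z]{3}-[0-9]{4}$"},    # regex
        "quantity": {"type": "integer", "minimum": 1, "maximum": 100},  # min/max
    },
}

validate(instance={"sku": "ABC-1234", "quantity": 5}, schema=schema)  # passes
```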
The main problem with JSON is the tooling. OpenAPI Generator can generate data stubs for you, and there is also Swagger Codegen, but both are IMO poor in terms of documentation and support.
With Protobuf (plus gRPC) you get a smaller network footprint and a performant server implementation that is generated for you. However, .proto files do not offer much in the way of schema definition; Protobuf is meant to be a binary, machine-to-machine data transfer format. Even though it can be sent over HTTP as binary, that adds complexity, because you need specialized tools (such as protoc --decode) to debug the payloads.
In a microservices architecture, I'd use JSON for the following:
- anything going outside (i.e. HTTP requests/responses)
- Data stored in Kafka (makes things much easier to debug)
- Data stored in MySQL (which has a JSON data type)
- (to generalize: any place where data is inspected by humans or persisted to disk)
I'd use Protobuf + gRPC for:
- remote procedure calls. This saves having to run a full HTTP stack; a gRPC server is much lighter.
- values stored in Redis. This is primarily because Redis stores all data in memory, so it's best to keep values as small as possible. I emphasized values because it's still best to keep keys readable. But this applies if you need multiple languages to read through the data; otherwise, use the fastest serialization mechanism you have. (A sketch of this pattern follows.)
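A minimal sketch of that Redis pattern, assuming the redis-py client and a hypothetical `user_pb2` module generated from a .proto file: a readable key, a compact binary value.

```python
import redis

import user_pb2  # hypothetical module generated by protoc

r = redis.Redis(host="localhost", port=6379)

user = user_pb2.User(name="Ada", id=42)

# Readable key for operability; compact protobuf bytes as the value.
r.set("user:42", user.SerializeToString())

cached = user_pb2.User()
cached.ParseFromString(r.get("user:42"))  # any language with the .proto can read this
```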
