Since HTTP is a text protocol, I assume that for all MIME types the HTTP body typically consists of text. This would mean that in JSON every number is represented as text, with 1 byte per character, instead of as a fixed 8 bytes.

E.g. for transmitting this JSON:

{ "num": 0.123456789 }

It would transmit 11 bytes for the number value alone.
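To make the byte count concrete, here is a minimal Python sketch comparing the text form of that number with a raw 8-byte IEEE 754 double. The variable names are illustrative only, and `struct` is used here just to produce the 8-byte form; it is not something HTTP or JSON does for you.

```python
import json
import struct

value = 0.123456789

# JSON writes the number as text; UTF-8 uses one byte per ASCII character.
text_bytes = json.dumps(value).encode("utf-8")

# A raw IEEE 754 double always takes 8 bytes ("<d" = little-endian double).
binary_bytes = struct.pack("<d", value)

print(text_bytes, len(text_bytes))  # b'0.123456789' 11
print(len(binary_bytes))            # 8
```

Note that the text form grows with the number of digits, while the binary form stays at 8 bytes regardless of the value.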

Is this correct or are there optimized byte representations for different mime types and in particular JSON in HTTP?

Nick Russler
  • FWIW, HTTP bodies can be binary as well (without any extra encoding/wrapping). That is how images are sent, for example. – Thilo Aug 26 '19 at 13:21
  • And yes, JSON is sent as UTF-8-encoded text. So your number takes 11 bytes. If you don't like that, look at compact encodings such as MessagePack (see https://stackoverflow.com/q/4893161/149550). But it's rarely worth the trouble unless you have big amounts of data. – Thilo Aug 26 '19 at 13:23
  • @Thilo It might not seem like much on a personal scale, but on a global scale it's an enormous waste. – Nick Russler Aug 26 '19 at 13:25
  • Would be nice if the Browsers could signal support for some sort of binary json, e.g. via a new "compression" and the server would just send in a more optimized byte representation. – Nick Russler Aug 26 '19 at 13:34
  • Browsers do support gzip-compression which helps a lot with JSON, too. – Thilo Aug 27 '19 at 05:30
  • @Thilo Yes, I found some results which said the same by [comparing gzipped JSON vs protobuf](https://nilsmagnus.github.io/post/proto-json-sizes/#gzipped-json-and-gzipped-protobuf). I would not have thought that compression works so well compared to an optimized serialization. – Nick Russler Aug 27 '19 at 11:53

3 Answers

HTTP is a text-based protocol, but this mostly concerns the headers part of the message. The headers define the body size (via `Content-Length`, for example, or with the `Transfer-Encoding: chunked` mode, which is a little more complex), and this size is a size in bytes.

The body can contain any byte, even the NUL byte if you want, anything at all; for an HTTP agent the body is just an n-byte-long blob.
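As a rough illustration of the two paragraphs above, the following Python sketch assembles a minimal HTTP/1.1 response by hand. The helper `build_response` is hypothetical (not part of any library), and real servers add more headers; the point is only that the head is text while the body is an opaque run of bytes whose length is counted in bytes.

```python
def build_response(body: bytes, content_type: str) -> bytes:
    """Assemble a minimal HTTP/1.1 response around an arbitrary byte body."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # size in bytes, not characters
        "\r\n"
    )
    return head.encode("ascii") + body


# A body containing NUL and other non-text bytes is perfectly legal.
body = bytes([0x00, 0xFF, 0x10, 0x00])
message = build_response(body, "application/octet-stream")

# The first blank line separates the textual head from the raw body.
head, _, received = message.partition(b"\r\n\r\n")
print(head.decode("ascii"))
print(received == body)
```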

This body can even be compressed (via gzip or deflate), and this information is then also stored in the headers.

So there is no problem transmitting your JSON in UTF-8 or any other encoding that is not 7-bit ASCII (like all the ISO-* ones).

In terms of body size, something like UTF-8 would not make 'every byte bigger', because simple characters like digits are in fact 1 byte long, even in UTF-8. If you are concerned about the size of your message, the really important setting is the compression format that the HTTP server can apply to the body.
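Both points can be checked with a short Python sketch. The sample payload is made up for illustration, and `gzip.compress` stands in for what a server would do when it sends `Content-Encoding: gzip`; actual ratios depend heavily on the data.

```python
import gzip
import json

# ASCII characters such as digits take one byte each in UTF-8;
# only non-ASCII characters need more.
digits_size = len("0.123456789".encode("utf-8"))  # 11
e_acute_size = len("é".encode("utf-8"))           # 2
euro_size = len("€".encode("utf-8"))              # 3
print(digits_size, e_acute_size, euro_size)

# A made-up, repetitive JSON payload, similar in shape to typical API output.
payload = json.dumps([{"num": i / 1000} for i in range(1000)]).encode("utf-8")
compressed = gzip.compress(payload)

print(len(payload), "bytes as UTF-8 JSON")
print(len(compressed), "bytes after gzip")
```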

regilero
  • I did not mean that UTF-8 would bloat up ASCII text, I meant that a double precision number (as all numbers in JSON) is not represented in 8 bytes but as many bytes as characters are there in a string representation of this number. – Nick Russler Aug 26 '19 at 14:10
  • I just read that "JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8" ([source](https://tools.ietf.org/html/rfc8259#section-8.1)), which I think implies that serialized JSON numbers are also sent as text in UTF-8. – Nick Russler Aug 26 '19 at 14:13
  • UTF-8 says nothing about being text or not; it's about the internal 0110011 representation of characters. Everything is a character, even a number. Encoding is something low-level; types in JSON are very high-level. – regilero Aug 27 '19 at 09:21
  • Yes, but I wanted to know if the byte representation of JSON numbers in-flight via HTTP is done as UTF-8 encoded characters or as e.g. IEEE 754 floats. – Nick Russler Aug 27 '19 at 11:50
  • OK, and the answer is that there is no tricky byte stuff with JSON: JSON is a text-based format, using UTF-8. And if we go deeper, to the HTTP layer, HTTP does not care about the character encoding of the body. If you want it to be decoded with the right encoding on the other side of the message, you should declare it via the `charset` parameter of the `Content-Type` header (`Content-Encoding` is for compression such as gzip); otherwise the HTTP client trying to read it may use the wrong encoding and fail to interpret the body. – regilero Aug 27 '19 at 13:49

HTTP can send binary data just fine, and there are two ways in particular to optimize this:

  1. You can switch to a binary encoding that's not JSON but largely compatible with JSON. CBOR is one example.
  2. You can gzip or brotli-compress the JSON. Browsers support this transparently.

Option 2 is by far the easiest and actually gives you a great bang for your buck. But option 1 usually wins in terms of efficiency of sending bytes and can be combined with 2.
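The two options can be compared with a rough Python sketch. It is only an approximation: `struct` packing of raw doubles stands in for a binary format (it is not CBOR, which also encodes structure and types), the sample values are arbitrary, and the sizes printed will differ for other data.

```python
import gzip
import json
import math
import struct

# Arbitrary sample data: 1000 doubles with many significant digits.
values = [math.sin(i) for i in range(1000)]

# Text form: JSON array encoded as UTF-8.
as_json = json.dumps(values).encode("utf-8")

# Binary stand-in for option 1: 8 bytes per number, little-endian doubles.
as_binary = struct.pack(f"<{len(values)}d", *values)

# Option 2 (compression) can be applied to either representation.
for label, blob in (("JSON text", as_json), ("packed doubles", as_binary)):
    gzipped = gzip.compress(blob)
    print(f"{label}: {len(blob)} bytes raw, {len(gzipped)} bytes gzipped")
```

Running this on your own payloads is a quick way to see whether a binary format is worth the extra complexity once compression is already enabled.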

Evert

I suggest you send them as strings; it will cost you 1 byte per character, and the value won't be affected by the other system's precision.

  • Thanks for your answer, I wasn't actually looking for the solution of any particular problem but wanted to make sure I understood the protocol correctly. – Nick Russler Aug 26 '19 at 13:26
  • But why do you need such a small number? If you're not using BigDecimal (https://stackoverflow.com/a/3413493/11951081), it is quite difficult to keep that kind of precision. – Ludovico Sidari Aug 26 '19 at 13:32
  • It's not about precision, I just wondered if there is really so much memory, bandwidth and ultimately energy wasted on the representation of JSON as text. – Nick Russler Aug 26 '19 at 13:33
  • I usually use strings for numbers which don't require mathematical operations or which may change during several steps. – Ludovico Sidari Aug 26 '19 at 13:36