
I've just read an article about the differences between HTTP/1 and HTTP/2. My main question is about its claim that HTTP/2 is a binary protocol while HTTP/1 is a textual protocol.

Maybe I'm wrong, but as far as I know any data, whether text or any other format, has a binary representation in memory. Even when it is transferred over a TCP/IP network, the data is split into units according to the layers of the OSI model or the TCP/IP model. So technically, a textual format doesn't exist in the context of data transfer over a network.

I can't really understand this difference between HTTP/2 and HTTP/1. Can you help me with a better explanation?

John Smith
Christian LSANGOLA

4 Answers


Binary is probably a confusing term - everything is ultimately binary at some point in computers!

HTTP/2 has a highly structured format where HTTP messages are formatted into packets (called frames) and where each frame is assigned to a stream. HTTP/2 frames have a specific format, including a length which is declared at the beginning of each frame and various other fields in the frame header. In many ways it’s like a TCP packet. Reading an HTTP/2 frame can follow a defined process (the first 24 bits are the length of this packet, followed by 8 bits which define the frame type... etc.). After the frame header comes the payload (e.g. HTTP Headers, or the Body payload) and these will also be in a specific format that is known in advance. An HTTP/2 message can be sent in one or more frames.
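To make that "defined process" concrete, here is a minimal Python sketch that decodes the fixed 9-byte header every HTTP/2 frame starts with: a 24-bit length, an 8-bit type, 8 bits of flags, then 1 reserved bit and a 31-bit stream identifier. The function name parse_frame_header and the small FRAME_TYPES table are illustrative only, not taken from any library.

```python
import struct
from typing import NamedTuple

# A few frame type codes from the HTTP/2 spec (not an exhaustive list).
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x8: "WINDOW_UPDATE"}


class FrameHeader(NamedTuple):
    length: int       # payload length in bytes (the 9 header bytes not included)
    frame_type: str
    flags: int
    stream_id: int


def parse_frame_header(buf: bytes) -> FrameHeader:
    """Decode the fixed 9-byte header that starts every HTTP/2 frame."""
    if len(buf) < 9:
        raise ValueError("need at least 9 bytes")
    # ">" = big-endian (network order); B = 1 byte, H = 2 bytes, I = 4 bytes.
    len_hi, len_lo, ftype, flags, stream = struct.unpack(">BHBBI", buf[:9])
    length = (len_hi << 16) | len_lo       # reassemble the 24-bit length
    stream_id = stream & 0x7FFFFFFF        # drop the reserved top bit
    name = FRAME_TYPES.get(ftype, f"0x{ftype:02x}")
    return FrameHeader(length, name, flags, stream_id)


print(parse_frame_header(bytes.fromhex("000011 01 04 00000001")))
# → FrameHeader(length=17, frame_type='HEADERS', flags=4, stream_id=1)
```

No scanning is involved: every field sits at a known offset, so the receiver knows exactly how many payload bytes to read next.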

By contrast, HTTP/1.1 is an unstructured format made up of lines of text in ASCII encoding - so yes, this is ultimately transmitted as binary, but it’s basically a stream of characters rather than being specifically broken into separate pieces/frames (other than lines). HTTP/1.1 messages (or at least the first HTTP request/response line and the HTTP headers) are parsed by reading characters one at a time until a newline character is reached. This is kind of messy, as you don’t know in advance how long each line is, so you must process it character by character. In HTTP/1.1 the HTTP body’s length is handled slightly differently: it is typically known in advance, because a Content-Length HTTP header defines it. An HTTP/1.1 message must be sent in its entirety as one continuous stream of data, and the connection cannot be used for anything else until that message has been completely transmitted.
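For comparison, a minimal Python sketch of reading an HTTP/1.1 message: scan for CRLF line endings until the blank line that ends the headers, then use Content-Length to take the body. It deliberately ignores chunked transfer encoding, obsolete header folding and error handling; parse_http1_message is a hypothetical helper, not a standard API.

```python
def parse_http1_message(data: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.1 message into start line, headers and body."""
    pos = 0
    lines = []
    # A line's length isn't known until we find its CRLF, so we search for it.
    while True:
        end = data.index(b"\r\n", pos)
        line = data[pos:end].decode("ascii")
        pos = end + 2
        if line == "":            # empty line marks the end of the headers
            break
        lines.append(line)
    start_line, header_lines = lines[0], lines[1:]
    headers: dict[str, str] = {}
    for h in header_lines:
        name, _, value = h.partition(":")
        headers[name.strip().lower()] = value.strip()
    # The body is handled differently: its length is declared up front.
    length = int(headers.get("content-length", "0"))
    body = data[pos:pos + length]
    return start_line, headers, body


raw = b"POST /foo HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\nhello"
start, headers, body = parse_http1_message(raw)
print(start, headers["host"], body)   # → POST /foo HTTP/1.1 example.com b'hello'
```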

The advantage that HTTP/2 brings is that, by packaging messages into specific frames we can intermingle the messages: here’s a bit of request 1, here’s a bit of request 2, here’s some more of request 1... etc. In HTTP/1.1 this is not possible as the HTTP message is not wrapped into packets/frames tagged with an id as to which request this belongs to.
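A toy Python sketch of why the stream ID tag matters: chunks of two responses are interleaved on one connection, and the receiver puts each message back together purely from the stream ID carried with each chunk. The (stream_id, chunk) tuples stand in for real DATA frames; the helper names, the odd stream IDs 1 and 3, and the 4-byte chunk size are made up for illustration.

```python
from collections import defaultdict
from itertools import zip_longest


def split(stream_id: int, body: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut one message into (stream_id, chunk) pseudo-frames."""
    return [(stream_id, body[i:i + size]) for i in range(0, len(body), size)]


def interleave(*frame_lists):
    """Round-robin: a bit of stream 1, a bit of stream 3, more of stream 1..."""
    for group in zip_longest(*frame_lists):
        for frame in group:
            if frame is not None:
                yield frame


def reassemble(frames) -> dict[int, bytes]:
    """Receiver side: the stream ID says which message each chunk belongs to."""
    messages: defaultdict[int, bytes] = defaultdict(bytes)
    for stream_id, chunk in frames:
        messages[stream_id] += chunk
    return dict(messages)


wire = list(interleave(split(1, b"<html>...</html>", 4), split(3, b"body{}", 4)))
print([sid for sid, _ in wire])   # → [1, 3, 1, 3, 1, 1]
print(reassemble(wire))           # → {1: b'<html>...</html>', 3: b'body{}'}
```

Without the stream ID on each chunk (as in HTTP/1.1), the receiver would have no way to tell which bytes belong to which response, which is why HTTP/1.1 has to send one message at a time.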

I’ve a diagram here and an animated version here that help conceptualise this better.

Barry Pollard
  • Then what about the JSON body payload? Is it also binary? Also, for HTTP/1.1 with Content-Type: image/gif (or application/octet-stream), is the body encoded as text or as binary? – Stan Apr 12 '22 at 12:39
  • Bodies are stored in a DATA frame, which includes a header giving details like frame type, stream… etc. The payload within that can be text for text formats, but not for binary formats like GIF. However, text resources are typically compressed with gzip or Brotli, so they will be binary data. – Barry Pollard Apr 12 '22 at 13:52
  • So basically, text resources will be text that has to be translated to bytes character by character. Integers are still not supported in the text payload. The good thing is that we can use gzip (Huffman coding and LZ77) to shorten the binary length. – Stan Apr 13 '22 at 01:19
  • What is special about intermingling frames compared with, say, sending all frames of request 1 first and then immediately all frames of request 2 down the same TCP connection in HTTP/2? To me, intermingling frames just seems like reordering different parts of different requests. Why do frames have to intermingle, and why can't all the frames of req1 be sent together sequentially, followed by all frames of req2? Why do gaps have to be present (and how do they occur) between frames of a single request? – asn May 30 '23 at 21:32
  • Intermingling helps when request 1 isn't available (e.g. it's a request to a CDN and has to go all the way back to the origin). Still being able to send back the CSS and JS while that is blocked avoids waste. It's true that for certain requests (e.g. CSS or JS) the full payload needs to be delivered, so actual intermingling doesn't help much, but for others (e.g. progressive JPEG or HTML), intermingling allows part of the resource to be sent first so the browser can start processing it. See: https://blog.cloudflare.com/better-http-2-prioritization-for-a-faster-web/ – Barry Pollard May 31 '23 at 04:05

HTTP/1 basically encodes all relevant instructions as ASCII code points, e.g.:

GET /foo HTTP/1.1

Yes, this is represented as bytes on the actual transport layer, but the commands are based on ASCII bytes, and are hence readable as text.

HTTP/2 uses actual binary commands, i.e. individual bits and bytes which have no representation other than the bits and bytes that they are, and hence have no readable representation. (Note that HTTP/2 keeps HTTP/1's semantics: the method and path are still sent, as the :method and :path pseudo-header fields, but HPACK header compression means they usually don't appear as the literal text "GET /foo".)
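A small illustration of what "no readable representation" looks like, as a Python sketch rather than a real HPACK decoder: in HPACK (RFC 7541), a header field that exactly matches an entry in the predefined static table can be sent as one byte whose high bit is 1 and whose low 7 bits are the table index. Only the first eight of the 61 static-table entries are hard-coded here, multi-byte indexes are not handled, and decode_indexed is a hypothetical name.

```python
# First entries of the HPACK static table (RFC 7541, Appendix A).
STATIC_TABLE = {
    1: (":authority", ""),
    2: (":method", "GET"),
    3: (":method", "POST"),
    4: (":path", "/"),
    5: (":path", "/index.html"),
    6: (":scheme", "http"),
    7: (":scheme", "https"),
    8: (":status", "200"),
}


def decode_indexed(byte: int) -> tuple[str, str]:
    """Decode a one-byte 'indexed header field' (bit pattern 1xxxxxxx)."""
    if not byte & 0x80:
        raise ValueError("not an indexed header field")
    index = byte & 0x7F
    # Index 0 is invalid; 0x7F means the index continues in later bytes.
    if index == 0 or index == 0x7F:
        raise ValueError("index 0 or multi-byte index not handled in this sketch")
    return STATIC_TABLE[index]   # KeyError for entries not listed above


for b in bytes.fromhex("828684"):
    print(f"0x{b:02x} -> {decode_indexed(b)}")
```

So ":method GET" travels as the single byte 0x82, which is exactly the kind of bits-and-bytes command that has no text form on the wire.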

deceze

Conclusion

HTTP/2 is a binary protocol: messages are carried in binary frames with a fixed header layout, in contrast to HTTP/1.x, which uses a line-based text format.

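Binary formats are more efficient to parse because field positions and lengths are fixed or declared up front, so the receiver doesn't have to scan for delimiters such as line breaks and colons, or convert numbers written out as text.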

Example

For instance, in HTTP/1.1, request headers are sent as lines of text, which the receiver must scan and parse before using them. In HTTP/2, the same header information is carried in binary frames. For example, here is a HEADERS frame for GET http://www.example.com/ (its header block is the example from RFC 7541, Appendix C.4.1):

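00 00 11                   ; Frame length: 17
01                         ; Frame type: HEADERS
04                         ; Flags: END_HEADERS
00 00 00 01                ; Stream Identifier: 1
82 86 84                   ; HPACK indexed fields: :method GET, :scheme http, :path /
41 8C                      ; HPACK literal: name = :authority, Huffman-coded value of 12 bytes
F1 E3 C2 E5 F2 3A 6B A0 AB 90 F4 FF    ; Huffman-coded "www.example.com"

In this example:

  1. The first three bytes, 00 00 11, are the frame length: a 17-byte payload (the 9-byte frame header itself is not counted).

  2. The fourth byte, 01, is the frame type: HEADERS.

  3. The fifth byte, 04, is the flags field with END_HEADERS set, indicating that this frame holds the complete header block (no CONTINUATION frames follow).

  4. The next four bytes, 00 00 00 01, are the stream identifier 1 (the top bit is reserved). Client-initiated streams use odd numbers, so 1 is the first stream a client opens.

  5. The payload is an HPACK-compressed header block; there is no separate compression flag. Bytes 82, 86 and 84 each have the high bit set, which marks an indexed header field: the remaining 7 bits point to entries 2, 6 and 4 of HPACK's static table (:method GET, :scheme http, :path /).

  6. The final 14 bytes encode :authority: www.example.com. 41 means "literal header field with incremental indexing, name taken from static table entry 1 (:authority)", 8C means "Huffman-coded value, 12 bytes long", and the remaining 12 bytes are that value.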

Therefore, HTTP/2's binary framing is simpler for the receiver to parse, and together with HPACK header compression it usually results in smaller header sizes.
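A quick size check, as a Python sketch, for a single GET of http://www.example.com/: it assembles the HTTP/2 HEADERS frame byte by byte (header block copied from RFC 7541, Appendix C.4.1) and measures the equivalent minimal HTTP/1.1 request head. The numbers only hold for this one tiny request; most of the saving comes from HPACK rather than from binary framing as such.

```python
import struct

# HPACK header block for GET http://www.example.com/ (RFC 7541, Appendix C.4.1):
# 82 = :method GET, 86 = :scheme http, 84 = :path /,
# 41 8c ... = :authority as a literal with a Huffman-coded "www.example.com".
header_block = bytes.fromhex("828684418cf1e3c2e5f23a6ba0ab90f4ff")

FRAME_TYPE_HEADERS = 0x1
FLAG_END_HEADERS = 0x4
stream_id = 1

# 9-byte frame header: 24-bit length, type, flags, 31-bit stream ID.
length = len(header_block)
frame_header = struct.pack(">BHBBI", length >> 16, length & 0xFFFF,
                           FRAME_TYPE_HEADERS, FLAG_END_HEADERS, stream_id)
http2_frame = frame_header + header_block

http1_head = b"GET / HTTP/1.1\r\nHost: www.example.com\r\n\r\n"

print(http2_frame.hex(" "))
print(len(http2_frame), "bytes vs", len(http1_head), "bytes")   # → 26 bytes vs 41 bytes
```

On later requests over the same connection the gap tends to widen, because HPACK's dynamic table lets repeated fields such as :authority shrink to a single indexed byte.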

zizhen zhan

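I believe the primary reason HTTP/2 uses binary encoding is to pack the payload into length-prefixed frames, whose size is capped (16,384 bytes by default, adjustable with SETTINGS_MAX_FRAME_SIZE) rather than fixed. Line-delimited plain text doesn't map neatly onto such frames, so binary encoding the data and splitting it into multiple frames makes a lot more sense.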
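A Python sketch of the splitting step described here: a body is cut into DATA frames whose payload never exceeds the peer's maximum frame size (16,384 bytes unless SETTINGS_MAX_FRAME_SIZE raises it), each prefixed with its own 9-byte header, with END_STREAM set only on the last frame. Flow control, padding and priorities are ignored, and data_frames is a hypothetical helper.

```python
import struct

DEFAULT_MAX_FRAME_SIZE = 16_384   # initial value of SETTINGS_MAX_FRAME_SIZE
TYPE_DATA = 0x0
FLAG_END_STREAM = 0x1


def data_frames(stream_id: int, body: bytes,
                max_size: int = DEFAULT_MAX_FRAME_SIZE) -> list[bytes]:
    """Split a body into HTTP/2 DATA frames of at most max_size payload bytes.

    An empty body yields no frames here; a real sender would instead set
    END_STREAM on the HEADERS frame or send one empty DATA frame.
    """
    frames = []
    for start in range(0, len(body), max_size):
        chunk = body[start:start + max_size]
        is_last = start + max_size >= len(body)
        flags = FLAG_END_STREAM if is_last else 0
        n = len(chunk)
        header = struct.pack(">BHBBI", n >> 16, n & 0xFFFF, TYPE_DATA, flags, stream_id)
        frames.append(header + chunk)
    return frames


frames = data_frames(1, b"x" * 40_000)
print([len(f) - 9 for f in frames])   # payload sizes → [16384, 16384, 7232]
```

Because each frame declares its own length, the last frame can simply be shorter; nothing forces the payload to fill a fixed-size slot.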

Prasanna